RestComm / gmlc

Restcomm Location Server
http://www.restcomm.com/
GNU Affero General Public License v3.0
17 stars 40 forks source link

MobileCoreNetworkInterfaceSbb vulnerable to race conditions #66

Closed Vanit closed 8 years ago

Vanit commented 8 years ago

I was investigating how MobileCoreNetworkInterfaceSbb functioned and assumed it was supposed to be an instance per request as it's using request contextual local variables. Through testing I was able to confirm that the same instance is servicing all requests and each request is overriding the local variables of the previous request, resulting in a race condition if the first request has not yet finished executing at the time of a subsequent request.

This becomes particularly problematic because the HTTP response object of the first request is lost when the second request comes in. Depending on the circumstances of each request, the first request could reply to the second request, otherwise it will just miss out entirely as its request object was overwritten.

You can confirm it's the same instance through the object Id in the debugger, and also by setting a counter that increments on each request and noting that the value never resets.

I imagine the fix would be something along the lines of keeping a hashmap of HTTP requests and request data, and attaching an associated identifier to the ActivityContextInterface to be used for lookup on the Dialog callbacks.

angrygreenfrogs commented 8 years ago

@Monix Good job finding this one. I do remember when I was first working on the MLP code that one of the others in the group pointed out a problem like this in terms of not being able to rely on local variables (between the point where you make a call out and are awaiting a callback), but I can't remember any solution being suggested.

I'd say we need to first ask what the best practice is, how do we maintain data in this instance? It must be a common problem.

@FerUy Can you or anyone on the team give us some advice?

Separately that's also certainly interesting if you're saying it looks like it's a re-used single threaded handler for all incoming requests? I can't imagine that part is quite true... it must maintain a pool of instances of the Sbb and just re-use whichever happens to be free.. otherwise it wouldn't scale up at all.

FerUy commented 8 years ago

@Monix @angrygreenfrogs, I was investigating the same class for issue #25, but your findings are more crucial to me, definitely. As @vetss being the main developer of such class, I believe we need his insights on this issue which I tagged as a bug.

vetss commented 8 years ago

@Monix do you use the same TCP connections for both HTTP requests (when sending of a next request before you received a response for a new request) ?

Vanit commented 8 years ago

@vetss Good question! I was using Chrome for testing, so there's a possibility it's reusing the TCP connection. However, I ran the test via multiple browsers at the same time and I still found the same result.

Vanit commented 8 years ago

Had some fun learning about SLEE architecture to get to the bottom of this. So I need to correct my original post - most of the race conditions I was worried about aren't actually race conditions (they do have code smell, but I'll get to that)

Particularly the part I wrote about the first request being able to respond to a second interrupting request isn't true as the event context is correctly preserved via a CMP field (I didn't realise what this was doing before).

Most of the class fields are also safe because they're only accessed before or after the MAP request is handled, but not both. The exception to this is the requestingMsisdn and httpRequestType fields, which would cause the GMLC to respond in the format of the most recent request, so MLP requests could respond with REST formatting if a REST call was received before the MLP response could be sent.

This is easy to fix by changing these to CMP fields instead of class fields. Which brings me to the code smell - even though the other class fields are currently safe, they should probably be changed to CMP fields anyway to prevent future gotchas.

I'm working on a pull request now that will resolve all of this.

(Edited for clarity)

vetss commented 8 years ago

@Monix

I agree with your update in https://github.com/RestComm/gmlc/pull/67