Scaling up: Response time too long

inconnu26 commented 6 years ago

Hello,

We've been using the barefoot project for a few month now, and it has been an amazing discovery, and of a great help. But we're stuck with an issue of scaling our project.

We were using a bfmap of France (~1.4G), and now that we want to use the system of multiple countries at a time, we've made a larger bfmap(~4.2GB). But taking over 20GB or RAM and 100% or 16-core processors, it takes over 10 minutes for an average trip (~10km) to be mapmatched. Whereas with the bfmap of France, and with a smaller server, it used to take about 30 seconds to be mapmatched, for a same trip.

So we tried to work on your spark jobs implementation, but it seems that it only allows to mapmatch many trips at a time by dispatching the multiple trips to multiple spark slave machines. So It seems that it would not fix our latency issue, which needs to be reduced for EACH trip.

Would you have an idea or an advice to make the Barefoot work with a map of multiple countries (and ultimately with the entire world) ? must we use docker for each country ? a different server ?

Thank you very much for your help

smattheis commented 6 years ago

How much memory (RAM) do the machines have? It sounds like it's swapping to disk, which you should definitely avoid. The way you can avoid that for larger maps is a spatial sharding of your map region. To support this, the library allows to load only a region of the map that is stored in the database. You must then assign regions to different servers. I can give you some more hints on how to proceed on that, but please check first if the latency problem is caused by swapping.

(Sorry for the delay to answer your question, I was kind of offline for one month.)

inconnu26 commented 6 years ago

Thank you for your answer. Yes it is definitely a RAM issue. We're using 32GB RAM servers. And a matcher server of only France fills 18GB. So we we've started spatial sharding and it seems a good solution. Do you have other hints for such sharding ? Thank you !

smattheis commented 6 years ago

The basic idea is as follows:

Define the shards of the map with overlaps. The overlaps should be sufficient to cover a distance that can be traveled within one interval of your data, e.g., with 10 seconds maximum interval 1 kilometer overlap is more than sufficient (30 seconds should be 2-3 kilometers).
Instantiate the matcher (on each Spark machine exactly once) but load road data only for the specified tile. To do so, note that RoadReader has a constructor that takes a polygon: https://github.com/bmwcarit/barefoot/blob/master/src/main/java/com/bmwcarit/barefoot/road/RoadReader.java#L51
Send requests to only that machine where the first position of the trace is within the machine's tile (not within the overlap!!!). If at some point through the iteration of samples a position is outside the machine's tile. Send a new request to the machine where the position is again within the machine's tile and include the KState object with the request. To do so, KState can be serialized also into JSON (https://github.com/bmwcarit/barefoot/blob/master/src/main/java/com/bmwcarit/barefoot/markov/KState.java#L304) and deserialized from JSON (https://github.com/bmwcarit/barefoot/blob/master/src/main/java/com/bmwcarit/barefoot/matcher/MatcherKState.java#L48).

The technical difficulties with Spark are:

Coordinate machines to spread tiles such that all tiles are covered. (Maybe there is a Spark functionality for that. Otherwise, you can use something like Zookeeper.)
Send requests to certain machines based on the position of a sample. (Of course, you can hard-code the mapping because tiles are fixed, but Spark must somehow support this direct sending.)

bmwcarit / barefoot

Scaling up: Response time too long #107