Open CodeBear801 opened 4 years ago
Draft streaming system diagram, related with https://github.com/Telenav/osrm-backend/issues/356#issuecomment-647797572
(click for large image)
moving event
, the ideal format for analysis and machine learning is
trace_id, userid, start_position, end_position, duration, distance, osrm_legs, avg_speed, osrm_distance, osrm_duration, osrm_edge_list, spatial_index_cell_list...
For how the output
to be used could go to https://github.com/Telenav/osrm-backend/issues/356#issuecomment-647797572
partial trip
continuously feed into system for analytics and training real time modelsProposal:
Another proposal:
Partial trip requirements
Condition of generating partial trip is adjustable
kafka
, kinesis
or managed service
like pub/sub
should all works, prefer to kafka
apache beam
or apache flink
or managed service
, I think either of them works for expected logic. Python
vs Scala
?!Analytic DB
is
Aggregator
Assign trip ID for event point
group by session window, trigger by number of gps counts or time
It generates a group of points with trip ID
Unique ID assigner
session
, we will try to group as many as gps trace into single trip until
redis
or adopt idea of snowflake
Draft flow:
Why separate Aggregator
and Trip Data Enhancer
historical trip gps trace
and apply Trip Data Enhancer
on them, it might be better to separate them as different componentNotes:
Apache Beam
or Apache Flink
works, each task is an interface represent actions(which usually a remote service calling) and streaming framework with chain them togethertripID
, default trigger
should workDon't use additional framework, just handling input and generate expected output based on 1 or 2 remote service, working on single node.
trip_id_123, point1, point2, point3, ....
trip_id_124, point1, point2, point3, ....
trip_id_125, point1, point2, point3, ....
inputRecord = Retrieve data from message queue
mapMatchedFuture = mapMatcher(inputRecord)
routeResponseFuture = router(mapMatchedFuture)
...
join result
publish result to message queue
Use streaming calculation framework to replace internal logic of V1, keep input and output part
on(loader)
.on(generateFeature1)
.on(generateFeature2)
...
.on(publish)
The reason of using streaming framework is
Aggregator mainly used to achieve real time goal, for the beginning we could used batched result for following components until we have clear understanding of this components.
subtask of https://github.com/Telenav/osrm-backend/issues/355
To be summary:
More info