Things I've tested:
Pure multiprocessing with a native queue - very low throughput.
Redis Streams + multiprocessing - fast but complex; it cannot be scaled up or down easily (see the sketch after this list).
Redis task queue - high Redis overhead, and an awkward fit for stream processing.
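For context, a minimal sketch of what the Redis Streams + multiprocessing variant looks like; the stream/group names ("logs", "maskers") and the worker count are placeholders, not taken from the actual prototype:

```python
# Minimal sketch: Redis Streams consumed by multiple worker processes.
# Stream/group names and the worker count are illustrative placeholders.
import multiprocessing as mp

import redis


def consume(worker_id: int) -> None:
    r = redis.Redis()
    try:
        # Create the consumer group once; ignore the error if it already exists.
        r.xgroup_create("logs", "maskers", id="0", mkstream=True)
    except redis.ResponseError:
        pass
    while True:
        # Block up to 1s waiting for entries assigned to this consumer.
        entries = r.xreadgroup("maskers", f"worker-{worker_id}",
                               {"logs": ">"}, count=100, block=1000)
        for _stream, messages in entries:
            for msg_id, fields in messages:
                # ... preprocess/mask the log entry here ...
                r.xack("logs", "maskers", msg_id)


if __name__ == "__main__":
    workers = [mp.Process(target=consume, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
```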
Current plan for Log data:
Source (OAP) -> N * gRPC (Ingestors) -> In Stream (Redis) -> Ray Actor (Stream Consumers) -> Maskers (Preprocessors) -> Ready Stream (Redis) -> ML (Learners) -> Out Stream (Redis) -> Ray Actor (Exporters) -> Destination (OAP)
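A rough sketch of how the Ray actor stages could bridge the Redis streams; the class names, stream keys ("in-stream", "ready-stream") and batch sizes are illustrative assumptions, not the final design:

```python
# Sketch of the planned stages as Ray actors sitting between Redis streams.
# Stream keys and batch sizes are illustrative assumptions.
import ray
import redis


@ray.remote
class Masker:
    """Preprocesses/masks a log entry and pushes it to the ready stream."""

    def __init__(self):
        self.redis = redis.Redis()

    def process(self, fields):
        masked = fields  # ... masking logic goes here ...
        self.redis.xadd("ready-stream", masked)


@ray.remote
class StreamConsumer:
    """Reads raw log entries from the in-stream and forwards them to a masker."""

    def __init__(self, masker):
        self.redis = redis.Redis()
        self.masker = masker

    def run_once(self):
        # Block up to 1s for new entries, then fan them out to the masker actor.
        entries = self.redis.xread({"in-stream": "$"}, count=100, block=1000)
        for _stream, messages in entries:
            for _msg_id, fields in messages:
                self.masker.process.remote(fields)


if __name__ == "__main__":
    ray.init()
    masker = Masker.remote()
    consumer = StreamConsumer.remote(masker)
    ray.get(consumer.run_once.remote())
```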
I'll complete a prototype to showcase the flow over this weekend.
Closing in favor of moving to Flink. The new PoC is implemented.
I've been experimenting with the pipeline using native multiprocessing and Redis Streams/RQ recently, and it quickly becomes messy when we spawn many processes.
So I'm evaluating Ray as the backend engine to orchestrate the stream processing jobs while also supporting the batch learning that anomaly detection might need. So far, it looks promising.
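For instance, batch training for the anomaly detector could be dispatched as an ordinary Ray task next to the streaming actors; `train_detector` and the window contents below are hypothetical placeholders:

```python
# Sketch: dispatching a batch training job as a Ray task alongside the streaming actors.
# train_detector and the window contents are hypothetical placeholders.
import ray


@ray.remote
def train_detector(window):
    """Fit an anomaly-detection model on a batch (window) of preprocessed logs."""
    # ... fit the model on `window` and return the trained artifact ...
    return {"model": "trained", "samples": len(window)}


if __name__ == "__main__":
    ray.init()
    window = [{"log": f"line-{i}"} for i in range(1000)]
    model_ref = train_detector.remote(window)
    print(ray.get(model_ref))
```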
The main benefits of Ray for us include:
@Liangshumin @Fengrui-Liu FYI, there will be some changes to the existing designs that I communicated over chat; please pay attention to the algorithm training part, as Ray offers many out-of-the-box ML features.