Multi-processing AB search

To run MCTS-like methods, multi-processing is needed to improve search speed. It should support:

1. A producer-consumer structure that performs BFS from a starting match situation.
2. A global decision-maker agent and a local information-extractor agent. The global agent receives the information generated by the local agent and decides which action to search next. The local agent receives the current match situation and a request (and optionally some information from the global agent), and produces information (e.g. match score, action score, embeddings, or the whole match) for the global agent. We separate these into two agents so that we can avoid serializing and deserializing the match instance between the main process and the sub-processes. (In a test on an R9 7950X, serializing plus deserializing a normal match with 300K of output takes 0.01s, so without this structure the maximum throughput is 100 step/s.)
3. When the search is done or exceeds the time limit, stop the search and ask the global agent to make a decision.
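
The requirements above could be sketched roughly as follows. This is a minimal toy under stated assumptions, not the project's implementation: the match is a plain integer, and `ACTIONS`, `apply_action`, `evaluate`, `local_agent` and `run_search` are hypothetical names. Sending action paths that each local agent replays from the root, instead of sending match instances, is also an assumption about one way to keep matches out of the inter-process queues.

```python
import multiprocessing as mp
import queue
import time
from collections import deque

ACTIONS = (1, 2, 3)  # hypothetical toy action set


def apply_action(match, action):
    """Toy stand-in for advancing a match by one action."""
    return match + action


def evaluate(match):
    """Toy score: the closer the match value is to 10, the better."""
    return -abs(10 - match)


def local_agent(root_match, requests, results):
    """Runs in a sub-process. Replays an action path on its own copy of the
    root match and returns only a small score, never the match itself."""
    while True:
        path = requests.get()
        if path is None:  # sentinel: shut down
            break
        match = root_match
        for action in path:
            match = apply_action(match, action)
        results.put((path, evaluate(match)))


def run_search(root_match, n_workers=2, time_limit=1.0, max_depth=3,
               start_method=None):
    """Global agent: BFS over action paths until done or out of time,
    then return (first action of the best path, best score)."""
    ctx = mp.get_context(start_method)
    requests, results = ctx.Queue(), ctx.Queue()
    workers = [ctx.Process(target=local_agent,
                           args=(root_match, requests, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()

    deadline = time.monotonic() + time_limit
    frontier = deque((a,) for a in ACTIONS)  # FIFO frontier gives BFS order
    best_path, best_score = None, float("-inf")
    pending = 0  # requests sent but not yet answered
    try:
        while time.monotonic() < deadline and (frontier or pending):
            # Producer side: keep a bounded number of requests in flight.
            while frontier and pending < 2 * n_workers:
                requests.put(frontier.popleft())
                pending += 1
            # Consumer side: wait for one result, but not past the deadline.
            try:
                remaining = max(0.0, deadline - time.monotonic())
                path, score = results.get(timeout=remaining)
            except queue.Empty:
                break
            pending -= 1
            if score > best_score:
                best_path, best_score = path, score
            if len(path) < max_depth:
                frontier.extend(path + (a,) for a in ACTIONS)
    finally:
        # Drain unanswered requests so no worker blocks on a full pipe,
        # then ask every worker to stop.
        for _ in range(pending):
            results.get()
        for _ in workers:
            requests.put(None)
        for w in workers:
            w.join()

    decision = best_path[0] if best_path else None
    return decision, best_score
```

With the `spawn` or `forkserver` start methods, `run_search` has to be called under an `if __name__ == "__main__":` guard. In this sketch only action paths and scores cross the process boundary; the root match is handed to each worker once at start-up.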

Item 2 may not be needed, but we can simulate the case where the global agent receives the full match by having the local agent return the full match.
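
To judge whether returning the full match is affordable, the round-trip cost can be measured directly. The sketch below assumes `pickle` as the serializer (the note above does not say which one is used), and `round_trip_cost` and `fake_match` are hypothetical: the payload is a nested dict, not an LPSim match. The throughput ceiling is just the reciprocal of the cost, so a 0.01s round trip gives 1 / 0.01 = 100 step/s.

```python
import pickle
import time


def round_trip_cost(match, repeats=20):
    """Average seconds to serialize and deserialize `match` once."""
    start = time.perf_counter()
    for _ in range(repeats):
        restored = pickle.loads(pickle.dumps(match))
    elapsed = (time.perf_counter() - start) / repeats
    assert restored == match  # the round trip must preserve the data
    return elapsed


# Hypothetical stand-in for a match instance.
fake_match = {
    "round": 3,
    "history": [{"id": i, "name": f"event-{i}"} for i in range(5000)],
}

cost = round_trip_cost(fake_match)
# Upper bound on steps per second if every step pays one round trip.
max_steps_per_second = 1.0 / cost
```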

LPSim / backend