Synchronization Issue in ML-Agents Decision Timing

Is your feature request related to a problem? Please describe.
Yes. The decision time (when Python produces an action) and Academy.EnvironmentStep() (when Unity applies actions) can fall out of sync. When that happens, the same action may be applied more than once, or the same state may be read more than once, which makes it hard to keep training consistent. The cause appears to be the non-blocking websocket communication between Unity and Python.
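To make the failure mode concrete, here is a minimal toy model of it in plain Python. It is not ML-Agents code: `simulate_non_blocking`, `count_duplicates`, and the `policy_latency_ticks` parameter are hypothetical names used only for illustration, and the model assumes the environment simply re-applies the most recent action whenever a new one has not arrived yet.

```python
def simulate_non_blocking(env_ticks: int, policy_latency_ticks: int) -> list[int]:
    """Return the action id applied at each environment tick.

    Toy model only (an assumption, not ML-Agents behaviour): the policy needs
    `policy_latency_ticks` ticks to produce each new action, and the
    environment never waits, so it re-applies the most recent action it has.
    """
    applied = []
    latest_action = 0  # id of the most recent action received
    for tick in range(env_ticks):
        # A new action arrives only every `policy_latency_ticks` ticks.
        if tick > 0 and tick % policy_latency_ticks == 0:
            latest_action += 1
        applied.append(latest_action)
    return applied


def count_duplicates(applied: list[int]) -> int:
    """Count ticks that re-applied the same action as the previous tick."""
    return sum(1 for prev, cur in zip(applied, applied[1:]) if prev == cur)


applied = simulate_non_blocking(env_ticks=6, policy_latency_ticks=3)
print(applied)                    # [0, 0, 0, 1, 1, 1]
print(count_duplicates(applied))  # 4
```

With a latency of one tick the two sides stay aligned and no duplicates occur; any larger latency produces repeated actions in this model.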

Describe the solution you'd like
I’d like a blocking mechanism in Unity that pauses rendering until the next action from Python is available. This would ensure that Academy.EnvironmentStep() only processes actions and states in sync with the decision-making process, preventing duplicates.
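The behaviour I have in mind can be sketched with two blocking queues. This is a toy illustration in plain Python, not a proposal for the actual Unity implementation: `run_lockstep`, the queue names, and the "action = 10 * observation" policy are all hypothetical, and the environment loop merely stands in for Academy.EnvironmentStep().

```python
import queue
import threading


def run_lockstep(num_steps: int) -> list[tuple[int, int]]:
    """Run a toy environment and policy in lockstep.

    Returns the (observation, action) pairs in the order the environment
    applied them. Each observation is paired with exactly one action.
    """
    obs_q: queue.Queue = queue.Queue(maxsize=1)  # environment -> policy
    act_q: queue.Queue = queue.Queue(maxsize=1)  # policy -> environment

    def policy() -> None:
        while True:
            obs = obs_q.get()      # block until a fresh observation arrives
            if obs is None:        # sentinel: the environment has finished
                return
            act_q.put(obs * 10)    # hypothetical policy: action = 10 * obs

    worker = threading.Thread(target=policy)
    worker.start()

    applied = []
    for step in range(num_steps):
        obs_q.put(step)            # publish exactly one observation
        action = act_q.get()       # block until the matching action arrives
        applied.append((step, action))  # stand-in for applying the action
    obs_q.put(None)                # tell the policy thread to stop
    worker.join()
    return applied


print(run_lockstep(3))  # [(0, 0), (1, 10), (2, 20)]
```

Because the environment loop blocks on `act_q.get()`, it can never step with a stale action or emit a second observation before the first has been answered. The trade-off is that the simulation stalls for as long as the policy takes to respond.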

Describe alternatives you've considered

- Increasing the frequency of EnvironmentStep() updates to match decision-making timing, but this doesn't fully address the synchronization issue.
- Adjusting Time.timeScale and FixedUpdate() to better align the two processes, though this approach is limited by the non-blocking nature of the websocket mechanism.

Additional context
This issue is likely caused by the asynchronous nature of websocket-based communication. A blocking mechanism could help bridge the gap between Unity and Python processes. Feedback or alternative solutions from the community would be greatly appreciated.

Unity-Technologies/ml-agents, issue #6175