ytreister closed this issue 4 years ago.
You are correct, very few plugins have asyncio support right now. This is a work in progress, and I am more than happy to accept PRs to help with that, along with documentation improvements.
I think that, at a minimum, the documentation should explain this for WorkerPlugins, and it should perhaps also be made clear in the section about converting worker plugins from v2 to v3.
I created two PRs: exif and opswat...
Describe the bug
Workers need to await a coroutine in order to run in parallel.
To Reproduce
I created a demo to illustrate what I am talking about: https://github.com/ytreister/stoq/tree/workers_in_parallel/demo
You can run scan.py, which shows the following:
[Screenshot: all workers await a coroutine (this is what I want it to do)]
[Screenshot: the workers do not await a coroutine (this is not how I want it to work)]
Notice how in the first case the workers run asynchronously, and in the second they run serially.
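The difference between the two demo runs can be reproduced without stoQ. The sketch below is a minimal, hypothetical illustration (the worker functions and timings are made up, not taken from the demo repository), and it assumes the framework schedules a round of workers together, modeled here with asyncio.gather. A worker that awaits asyncio.sleep lets the others proceed; one that calls the blocking time.sleep holds the event loop until it returns.

```python
import asyncio
import time
from typing import Tuple


async def awaiting_worker(name: str, delay: float) -> str:
    # Hypothetical worker: asyncio.sleep yields to the event loop while
    # waiting, so other scheduled workers can run in the meantime.
    await asyncio.sleep(delay)
    return name


async def blocking_worker(name: str, delay: float) -> str:
    # Hypothetical worker: time.sleep blocks the event loop's thread, so no
    # other coroutine on this loop can run until it returns.
    time.sleep(delay)
    return name


async def run_round(worker, count: int = 3, delay: float = 0.2) -> float:
    """Schedule `count` workers together and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(f"w{i}", delay) for i in range(count)))
    return time.perf_counter() - start


async def main() -> Tuple[float, float]:
    concurrent = await run_round(awaiting_worker)
    serial = await run_round(blocking_worker)
    print(f"awaiting workers: {concurrent:.2f}s")
    print(f"blocking workers: {serial:.2f}s")
    return concurrent, serial


if __name__ == "__main__":
    asyncio.run(main())
```

With three workers and a 0.2 s delay, the awaiting round should take about 0.2 s, while the blocking round takes at least 0.6 s because each worker waits for the previous one.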
Expected behavior
I think the stoQ framework works as expected, but the documentation may need more explanation so that users can take full advantage of the async feature. I do not think any of the public stoQ plugins are written so that they can run asynchronously.
Explanation
Once I converted my plugins and started running them in stoQ 3.x, I inspected the logs and noticed that the plugins did not seem to run in parallel during a given round. My OPSWAT MetaDefender plugin was the dead giveaway, because it could take up to 1 minute per file.
I created a demo to illustrate what happens when the plugins await a coroutine that takes some time to execute versus calling a regular function that takes some time to execute. As I expected, when awaiting a coroutine (in my demo this happens when I pass b'asyncio' as the payload), the worker plugins execute asynchronously. When my worker plugins do not await a coroutine, they basically run serially.
None of the stoq-plugins-public plugins seem to await a coroutine. For example, the public opswat.py plugin should perhaps use one of the techniques described here: https://stackoverflow.com/questions/22190403/how-could-i-use-requests-in-asyncio so that the framework does not have to wait for the scan to complete before other worker plugins begin their scans.
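One approach commonly suggested for using requests from asyncio is to hand the blocking call to a thread pool with loop.run_in_executor, so the coroutine awaits a future instead of blocking the event loop. A minimal sketch, assuming a hypothetical blocking_scan function stands in for a plugin's synchronous HTTP submit-and-poll logic (simulated with time.sleep so it runs without network access); this is not the opswat.py plugin's actual code:

```python
import asyncio
import time


def blocking_scan(content: bytes, delay: float) -> dict:
    # Hypothetical stand-in for a synchronous HTTP round trip (for example,
    # requests calls that submit a file and poll for the result).
    time.sleep(delay)
    return {"size": len(content)}


async def scan(content: bytes, delay: float = 0.2) -> dict:
    loop = asyncio.get_running_loop()
    # Run the blocking call in the loop's default thread pool; awaiting the
    # returned future keeps the event loop free for other workers.
    return await loop.run_in_executor(None, blocking_scan, content, delay)


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(scan(b"x" * n) for n in (1, 2, 3)))
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
    return results, elapsed


if __name__ == "__main__":
    asyncio.run(main())
```

On Python 3.9+, asyncio.to_thread(blocking_scan, content, delay) is a shorter way to do the same offloading. A natively async HTTP client such as aiohttp avoids the thread pool entirely, at the cost of rewriting the request code.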
Another example might be a plugin that runs a local command, such as exif.py or trid.py. These both call subprocess, so perhaps something from here would help: https://docs.python.org/3/library/asyncio-subprocess.html
These functions might not take very long to execute, but I thought one of the main reasons to use asyncio is so that we do not have to wait for other, unrelated workers to finish.
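For command-line tools, asyncio's subprocess API can stand in for the blocking subprocess module. The sketch below is an illustration under stated assumptions, not the plugins' actual code: run_tool is a hypothetical helper, and a short Python one-liner replaces the exiftool or trid binary so the example runs anywhere.

```python
import asyncio
import sys
import time


async def run_tool(*args: str) -> str:
    """Hypothetical helper: run a command without blocking the event loop."""
    # create_subprocess_exec returns a Process once the child has started;
    # awaiting communicate() lets other coroutines run while it executes.
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with {proc.returncode}: {stderr.decode()!r}"
        )
    return stdout.decode().strip()


async def main():
    # Stand-in for an external tool: sleeps 0.5 s, then prints "done".
    # A real plugin would pass the exiftool or trid executable and file path.
    cmd = (sys.executable, "-c", "import time; time.sleep(0.5); print('done')")
    start = time.perf_counter()
    outputs = await asyncio.gather(run_tool(*cmd), run_tool(*cmd))
    elapsed = time.perf_counter() - start
    print(outputs, f"{elapsed:.2f}s")
    return outputs, elapsed


if __name__ == "__main__":
    asyncio.run(main())
```

Two 0.5 s commands launched together should finish in roughly 0.5 s plus interpreter start-up, rather than taking over 1 s as they would if run one after the other.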
For me, my plugins either
I think there should at least be guidance on how to set up each of the above types of workers so that they can run asynchronously.