Open sirbastiano opened 2 months ago
I will make a PR as soon as I finish my paper
Thanks @sirbastiano and sorry for the late reply.
As a general comment I would like to say that I'm a little bit hesitant to officially introduce this feature at this stage of the project, because the core API is still not fully consolidated and we still miss some features (e.g. timeline management) that could have an impact on the parallel decoding.
Moreover, the choice of the tool to implement the parallel version (joblib vs dask vs multiprocessing vs asynchronous stuff, etc.) is also a delicate matter that is probably a little bit premature to address at this stage.
If it is OK for you, I would like to have an initial parallel implementation in an example folder (so not in the package for the moment). This would certainly help us better understand the needs of a parallel implementation and improve the core API accordingly. Moreover, interested users can just grab the example implementation and use it as needed.
What do you think? Is it a feasible approach for you?
Yeah, I do often forget that the codebase still needs some labor limae.
It is not a problem keeping it as a separate plugin, so users can choose whether to pick it up or not. I will try my best to implement this feature since I believe it can change the execution of batch processing.
I think we can improve the processing speed by a lot by using the joblib package: https://joblib.readthedocs.io/en/latest/parallel.html
As we know, Huffman decoding is hard to parallelise within a single stream, but we can parallelise across the records.
Joblib can work around Python's GIL (its default loky backend dispatches work to separate processes), allowing the decoding to run in parallel across cores.
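A minimal sketch of what record-level parallelism with joblib could look like. Here `decode_record` is a hypothetical placeholder (a dummy byte transform), not the real Huffman decoder; the point is only the fan-out pattern with `Parallel`/`delayed`:

```python
from joblib import Parallel, delayed


def decode_record(record: bytes) -> bytes:
    # Placeholder for the real per-record Huffman decoding.
    # Dummy transform: invert every byte.
    return bytes(b ^ 0xFF for b in record)


def decode_all(records, n_jobs=-1):
    # Each record is decoded independently, so joblib can dispatch
    # them to worker processes (default "loky" backend) without
    # being limited by the GIL. n_jobs=-1 uses all available cores.
    return Parallel(n_jobs=n_jobs)(
        delayed(decode_record)(r) for r in records
    )


if __name__ == "__main__":
    records = [bytes([i, i + 1, i + 2]) for i in range(4)]
    decoded = decode_all(records, n_jobs=2)
    print(decoded)
```

The same pattern works with a thread backend (`Parallel(n_jobs=..., prefer="threads")`) if the per-record decoder releases the GIL internally, e.g. when it is implemented in C.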