RubixML / ML

A high-level machine learning and deep learning library for the PHP language.
https://rubixml.com
MIT License
2k stars 177 forks source link

Swoole Backend #312

Closed mcharytoniuk closed 5 months ago

mcharytoniuk commented 6 months ago

The $timeout parameter means the total timeout for all the tasks in the current queue. By default it's set to -1 (no timeout)

Internally, I used batch (should also work with OpenSwoole: https://openswoole.com/docs/modules/swoole-coroutine-batch). batch runs the tasks concurrently, but also preserves the order of results, so I added that to the test.

I also added swoole to the list of extensions loaded in the CI.

If you want me to change anything, let me know :)

github-actions[bot] commented 6 months ago

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

mcharytoniuk commented 6 months ago

I have read the CLA Document and I hereby sign the CLA

andrewdalpino commented 5 months ago

Hey @mcharytoniuk I'm copying this from our private convo because I think it's useful to capture here ...

I like how you made that data provider for backends, really nice touch

I think the PR is fantastic, but what I wanted to discuss with you was the differences between the Process and Coroutine modules and if it's necessary to have both (i.e. there's situations where either is better than the other).

In terms of Serial vs the Swoole backend, that sounds pretty good. I would expect a parallel backend to be faster on multicore systems. Do you have data on Swoole vs the Amp backend?

Also regarding the serializers, we could potentially use the Serializer abstraction already implemented here https://github.com/RubixML/ML/tree/master/src/Serializers. And we also have the igbinary one in Extras here https://github.com/RubixML/Extras/blob/master/src/Serializers/Igbinary.php. Hopefully the API will eb compatible.

P.S. Stan seems to be complaining about some of the new code.

andrewdalpino commented 5 months ago

Also from our discussion @mcharytoniuk, it seems like after seeing the benchmarks that the Coroutine module may not be executing the tasks in parallel under the hood.

andrewdalpino commented 5 months ago

Hey @mcharytoniuk just a reminder to target the 2.5 branch with this code and to update the CHANGELOG! Looking fantastic, excited to release this!

https://github.com/RubixML/ML/blob/2.5/CHANGELOG.md

mcharytoniuk commented 5 months ago

@andrewdalpino PR is done (in a sense that it's working, and I don't think I can add anything more valuable in terms of Swoole backend) :D

There are few more things to consider, however:

  1. I added BackendProviderTrait to feed all the compatible tests and benchmarks with all the backends, and some tests are failing with Amp backend (Serialization of Closure is not allowed). I just commented out Amp in BackendProviderTrait, because I didn't want to fix that in this PR (Amp backend is unchanged).
  2. Serializer interfaces are slower than just using igbinary_serialize, igbinary_unserialize (about ~100 ms difference in benchmarks) - I think that's due to intermediary objects like Encoding and GC, string copying overhead (it adds up), so I just check for igbinary in the Swoole backend. The Serializer interface is not exactly compatible with this use case (IPC).
  3. Automated tests in GitHub CI didn't run, so I don't know if something will come up for PHP 7. I tested everything under PHP 8.2 I have on my device
andrewdalpino commented 5 months ago

Sweet @mcharytoniuk well give this a look soon! Thank you for the details, they help alot