Closed markgoddard closed 1 year ago
Patch coverage is 98.03%
of modified lines.
Files Changed | Coverage |
---|---|
activestorage/reductionist.py | 90.00% |
activestorage/active.py | 100.00% |
activestorage/config.py | 100.00% |
:loudspeaker: Thoughts on this report? Let us know!.
@markgoddard this looks great, very many thanks! 🍺 A couple points:
- I am approving but can we delay the merge by just a few days, just had a call with @kchasapis and the guys at IME/RED are starting to understand the mechanics and use of
Active
- Konstantinos is now forking the repo and would probably appreciate it if there were no new arguments and functionality of Active that he'd have to make use of on RED; so I'll merge next week, should be good by then
No problem for me to wait here. I can always use the branch if necessary.
- would it make sense to start thinking of a configuration that contains, amongst others,
max_threads
?
The configuration of PyActiveStorage does need improvement. The current config.py method isn't really a good long-term solution since it's part of the Python module. This probably isn't the right place to discuss it though.
- I've used the
minio_start
call we have inmain
rather than yours in here (the one inmain
is still yours haha) - lemme know if the one here was better, so I can revert to it 👍
No problem - whatever works.
awesome, and yes, configuration needs discussion (even more so after RED integration), not here. Not a worry, won't dillydally anymore, will merge on Monday :+1:
cheers @markgoddard :beer:
Previously, each storage chunk was processed sequentially. This is functional, but not performant, particularly when using Reductionist (S3), due to the sequential round trip latencies of requests. This means we are not taking advantage of the concurrency provided by Reductionist.
This change switches to use a threadpool executor to process chunks. This is unlikely to be as performant as asyncio, but still provides a significant improvement without requiring an async runtime.
Some basic throttling of requests is provided by using a threadpool of a configurable size. This works well when using a single Active object, but may need refinement if multiple Active objects are in use at the same time to avoid overloading the Reductionist server.
In order to benefit from connection reuse, we switch to using a requests Session object that is shared by all threads in the pool.
This change also adds support for specifying a CA certificate to use when making requests to the Reductionist API.
Closes #121