Copy of #49 from @yit-b, which was accidentally closed when I unintentionally pushed this repository's main branch to the PR's origin (and now that the pull request is closed I am unable to push the correct ref). Original PR body:
It's not always desirable to set max_threads equal to the number of CPUs on the host machine - especially in shared or containerized environments. Performance is often significantly worse with too many threads (see benchmarks below).
This PR gives users control over how many threads will be used for encoding by adding a max_threads argument accepted by Image.save(). Example:
save_test.py:

```python
import time

import pillow_avif  # noqa: F401  (registers the AVIF plugin with Pillow)
from PIL import Image

if __name__ == "__main__":
    img = Image.open("tests/images/flower.jpg")
    n_warmups = 1
    n_iters = 10
    for n_threads in range(0, 17):
        for _ in range(n_warmups):
            img.save("flower.avif", quality=80, max_threads=n_threads)
        start = time.time()
        for _ in range(n_iters):
            img.save("flower.avif", quality=80, max_threads=n_threads)
        print(
            f"N threads: {n_threads}. Avg time: {((time.time() - start) / n_iters * 1000):.2f} ms/img"
        )
```
Output:
```
N threads: 0. Avg time: 71.45 ms/img
N threads: 1. Avg time: 128.72 ms/img
N threads: 2. Avg time: 76.32 ms/img
N threads: 3. Avg time: 61.18 ms/img
N threads: 4. Avg time: 62.81 ms/img
N threads: 5. Avg time: 61.81 ms/img
N threads: 6. Avg time: 61.33 ms/img
N threads: 7. Avg time: 65.15 ms/img
N threads: 8. Avg time: 62.29 ms/img
N threads: 9. Avg time: 63.38 ms/img
N threads: 10. Avg time: 62.56 ms/img
N threads: 11. Avg time: 61.89 ms/img
N threads: 12. Avg time: 64.02 ms/img
N threads: 13. Avg time: 64.10 ms/img
N threads: 14. Avg time: 66.18 ms/img
N threads: 15. Avg time: 64.52 ms/img
N threads: 16. Avg time: 64.76 ms/img
```
The default behavior remains unchanged: max_threads defaults to 0, which is interpreted as the CPU count.

Perhaps the default should be lowered to something more reasonable, since performance with max_threads=0 (all CPUs) is about the same as with just 2 threads - probably because of contention. Changing the default may be outside the scope of this PR, but I'd like to be able to tailor the parallelism to my compute environment.
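As a sketch of how a caller might tailor the thread count to a shared or containerized environment, one option is to use the process's CPU affinity mask (which reflects container/cgroup CPU pinning on Linux) rather than the machine's total CPU count, and cap it to avoid the contention seen above. The `pick_max_threads` helper and the cap of 4 are illustrative choices, not part of this PR:

```python
import os


def pick_max_threads(cap: int = 4) -> int:
    """Choose an encoding thread count: respect the CPUs this process
    may actually run on (Linux affinity/cgroup pinning) and cap it to
    avoid contention. The cap of 4 is an illustrative choice."""
    try:
        # CPUs available to this process (Linux only).
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback on platforms without sched_getaffinity.
        available = os.cpu_count() or 1
    return max(1, min(available, cap))
```

With this PR, the result could then be passed straight through, e.g. `img.save("out.avif", quality=80, max_threads=pick_max_threads())`.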
Reproduce my tests:
```shell
conda activate foobar  # some new, empty conda environment
conda install -c conda-forge pillow 'libavif>=1.0.2' aom 'python=3.10.*'
cd pillow-avif-plugin
pip install --no-deps .
python save_test.py
```