SBU-BMI / wsinfer

🔥 🚀 Blazingly fast pipeline for patch-based classification in whole slide images
https://wsinfer.readthedocs.io
Apache License 2.0

set chunksize to some value greater than 1 in geojson conversion #204

Closed kaczmarj closed 4 months ago

kaczmarj commented 7 months ago

/[redacted]/envs/wsinfer/lib/python3.10/site-packages/wsinfer/write_geojson.py:112: TqdmWarning: Iterable length 4100 > 1000 but `chunksize` is not set. This may seriously degrade multiprocess performance. Set `chunksize=1` or more.
  process_map(func, csvs, max_workers=num_workers)

swaradgat19 commented 7 months ago

@kaczmarj I could take this issue up! What would be the best chunk size for our case? I was looking at this StackOverflow answer regarding chunksizes, and I am going through some more documentation to see what will work best for us.

kaczmarj commented 7 months ago

great! thanks for the link to the stack overflow answer. according to that, inter-process communication time is an important consideration when choosing the chunk size. in our case, though, converting a single file takes far longer than handing a task to a worker process, so the communication overhead should matter very little.

i wonder if chunksize=1 would be best actually...

could you test a few and see which gives the best timing results? perhaps test 1, 4, 10, and 20.
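a rough sketch of how that comparison could be timed, using only the standard library. `time_chunksizes` and its `make_executor` parameter are hypothetical names for illustration and are not part of wsinfer. it relies on `Executor.map` from `concurrent.futures`, which takes a `chunksize` argument. note that each measurement also includes the time to start the worker pool.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def time_chunksizes(func, items, chunksizes, make_executor=ProcessPoolExecutor):
    """Return {chunksize: elapsed seconds} for mapping func over items.

    Hypothetical helper for illustration; not part of wsinfer.
    """
    timings = {}
    for chunksize in chunksizes:
        start = time.perf_counter()
        # A fresh pool per run keeps the measurements independent.
        with make_executor() as executor:
            # list() forces every result to be computed before the timer stops.
            list(executor.map(func, items, chunksize=chunksize))
        timings[chunksize] = time.perf_counter() - start
    return timings
```

something like `time_chunksizes(convert_one, csvs, [1, 4, 10, 20])` would then give one timing per chunk size, where `convert_one` stands in for whatever function does the per-file conversion. with process pools, the call should sit under an `if __name__ == "__main__":` guard.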

swaradgat19 commented 7 months ago

Sure. I will run some tests and see which value gives the best results.

swaradgat19 commented 7 months ago

I tried out different chunksizes. Here is what I observed:

Exp 1 to Exp 4: [results not shown]

Based on these runs, chunksize=1 seems to be the best option.
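
For reference, a minimal sketch of what that change might look like. The wrapper `convert_all` and its parameter names are hypothetical and not taken from `write_geojson.py`; only the `process_map(func, csvs, max_workers=num_workers)` call itself comes from the warning above. It assumes tqdm is installed.

```python
def convert_all(func, csvs, num_workers, chunksize=1):
    """Hypothetical wrapper: map func over csvs in worker processes."""
    # Imported inside the function so the sketch loads even without tqdm.
    from tqdm.contrib.concurrent import process_map

    # Passing chunksize explicitly should avoid the TqdmWarning that
    # process_map emits for long iterables when chunksize is not set.
    return process_map(func, csvs, max_workers=num_workers, chunksize=chunksize)
```

Keeping `chunksize` as a parameter with a default of 1 leaves room to revisit the value later without touching the call site.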

kaczmarj commented 4 months ago

thanks @swaradgat19