leandervaneekelen closed this 3 weeks ago.
Totals | |
---|---|
Change from base Build 11386276311: | -0.02% |
Covered Lines: | 2219 |
Relevant Lines: | 3056 |
Hi Leander,
This is amazing, a very nice speedup. Thank you so much! I think there are no hurdles to moving to Shapely 2.0; it should already work.
Hi @martvanrijthoven,
I have some huge XML files in which I store my slide-level detection inference (>100,000 cells/detections). I noticed that loading these from ASAP-formatted XMLs can take a very long time, and that loading them from JSON takes about as long. (I was under the impression that converting the XMLs to JSON via `scripts/convert_asapxml_to_json.py` would result in a speed-up, but perhaps not at the scales I am working at.) Here is a code snippet for making a large annotations file:
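For readers who want to reproduce the problem, a file of comparable size can be generated with the standard library alone. This is a minimal sketch, not the snippet from this PR: it assumes the usual ASAP XML layout (`Annotation` elements with `Name`, `Type="Dot"`, `PartOfGroup` and `Color` attributes, one `Coordinate` per point, plus a matching `AnnotationGroups/Group`), and the function names, group name and slide size are hypothetical.

```python
import random
import xml.etree.ElementTree as ET


def build_asap_points_tree(n_points, width=100_000, height=100_000,
                           group="detection", seed=0):
    """Build an ASAP-style annotation tree holding n_points "Dot" annotations.

    The element/attribute layout is an assumption based on the ASAP XML
    format; coordinates are uniform random positions in level-0 pixels.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same file
    root = ET.Element("ASAP_Annotations")
    annotations = ET.SubElement(root, "Annotations")
    for i in range(n_points):
        annotation = ET.SubElement(annotations, "Annotation", {
            "Name": f"Annotation {i}",
            "Type": "Dot",
            "PartOfGroup": group,
            "Color": "#F4FA58",
        })
        coordinates = ET.SubElement(annotation, "Coordinates")
        ET.SubElement(coordinates, "Coordinate", {
            "Order": "0",
            "X": f"{rng.uniform(0, width):.2f}",
            "Y": f"{rng.uniform(0, height):.2f}",
        })
    # Declare the label group that every annotation refers to via PartOfGroup.
    groups = ET.SubElement(root, "AnnotationGroups")
    group_el = ET.SubElement(groups, "Group", {
        "Name": group, "PartOfGroup": "None", "Color": "#F4FA58",
    })
    ET.SubElement(group_el, "Attributes")
    return ET.ElementTree(root)


def write_asap_points_xml(path, n_points, **kwargs):
    """Write the generated annotations to `path` as UTF-8 XML."""
    build_asap_points_tree(n_points, **kwargs).write(
        path, encoding="utf-8", xml_declaration=True)
```

For example, `write_asap_points_xml("large_detections.xml", 150_000)` (hypothetical filename) produces a file in the >100,000-detection range described above.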
I did some profiling and found that, during the initialization of the `WholeSlideAnnotation` object, a significant amount of time is spent inserting points into the rtree used for `WholeSlideAnnotation.select_annotation` calls:
With some digging, I found out that you can instantiate an `rtree.index.Index` from a stream and that this offers a significant speed-up:
Now the biggest time sink is creating GEOS objects in Shapely.
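The difference between per-item insertion and stream loading can be sketched with the `rtree` package as follows. This is a minimal sketch, not the code from this PR: it indexes each point as a degenerate box `(x, y, x, y)`, stores no payload object, and the helper names are hypothetical.

```python
from rtree import index


def build_incremental(bboxes):
    """One insert() call per box; each call walks down the tree and may split nodes."""
    idx = index.Index()
    for i, box in enumerate(bboxes):
        idx.insert(i, box)  # box = (minx, miny, maxx, maxy)
    return idx


def build_from_stream(bboxes):
    """Pass a generator of (id, bbox, obj) tuples so the tree is bulk-loaded."""
    return index.Index((i, box, None) for i, box in enumerate(bboxes))
```

Both indexes answer the same queries, e.g. `set(idx.intersection((minx, miny, maxx, maxy)))`; only construction differs. One caveat worth checking against your `rtree` version: the stream should not be empty.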
Some benchmarking with timeit confirms the optimization boost:
Cool crisp 40% speedup :sunglasses:
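A comparison like this can be made with a small `timeit` harness. The sketch below is an assumption about how such a benchmark might look, not the measurement used here; it reports the best of several repeats, which the `timeit` documentation recommends over the mean because slower runs mostly reflect interference from other processes.

```python
import timeit


def benchmark(candidates, number=1, repeat=5):
    """Return the best time (seconds per call) for each named zero-argument callable."""
    return {
        name: min(timeit.repeat(fn, number=number, repeat=repeat)) / number
        for name, fn in candidates.items()
    }
```

To compare the two index-building strategies, pass two zero-argument callables (for instance lambdas), one that builds the index with per-item inserts and one that builds it from a stream, over the same list of boxes.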
Let me know what you think! Btw, I read that shapely 2.0 contains a lot of optimizations, so we can probably get this time down even further. Do you know what would be the biggest hurdle for moving to shapely 2.0 in WSD? Is this even feasible/desirable?
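On the Shapely 2.0 question: its vectorized constructors create many GEOS geometries in a single call instead of one Python-level `Point(...)` per detection, which targets the GEOS instantiation cost mentioned above. A minimal sketch, assuming `shapely>=2.0` and NumPy; whether WSD's annotation classes can use these arrays directly is an open assumption. It also shows Shapely 2.0's built-in `STRtree`, a swapped-in alternative to `rtree` whose `query` returns integer indices.

```python
import numpy as np
import shapely
from shapely.geometry import Point


def points_loop(coords):
    """Shapely 1.x style: one Python call (and one GEOS object) per row."""
    return [Point(x, y) for x, y in coords]


def points_vectorized(coords):
    """Shapely 2.0: build all points from an (N, 2) array in one call."""
    return shapely.points(coords)


def query_box(points, minx, miny, maxx, maxy):
    """Sorted indices of points whose envelopes intersect the query box."""
    tree = shapely.STRtree(points)
    return np.sort(tree.query(shapely.box(minx, miny, maxx, maxy)))
```

`points_vectorized` returns a NumPy array of `Point` objects, so downstream code expecting a list may need a small adaptation.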