Closed: PaulLerner closed this issue 1 year ago.
Dear Paul,
The issue is probably due to Numba (I cannot do anything about that) and to a forced conversion of the strings used as ids to numba.types.unicode_type, which I introduced to avoid errors when I implemented the fusion algorithms.
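For context, here is a rough, hypothetical sketch of the kind of conversion involved (this is not ranx's actual internals): inserting string ids into a Numba typed Dict converts every key to numba.types.unicode_type, which adds per-string overhead compared with plain Python str keys.

```python
# Hypothetical sketch (not ranx's actual internals): storing string ids in a
# Numba typed Dict converts every key to numba.types.unicode_type, which adds
# per-string overhead compared with plain Python str keys.
from numba import types
from numba.typed import Dict

plain = {f"doc_{i}": float(i) for i in range(100_000)}  # plain Python dict

typed = Dict.empty(key_type=types.unicode_type, value_type=types.float64)
for doc_id, score in plain.items():
    typed[doc_id] = score  # key converted to unicode_type on insertion
```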
I have tested a snippet similar to yours (I do not keep the Python dict in memory) with and without the conversion. Memory usage went down from 2.42 GB to 1.59 GB (including the ranx import). The Python dict alone is around 1.15 GB.
I'll try to remove the forced conversion without breaking the fusion algorithms and get back to you.
Thanks for pointing it out!
Fixed in v0.3.15.
Hi Elias,
Is your feature request related to a problem? Please describe.
I've noticed that Run (and I guess also Qrels) consume a lot of memory (RAM) compared to a standard Python dict, e.g. a few GB instead of a few hundred MB. This gets problematic for somewhat large datasets (e.g. 1M queries).

Describe the solution you'd like
I guess it's related to the Numba representation? I have no clue on how to make it more efficient, sorry :)
Reproduce
Just open your system monitor and see how the memory grows; a sketch of the kind of script I used is below.
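Something along these lines should show it (a minimal sketch; the query/document counts and the psutil-based measurement are just for illustration):

```python
import random

import psutil
from ranx import Run


def rss_gb() -> float:
    """Resident memory of the current process, in GB."""
    return psutil.Process().memory_info().rss / 1024**3


# Build a run with many queries, 10 scored documents each
# (scaled down here; raise n_queries towards 1M to see multi-GB usage).
n_queries = 100_000
run_dict = {
    f"q_{i}": {f"d_{j}": random.random() for j in range(10)}
    for i in range(n_queries)
}
print(f"RSS after building the Python dict: {rss_gb():.2f} GB")

run = Run(run_dict)  # memory grows sharply at this point
print(f"RSS after building the Run:         {rss_gb():.2f} GB")
```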
Best,
Paul