Hi everyone.
I'm working with pywsd, which I find to be a very helpful library.
The issue I'm having is loading time and memory usage.
I'm mostly using adapted_lesk. On my computer it takes around 2.5 seconds to warm up the library, and it then uses more than a gigabyte of RAM.
I ran some experiments and noticed that the slowest step when loading the library is pd.read_pickle(signatures_picklefile).
I've checked that the pickle protocol used for this file is 2. When I tried protocol 5, the loading time of the file dropped from 1.58 seconds to 0.98 seconds, which is already a significant improvement.
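For anyone who wants to try the protocol comparison themselves, here is a minimal sketch. It uses only the standard library and a synthetic dictionary as a stand-in for the signatures data (the real file's structure is not reproduced here, and the names `signatures` and `time_load` are made up for illustration), so the absolute numbers will differ from the ones above.

```python
import pickle
import time

# Synthetic stand-in for the signatures data; NOT the real pywsd file.
signatures = {f"synset_{i}": [f"word_{j}" for j in range(20)] for i in range(20_000)}


def time_load(payload: bytes) -> float:
    """Return the seconds needed to unpickle one payload."""
    start = time.perf_counter()
    pickle.loads(payload)
    return time.perf_counter() - start


# Serialize the same object with the old and the new protocol (5 needs Python 3.8+).
payload_v2 = pickle.dumps(signatures, protocol=2)
payload_v5 = pickle.dumps(signatures, protocol=5)

print(f"protocol 2: {time_load(payload_v2):.3f} s, {len(payload_v2)} bytes")
print(f"protocol 5: {time_load(payload_v5):.3f} s, {len(payload_v5)} bytes")
```

Both payloads decode to the same object, so only load time and file size change.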
But then I tested json, and the improvement is even better: it takes only 0.8 seconds to load on my computer, which is almost a 50% drop.
Memory usage also drops significantly, by around 50% as well.
So I think replacing the pickle/pandas with json/dict could be an improvement.
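A rough way to compare the two formats is sketched below, again with a synthetic dictionary rather than the real signatures file. The helper names `load_seconds` and `load_peak_bytes` are hypothetical. Note that this compares pickle and json on a plain dict, so it does not reproduce the pandas overhead that is part of the numbers above.

```python
import json
import pickle
import time
import tracemalloc

# Synthetic stand-in for the signatures data; NOT the real pywsd file.
signatures = {f"synset_{i}": [f"word_{j}" for j in range(20)] for i in range(20_000)}


def load_seconds(loader, payload) -> float:
    """Time a single call of loader(payload)."""
    start = time.perf_counter()
    loader(payload)
    return time.perf_counter() - start


def load_peak_bytes(loader, payload) -> int:
    """Peak Python-level allocation during loader(payload).

    Measured in a separate run because tracemalloc slows allocation down.
    """
    tracemalloc.start()
    loader(payload)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


pickle_payload = pickle.dumps(signatures, protocol=2)
json_payload = json.dumps(signatures)

for name, loader, payload in [
    ("pickle", pickle.loads, pickle_payload),
    ("json", json.loads, json_payload),
]:
    seconds = load_seconds(loader, payload)
    peak = load_peak_bytes(loader, payload)
    print(f"{name}: {seconds:.3f} s, peak {peak / 1e6:.1f} MB")
```

One caveat with json: object keys must be strings and tuples come back as lists, so the stored structure would need to fit those constraints.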
Another thing I would like to ask: is it necessary to load all the modules in the __init__.py file? If I use only adapted_lesk, do the other modules have to be loaded? Specifically, pywsd.similarity loads some stuff, which also takes time and uses memory.
I wonder what you think of this simple change (i.e. replacing pickle with json)? I'd be happy to work on it.
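On the __init__.py question, one possible approach is a module-level `__getattr__` (PEP 562, Python 3.7+), which imports a submodule only when it is first accessed. The sketch below builds a throwaway package in a temporary directory to show the idea; `lazy_pkg`, `fast_mod` and `slow_mod` are made-up names, and this is not how pywsd is currently organized.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of the hypothetical package's __init__.py.
INIT_SOURCE = textwrap.dedent('''
    import importlib

    _LAZY_SUBMODULES = {"fast_mod", "slow_mod"}

    def __getattr__(name):
        # PEP 562: called only when normal attribute lookup on the package fails.
        if name in _LAZY_SUBMODULES:
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')


def build_demo_package(root: Path) -> None:
    """Write a tiny package with two submodules into root."""
    pkg = root / "lazy_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / "fast_mod.py").write_text("VALUE = 'fast'\n")
    (pkg / "slow_mod.py").write_text("VALUE = 'slow'\n")


tmp_dir = tempfile.mkdtemp()
build_demo_package(Path(tmp_dir))
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import lazy_pkg  # imports only __init__.py, none of the submodules

loaded_before = "lazy_pkg.slow_mod" in sys.modules
value = lazy_pkg.fast_mod.VALUE  # triggers the import of fast_mod only
loaded_after_fast = "lazy_pkg.slow_mod" in sys.modules

print(loaded_before, value, loaded_after_fast)
```

With this layout, a user who only touches one submodule never pays the import cost of the others, while `lazy_pkg.slow_mod` still works on first access.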