eonum / medtextcollector

Scripts for the collection of online medical texts and definitions
MIT License

reduce memory usage #11

Open tschimbr opened 7 years ago

tschimbr commented 7 years ago

The current memory usage tends to get very high after some time. Is there anything that could be garbage collected?

asittampalam commented 7 years ago

I haven't double-checked it, but I suspect the cause is that I keep a small amount of data about each visited page in memory (e.g. the hash value of its content). Accumulated over many pages, this could turn out to be a lot of data. I could solve that by grouping the information into per-domain buckets (e.g. "www.eonum.ch/about" etc. would go into the bucket "www.eonum.ch"), holding only the bucket currently in use in memory and storing all the others on disk. I think this would reduce memory usage at the cost of more disk access (which shouldn't be a huge problem).
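
A minimal sketch of the bucketing idea described above, assuming each page record is just a URL mapped to a content hash. The names `DomainBucketStore` and `domain_of` and the one-JSON-file-per-domain layout are hypothetical and not taken from the medtextcollector code:

```python
import hashlib
import json
import os
from urllib.parse import urlparse


def domain_of(url):
    # urlparse only fills in netloc when the URL has a scheme or a leading "//".
    parsed = urlparse(url if "//" in url else "//" + url)
    return parsed.netloc


class DomainBucketStore:
    """Hypothetical store: only the bucket of the current domain is in memory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self.current_domain = None
        self.current_bucket = {}

    def _path(self, domain):
        # Hash the domain so it is always a safe file name.
        name = hashlib.sha1(domain.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name + ".json")

    def flush(self):
        # Write the in-memory bucket to disk (no-op before the first use).
        if self.current_domain is None:
            return
        with open(self._path(self.current_domain), "w", encoding="utf-8") as f:
            json.dump(self.current_bucket, f)

    def _switch_to(self, domain):
        if domain == self.current_domain:
            return
        self.flush()
        path = self._path(domain)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.current_bucket = json.load(f)
        else:
            self.current_bucket = {}
        self.current_domain = domain

    def put(self, url, content_hash):
        self._switch_to(domain_of(url))
        self.current_bucket[url] = content_hash

    def get(self, url):
        self._switch_to(domain_of(url))
        return self.current_bucket.get(url)
```

Every change of domain costs one write and one read, so this only pays off if the crawler visits pages of the same domain in runs rather than jumping between domains on every request.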

asittampalam commented 7 years ago

I just remembered that I'm caching requests as well. I have to check whether I'm already deleting them after they have been used. If I'm not, that will be the main problem.
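
If the cached requests do turn out to stay around, one way to handle it could look like the sketch below: an entry is removed as soon as it is used, and the cache has an upper bound in case some entries are never used. `BoundedResponseCache` and `max_entries` are made-up names for illustration, and how the crawler actually caches requests is an assumption here:

```python
from collections import OrderedDict


class BoundedResponseCache:
    """Hypothetical cache: entries vanish once taken, and the size is capped."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def store(self, url, response):
        self._entries[url] = response
        self._entries.move_to_end(url)
        # Drop the oldest entries once the cap is exceeded.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def take(self, url):
        # pop() removes the entry, so a used response no longer pins memory.
        return self._entries.pop(url, None)

    def __len__(self):
        return len(self._entries)
```

Usage would be `cache.store(url, response)` right after fetching and `cache.take(url)` at the point where the response is processed. A `None` result means the entry was already used or evicted and has to be fetched again.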