Closed Joseph94m closed 5 years ago
@Joseph94m thanks for your issue, we'll discuss it :) Do you think it's related to #264 , #40 , #233 ?
@faneshion this might also affect our 2.0 generator: https://github.com/faneshion/MatchZoo/blob/03e9bc0ac77edd5f299801511f550e25de965f7a/matchzoo/generators/point_generator.py#L64-L97
Yes, I think the issue is related to the others because they also seem to be about memory leaks.
This is also my case: when I increase the size of the training corpus and of the relation files, the program gets really slow. For example, with a 500 MB corpus and a 1 GB relation file (100k queries), training takes an extremely long time. Maybe I misunderstood the configuration parameters? I tried reducing display_interval (the steps per epoch) from 10 to 1, and query_per_iter from 50 to 10, but training is still very slow.
In addition, before I made the change to make_pair_iter, my program was running out of memory at around the 70th iteration. Since iterations are independent, this led me to assume there was a memory leak somewhere, and that is why I moved the initialization of pair_list to make it local (inside the while loop).
@Joseph94m Yes, this is a critical issue in MatchZoo. I'll discuss with other people about it.
Probably we'll first fix it in branch 2.0, then master.
Any update on this issue?
@daltonj We've implemented PairGenerator, PointGenerator and ListGenerator under branch 2.0. We're still working on the integration test :)
Closed due to inactivity.
Hey,
I think pair_list = [] should be moved under the while True statement in the make_pair_iter function in pair_generator.
Why? Because we want pair_list to be reset each time we yield training pairs to the model. Otherwise, pair_list keeps growing in memory, since make_pair_iter never actually returns (it uses yield). This ultimately leads to the list growing without bound and causing a memory problem.
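To illustrate, here is a minimal sketch of the leak pattern and the proposed fix. This is not MatchZoo's actual implementation; the function bodies, the rel argument, and the pairs_per_iter parameter are simplified stand-ins for how make_pair_iter samples pairs.

```python
def make_pair_iter_leaky(rel, pairs_per_iter=4):
    """Leak pattern: pair_list is initialized once, outside the loop."""
    pair_list = []  # BUG: never reset, because the generator never returns
    while True:
        # Each iteration appends new pairs on top of the old ones,
        # so the list (and memory use) grows without bound.
        pair_list.extend(rel[:pairs_per_iter])
        yield list(pair_list)

def make_pair_iter_fixed(rel, pairs_per_iter=4):
    """Proposed fix: reset pair_list inside the while True loop."""
    while True:
        pair_list = []  # local to each iteration: no unbounded growth
        pair_list.extend(rel[:pairs_per_iter])
        yield pair_list

# Demonstration with dummy (query, doc) relations:
rel = [("q%d" % i, "d%d" % i) for i in range(10)]

leaky = make_pair_iter_leaky(rel)
leaky_sizes = [len(next(leaky)) for _ in range(3)]  # grows: [4, 8, 12]

fixed = make_pair_iter_fixed(rel)
fixed_sizes = [len(next(fixed)) for _ in range(3)]  # stays: [4, 4, 4]
```

Because a generator's local state survives across yields, anything accumulated outside the while True body persists for the lifetime of training, which matches the out-of-memory behaviour around the 70th iteration described above.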