Closed roblanf closed 10 years ago
My guess is that it is the new algorithm loading all schemes. I doubt it is the database (though I suppose it is possible), and if the new algorithm needs all schemes, then we need all subsets too, which means a weakref dictionary won't do us any good.
This issue is now assumed fixed in the following commit (we don't have the original dataset so can't actually reproduce the error to check): https://github.com/brettc/partitionfinder/commit/4d94e4088e0d06ccfe08e5b135fe5ff0ea921a94
In short, I went with solution 2, and analyse a maximum of 10K schemes at once.
If this doesn't fix the issue, then we will have to look into the database more.
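The fix described above (solution 2) amounts to processing schemes in bounded batches rather than holding all of them in memory. A minimal sketch of that idea, with hypothetical names (`chunked`, `analyse_batch`, `MAX_SCHEMES` are illustrations, not PartitionFinder's actual API — see the linked commit for the real change):

```python
from itertools import islice

# Cap on how many schemes are held in memory at once (the commit uses 10K).
MAX_SCHEMES = 10_000

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def analyse_all(schemes, analyse_batch):
    """Analyse schemes batch-by-batch instead of materialising them all."""
    for batch in chunked(schemes, MAX_SCHEMES):
        analyse_batch(batch)
```

Because each batch is discarded before the next is built, peak memory scales with the batch size rather than the total number of schemes.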
A very helpful user of the develop branch said this:
"That exception [to otherwise good performance] is a PF-develop run (search=greedy, MrBayes-specific models) I attempted for comparison with a PF-1.1.1 run of the same parameters. As I have mentioned previously, the PF-1.1.1 search=greedy runs were going very slowly, so I was hoping the PF-develop run would be fast or even finish before the previously-started PF-1.1.1 run.
In the end, the PF-1.1.1 run took >25 days (28/Dec – 25/Jan) to finish. The PF-develop run started very quickly and progressed to ~50% in 4-5 days, after which progress pretty much stopped. I let it run for a few more days and thought it had locked up so restarted. After the restart I let it run for another 5-6 days with little progress, after which I needed the computer for other analyses and killed the job. Interestingly, the computer had written a >40GB swap trying to deal with this analysis."
Need to figure this out and fix it. My suspicion is that the current method of loading up ALL the schemes at once is no good (this is the big change in the greedy algorithm from 1.1.1). But it could also be something to do with the databasing - the greedy algorithm uses a lot of subsets, and I wonder if DB.py is getting overloaded (if so, what to do?). Third option - it's because we abandoned the weakref dictionary, and we're just keeping too many subsets around.
2 things to try: