MesserLab / SLiM

SLiM is a genetically explicit forward simulation software package for population genetics and evolutionary biology. It is highly flexible, with a built-in scripting language, and has a cross-platform graphical modeling environment called SLiMgui.
https://messerlab.org/slim/
GNU General Public License v3.0

Possible miscount in mutation reference tally with multiple subpopulations #403

Closed (nobrien97 closed 11 months ago)

nobrien97 commented 12 months ago

Hi Ben,

I've found a case where Population::_TallyMutationReferences_FAST_FromMutationRunUsage can raise an error in simulations with multiple subpopulations that share mutations. The error I get is: ERROR (Population::_TallyMutationReferences_FAST_FromMutationRunUsage): (internal error) mutation refcount 4 != checkback 5. Note that this error only appears in debug builds of SLiM, where that refcount check is active. This might just be an edge case in your debug check code rather than something wrong with the main tallying method, but I thought I should report it just in case!
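For readers unfamiliar with this kind of debug check, the idea can be sketched roughly as follows. This is a minimal Python illustration, not SLiM's actual C++ code: the function names (`tally_fast`, `tally_checkback`, `check_tallies`) and the data layout (genomes as lists of shared tuples standing in for mutation runs) are assumptions made for the example. The fast path counts how many genomes use each shared run and weights that run's mutations accordingly; the checkback walks every genome directly, and the two results are compared.

```python
from collections import Counter


def tally_fast(subpops):
    """Fast tally: count uses of each shared run, then weight its mutations."""
    run_usage = Counter()   # id(run) -> number of genomes referencing that run
    runs_by_id = {}         # id(run) -> the run object itself
    for genomes in subpops.values():
        for genome in genomes:
            for run in genome:
                run_usage[id(run)] += 1
                runs_by_id[id(run)] = run
    refcounts = Counter()
    for run_id, uses in run_usage.items():
        for mutation in runs_by_id[run_id]:
            refcounts[mutation] += uses
    return refcounts


def tally_checkback(subpops):
    """Slow cross-check: walk every genome in every subpopulation directly."""
    refcounts = Counter()
    for genomes in subpops.values():
        for genome in genomes:
            for run in genome:
                for mutation in run:
                    refcounts[mutation] += 1
    return refcounts


def check_tallies(subpops):
    """Debug-style check: raise if the fast tally and the checkback disagree."""
    fast = tally_fast(subpops)
    slow = tally_checkback(subpops)
    for mutation in set(fast) | set(slow):
        if fast[mutation] != slow[mutation]:
            raise RuntimeError(
                f"mutation refcount {fast[mutation]} != checkback {slow[mutation]}"
            )
    return fast


# Hypothetical data: three runs, with p2 holding clones of p1 that share the runs.
run_a = ("m1", "m2")
run_b = ("m3",)
run_c = ("m1",)
p1 = [[run_a, run_b], [run_c, run_b]]
p2 = [list(genome) for genome in p1]  # new genome lists, same run objects
subpops = {"p1": p1, "p2": p2}

refcounts = check_tallies(subpops)
print(dict(refcounts))  # m1: 4, m2: 2, m3: 4
```

A mismatch between the two tallies does not by itself say which side is wrong, so a bug in either the fast path or the cross-check would produce the same kind of error message.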

The error appears in this example run on the latest master HEAD (1d11792). It happens whether or not parallelization is enabled, and in both the command-line build and SLiMgui. The simulation runs fine until cycle 1001, where p2 is populated with clones from p1. Calling mutationFrequenciesInGenomes() at that point triggers the error. This doesn't happen when p2 is empty. In case it helps, I've found that the checkback count is always greater than the refcount. Let me know if you need more info!

Cheers, Nick

bhaller commented 12 months ago

Thanks for the report; I'll look into it shortly!

bhaller commented 11 months ago

Hi @nobrien97. Upon investigation, I'm fairly sure that the bug here is actually in the checkback code, not the tallying code itself. I'm fixing the checkback code now, and we'll see if that fixes the problem. In the meantime, let me ask: did you notice a problem with your model's behavior, and decide to run it under DEBUG to see if that unearthed an issue, or did you run under DEBUG for other reasons and have this just pop out? In other words, besides this checkback error, do you have any reason to think that SLiM is doing anything wrong? Anyhow, we'll see what happens after I fix the checkback code. :->

bhaller commented 11 months ago

OK, I believe I have confirmed that the bug was in the checkback code, and is well understood. It has been fixed, and the test model now runs without errors. Note that this bug was never included in a release; it was introduced with changes for parallelization. @nobrien97, please do let me know if you see any further problems. I'm going to go run my big test suite now in DEBUG mode, which I do from time to time, but haven't done since 4.0.1 was released. :-> Thanks for the very useful report.

nobrien97 commented 11 months ago

Thanks Ben! I was running under debug to test a function I had written and the bug just happened to appear due to my testing model's crossing design. I'll let you know if I find anything else!