alanrogers / legofit

Estimate population history parameters from site pattern frequencies.

resid assertion failed #17

Closed janxkoci closed 6 months ago

janxkoci commented 6 months ago

Hi Alan,

I'm running resid on simulated data (from msprime), to get relative site pattern frequencies of simulations under our models for comparison with observed data, but I'm getting the following error for two of the opf files:

$ resid Cs_data_opf.txt Cs_4_data_opf.txt > /dev/null # get only stderr
resid: resid.c:654: main: Assertion `Dbl_near(mat[i * nDataFiles + j], 0.0)' failed.
Aborted (core dumped)

Removing either file lets resid finish fine, as does running it on each file individually. resid crashes only when the two are used together, whether as just that pair or within a larger batch of all simulated bootstraps for a given model.

I cannot spot anything obviously wrong with the files.

One of the files was simulated using the point estimates from legofit, while the other is the 4th simulation replicate, for which I sampled values from the confidence intervals returned by legofit (to emulate bootstrap replicates). All other such replicates are fine, as are other models and previous runs of the same pipeline.

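Cs_data_opf.txt https://github.com/alanrogers/legofit/files/15387340/Cs_data_opf.txt (simulated using values of point estimates)
Cs_4_data_opf.txt https://github.com/alanrogers/legofit/files/15387338/Cs_4_data_opf.txt (simulated using values sampled from confidence intervals, 4th replicate)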

My resid is version 2.3.21-16-gc04221f6 - it may not be the latest, but I use this version for the entire project, and it was without issues until now. Also, the OS on the particular cluster had been recently updated to Debian 12 - I don't know if this is relevant, especially if my other runs of resid are fine.

Can you please help me figure out where the problem is?

In the meantime, is it okay to produce the relative frequencies separately for these files and then combine them into one file for downstream analysis, or do the frequencies change based on which files are included? (From a brief eyeballing of the outputs, this does not seem to be the case, so my hunch is that I can combine them.)

alanrogers commented 6 months ago

I found the problem and fixed it. I also merged the devlp branch into the master branch, so it's fixed in both places.

The old code was reading outside the bounds of an array, and the resulting garbage triggered the assertion failure. Thanks for bringing this to my attention.
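For readers unfamiliar with this failure mode, here is a minimal sketch of how a near-zero check over a flattened matrix can be guarded against out-of-range reads. It is not the actual resid.c code: `dbl_near`, `mat_index`, and `all_near_zero` are hypothetical names, and the absolute tolerance of 1e-8 is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Dbl_near-style helper: true when x and y
   differ by no more than a fixed absolute tolerance (assumed 1e-8). */
static int dbl_near(double x, double y) {
    double d = x - y;
    if (d < 0.0)
        d = -d;
    return d <= 1e-8;
}

/* Map (row, col) to an offset in a row-major nrows x ncols matrix.
   The assertions abort on an out-of-range index instead of letting
   the caller read past the end of the array. */
static size_t mat_index(size_t row, size_t col, size_t nrows, size_t ncols) {
    assert(row < nrows);
    assert(col < ncols);
    return row * ncols + col;
}

/* Return 1 if every entry of the matrix is near zero, else 0. */
static int all_near_zero(const double *mat, size_t nrows, size_t ncols) {
    for (size_t i = 0; i < nrows; ++i)
        for (size_t j = 0; j < ncols; ++j)
            if (!dbl_near(mat[mat_index(i, j, nrows, ncols)], 0.0))
                return 0;
    return 1;
}
```

With an unchecked index, a read past the end of `mat` returns whatever happens to sit in adjacent memory, so the check can pass for most inputs and fail only for particular combinations of files, which matches the symptom reported above.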

janxkoci commented 6 months ago

Thanks Alan, I'll test it later today and will let you know how it works.

janxkoci commented 6 months ago

OK, it works well now, thanks again! ☺️

alanrogers commented 6 months ago

I just made a few tweaks to the devlp branch, merged that into master, and bumped the version number to 2.3.22. I forgot to mention yesterday that the Makefile now specifies the clang compiler rather than gcc.

janxkoci commented 6 months ago

Oh yeah, I noticed I had to module add clang. I thought it was a Debian 12 thing...