ANGSD / angsd

Program for analysing NGS data.

4D SFS segmentation fault #561

Open carolindahms opened 1 year ago

carolindahms commented 1 year ago

Hi, I am trying to generate a folded SFS with realSFS for 4 populations from recently generated saf files in ANGSD:

realSFS chagos_GL1.saf.idx ningaloo_GL1.saf.idx indo_GL1.saf.idx south_gbr_GL1.saf.idx -fold 1 -P 4 > 4D.sfs

However, as reported in issue #255, I'm getting segmentation fault errors (please see the attached file). I did manage to obtain the 2- and 3-dimensional SFS with the same saf files, so I am wondering what the issue could be here. I am using the latest ANGSD version and memory should be sufficient.

Thank you for any help!
Carolin

Attachment: slurm-1052280.txt

nspope commented 1 year ago

Hi Carolin, can you try it without multithreading (remove the -P argument)?

Also I think this log is for the unfolded spectrum (because of fold: 0 in this line):

-> args: tole:0.000000 nthreads:4 maxiter:100 nsites(block):0 start:(null) chr:(null) start:-1 stop:-1 fstout:(null) oldout:0 seed:-1 bootstrap:0 resample_chr:0 whichFst:0 fold:0 ref:(null) anc:(null)

(or else something strange is going on)

Assuming that's the case, is it crapping out at the same place with -fold 1?

carolindahms commented 1 year ago

Cheers for getting back so quick!

Yes, you're right, that was the log for the unfolded SFS. I tried both, with the same error each time. Removing -P doesn't fix it, and running it with different populations to check for errors within the saf files doesn't help either. Could you think of anything else worth trying?

Thanks!!

nspope commented 1 year ago

So to summarise it fails for 4 populations regardless of options and inputs? What's the version of angsd you're using?

carolindahms commented 1 year ago

Yes, correct. I am using version 0.940


carolindahms commented 1 year ago

Hi, so I noticed that realSFS does produce the 4D SFS when I use only one chromosome or a subset of sites. It also seems to work for some population combinations (but not for the populations I'm interested in), which I found by trying pops with less data. I'm still in the dark about why this is happening, though, as the working pop combinations seemed quite random. I could possibly try summing the SFS of each chromosome, but I fear that would produce less reliable results with my sparse data.
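For illustration only, here is a minimal Python sketch of the "summing the SFS of each chromosome" idea. It assumes each per-chromosome SFS is one line of whitespace-separated expected counts and that all spectra have the same length. The helper names (`parse_sfs`, `sum_sfs`) and the toy numbers are hypothetical and not part of ANGSD. The sketch only does the arithmetic and says nothing about whether the summed spectrum is reliable for sparse data.

```python
def parse_sfs(text):
    """Parse one SFS given as whitespace-separated numbers (assumed layout)."""
    return [float(value) for value in text.split()]


def sum_sfs(spectra):
    """Add several spectra entry by entry; all must have the same length."""
    spectra = list(spectra)
    if not spectra:
        raise ValueError("no spectra given")
    length = len(spectra[0])
    if any(len(s) != length for s in spectra):
        raise ValueError("spectra have different dimensions")
    return [sum(values) for values in zip(*spectra)]


# Toy values, invented for the example (not real realSFS output).
chr1 = parse_sfs("1 2 3")
chr2 = parse_sfs("0.5 0.5 1")
combined = sum_sfs([chr1, chr2])
print(" ".join(str(x) for x in combined))
```

In practice each string would be read from a per-chromosome output file. The length check is there so that spectra produced with different sample sizes or folding settings are not silently mixed.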

nspope commented 1 year ago

Hm, strange. Are you able to realSFS print file.saf.idx for each of the populations, without getting a segfault? I'm travelling right now but will try to look at this in a week or thereabouts.

carolindahms commented 1 year ago

Yes, no issues with printing the files. Cheers for the help!


carolindahms commented 2 months ago

Hi! I still haven't figured out the cause of this segmentation fault. I can print the files and generate pairwise 2D SFS, but not 4- or 5-dimensional spectra. The error:

~/Softwares/angsd/misc/realSFS chagos_GL1.saf.idx ningaloo_GL1.saf.idx indo_GL1.saf.idx chesterfield_GL1.saf.idx north_gbr_GL1.saf.idx -fold 1
[persaf::persaf_init] Version of chagos_GL1.saf.idx is 3
[persaf::persaf_init] Assuming .saf.gz file is chagos_GL1.saf.gz
[persaf::persaf_init] Assuming .saf.pos.gz file is chagos_GL1.saf.pos.gz
[persaf::persaf_init] Version of ningaloo_GL1.saf.idx is 3
[persaf::persaf_init] Assuming .saf.gz file is ningaloo_GL1.saf.gz
[persaf::persaf_init] Assuming .saf.pos.gz file is ningaloo_GL1.saf.pos.gz
[persaf::persaf_init] Version of indo_GL1.saf.idx is 3
[persaf::persaf_init] Assuming .saf.gz file is indo_GL1.saf.gz
[persaf::persaf_init] Assuming .saf.pos.gz file is indo_GL1.saf.pos.gz
[persaf::persaf_init] Version of chesterfield_GL1.saf.idx is 3
[persaf::persaf_init] Assuming .saf.gz file is chesterfield_GL1.saf.gz
[persaf::persaf_init] Assuming .saf.pos.gz file is chesterfield_GL1.saf.pos.gz
[persaf::persaf_init] Version of north_gbr_GL1.saf.idx is 3
[persaf::persaf_init] Assuming .saf.gz file is north_gbr_GL1.saf.gz
[persaf::persaf_init] Assuming .saf.pos.gz file is north_gbr_GL1.saf.pos.gz
-> args: tole:0.000000 nthreads:4 maxiter:100 nsites(block):0 start:(null) chr:(null) start:-1 stop:-1 fstout:(null) oldout:0 seed:-1 bootstrap:0 resample_chr:0 whichFst:0 fold:1 ref:(null) anc:(null)
[main] Multi SFS is 'still' under development. Please report strange behaviour.
[main] You are printing the optimized SFS to the terminal-- consider dumping it into a file, e.g.: './realSFS chagos_GL1.saf.idx ningaloo_GL1.saf.idx indo_GL1.saf.idx chesterfield_GL1.saf.idx north_gbr_GL1.saf.idx -fold 1 >sfs.mle.txt'
-> The choice of -nSites will require atleast: 702.532349 megabyte memory, that is at least: 2.21% of total memory
-> dim(chagos_GL1.saf.idx):45
-> dim(ningaloo_GL1.saf.idx):47
-> dim(indo_GL1.saf.idx):47
-> dim(chesterfield_GL1.saf.idx):81
-> dim(north_gbr_GL1.saf.idx):39
-> Dimension of parameter space: 314020395
-> Done reading data from chromosome will prepare next chromosome
-> Is in multi sfs, will now read data from chr:Stacks_loci
-> hello Im the master merge part of realSFS. and I'll now do a tripple bypass to find intersect
-> 1) Will set iter according to chooseChr and start and stop, and possibly using -sites
-> Sites to keep[Stacks_loci] from pop0: 3102205
-> Sites to keep[Stacks_loci] from pop1: 3102205
-> Sites to keep[Stacks_loci] from pop2: 3102205
-> Sites to keep[Stacks_loci] from pop3: 3102205
-> Sites to keep[Stacks_loci] from pop4: 3102205
-> [readdata] lastread:3102205 posi:0
-> Comparing positions: 1 with 0 has:3102205
-> Comparing positions: 2 with 0 has:3102205
-> Comparing positions: 3 with 0 has:3102205
-> Comparing positions: 4 with 0 has:3102205
-> Only read nSites: 3102205 will therefore prepare next chromosome (or exit)
-> Done reading data from chromosome will prepare next chromosome
-> Will run optimization on nSites: 3102205
Segmentation fault (core dumped)

I have attached the data if anyone would be able to check what is going on. I would be very thankful!
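As a side note for anyone reading the log: the "Dimension of parameter space" figure is the product of the per-population dim values that realSFS reports. The Python sketch below reproduces that number from the five dims in the log. The memory figure it prints assumes one 8-byte double per SFS entry, which is an assumption for illustration and has not been checked against the realSFS source. It does not by itself establish the cause of the segfault.

```python
from math import prod

# dim values copied from the realSFS log in this thread.
dims = {
    "chagos_GL1.saf.idx": 45,
    "ningaloo_GL1.saf.idx": 47,
    "indo_GL1.saf.idx": 47,
    "chesterfield_GL1.saf.idx": 81,
    "north_gbr_GL1.saf.idx": 39,
}

# The joint SFS has one entry per combination of per-population categories.
total_entries = prod(dims.values())
print(total_entries)  # 314020395, the value reported in the log

# ASSUMPTION: 8 bytes (one double) per entry; not confirmed for realSFS.
BYTES_PER_ENTRY = 8
bytes_needed = total_entries * BYTES_PER_ENTRY
print(f"{bytes_needed / 1e9:.2f} GB for a single copy of the spectrum")
```

The point of the sketch is only that the joint spectrum grows multiplicatively with each added population, so a 5D run here has roughly 314 million parameters even though every pairwise 2D spectrum is small.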