ANGSD / angsd

Program for analysing NGS data.

Segmentation fault (core dumped) #255

Open rillaxy opened 5 years ago

rillaxy commented 5 years ago

Hi. I am trying to run ANGSD on 2 BAMs of genomic data at 15x coverage per individual to calculate theta. Unfortunately, when I run realSFS I get a segmentation fault every time. Below are my commands and the error output.

angsd -bam bam.filelist -out outFold -fold 1 -doSaf 1 -anc IRGSP-1.0_genome.fasta -minMapQ 30 -minQ 30 -P 24 -GL 1
realSFS -P 24 outFold.saf.idx >outFold.sfs

-> Version of fname:outFold.saf.idx is:2
Segmentation fault (core dumped)

Why does this happen? I also tested with -P 5 and -P 4. It would be fantastic if you could help me with this. Thank you very much; I look forward to hearing from you. Best, Rilla.

stella-huynh commented 4 years ago

Hi, I am running into the same trouble. I produced unfolded saf files for two different species (BFAL and LAAL), using a third species as the ancestral genome (all genomes are aligned and indexed onto BFAL as reference). I produced SAF for either the whole genome or specific regions (autosomal vs. sex-linked scaffolds). Here is an example command line:

angsd   -ref BFAL_genome.fasta \
              -anc STAL_ANGSDgenome.fasta \
              -bam BFAL_15.bamlist \
              -rf BFAL_autosome_scafs_1e4_forANGSD.txt \
              -out BFAL_autosome_scafs_1e4 \
              -nThreads 10 \
              -remove_bads 1 \
              -uniqueOnly 1 \
              -only_proper_pairs 0 \
              -minMapQ 20 \
              -minQ 20 \
              -GL 1 \
              -doSaf 1 \
              -trim 0

But when running realSFS on the saf.idx files, I keep having this:

realSFS BFAL_autosome_scafs_1e4.saf.idx LAAL_autosome_scafs_1e4.saf.idx > autosomes_BFAL_LAAL_SFS.ml

-> Version of fname:BFAL_autosome_scafs_1e4.saf.idx is:2
Segmentation fault (core dumped)

I get the exact same result whether I run realSFS print, realSFS -P xx or realSFS -fold 1, with one or two saf.idx files. I also tried with the Z-linked files (which are much smaller than the autosomal files), but the result is the same. My saf files seem readable, though (i.e. no error message when using xxd -b).

I looked at some related issues but did not find anything that helps me understand the problem here, as I did not use -doThetas. Is it related to the command I used to produce the saf files, or to the realSFS/ANGSD tools? ANGSD was installed using conda; its version is 0.931 (htslib: 1.9).

Any advice is welcome, I really need help. Thanks!

Stella

rillaxy commented 4 years ago

Hi Stella. I solved my problem by using ANGSD version 0.921. Maybe you should give it a try. Best, Rilla

stella-huynh commented 4 years ago

Hi Rilla,

I ran my whole pipeline and changing ANGSD version indeed solved my issues ! (although now I'm struggling with some RAM issues, but that should be solvable)

Thank you very much for your fast answer !!

Best, Stella

stella-huynh commented 4 years ago

Hi again,

I realized far too late that the SFS produced by realSFS (from ANGSD version 0.921) was actually wrong. The SFS file contains 2n+1 values instead of 2n-1! I tried the example files from the angsd website, following all the instructions and then running the commands to get the 1D SFS. I still can't run realSFS as provided in ANGSD v.0.931: it throws a segmentation fault without any other clue about what the issue is. I can run realSFS from the previous ANGSD version (v.0.921, installed via conda), but then I get an SFS with 2n+1 values and a very weird profile, completely different from the one shown on the website (which has 2n-1 values).

I think I should definitely use realSFS from ANGSD v.0.931, but I need some support to find what causes the segmentation fault, even on the example files...

I hope someone can help me with this. I have been struggling for months with other analyses that depend on the SFS (which I think is due to my weird SFS file...)

Thanks ! Best, Stella

ANGSD commented 4 years ago

Dear Stella,

If you have n individuals, then your SFS will have 2n+1 entries. However, two of these are the invariable categories, so they are sometimes not shown or used: the first value (the number of sites where all individuals are identical to the ancestral) and the last value (where all individuals are different from the ancestral). Hopefully this resolves the 2n+1 vs. 2n-1 issue.
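The relationship between the two counts can be checked on any SFS line with a couple of small awk helpers. This is only a sketch: sfs_count and sfs_variable are hypothetical shell functions, not ANGSD commands, and the toy numbers are made up for n = 2 individuals.

```shell
# Hypothetical helpers (not part of ANGSD): both read one SFS line on stdin.

# Number of entries in the SFS; for n diploid individuals this should be 2n+1.
sfs_count() {
    awk '{ print NF; exit }'
}

# Drop the first and last entries (the two invariable categories),
# leaving the 2n-1 variable-site categories.
sfs_variable() {
    awk '{
        out = ""
        for (i = 2; i < NF; i++) out = out (i > 2 ? " " : "") $i
        print out
        exit
    }'
}

# Toy input for n = 2 individuals (made-up numbers, 2n+1 = 5 entries).
toy="1000.5 12.0 7.5 3.0 20.25"
echo "$toy" | sfs_count      # number of entries in the full SFS
echo "$toy" | sfs_variable   # entries with the invariable categories removed
```

On a real file the same helpers can be fed with input redirection, for example sfs_variable < small.sfs, assuming the file holds a single SFS line.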

With regard to the segfault: couldn't this be a problem of too little memory?

Best

ANGSD commented 4 years ago

Hi, sorry for the late reply; I have not had much time to follow these boards. Do you have a dataset that does not work with the most recent realSFS but works with the older realSFS? If possible, I would like to obtain a copy of this dataset so I can fix the bug.

Thanks for using angsd and thanks for taking the time to report this possible bug.

Best

stella-huynh commented 4 years ago

Hi,

Thank you very much for your reply !

I see. I had not realized that realSFS also includes those categories in the SFS file. Thanks for the explanation!

About the segfault, I don't think it is related to memory. FYI, I am running the analyses on a cluster (PBS/Torque system). As I don't have root permission, I used conda to install the latest version of ANGSD (v.0.931, build ha52163a_1), which was installed along with htslib 1.9 (the latest version, I think). Given the issue with realSFS, I also installed an earlier version of ANGSD (v.0.921) with conda, in a separate conda environment to avoid conflicts between package versions.

Usually, when my analysis requires more than ~110-120% of memory usage (stated at the beginning of the analysis), realSFS v.0.921 keeps running for a bit before throwing the segfault error message and stopping. When I run realSFS from ANGSD v.0.931, I get the error message immediately (as shown below), even when I run it on the sex chromosomes only (using -rf), which uses only ~10% of memory.

I tried the example files provided on the "Quick Start" page and got the same issue. First I downloaded the files as described on that page:

wget http://popgen.dk/software/download/angsd/bams.tar.gz
tar xf bams.tar.gz
for i in bams/*.bam; do samtools index "$i"; done
ls bams/*.bam > bam.filelist

wget http://popgen.dk/software/download/angsd/hg19ancNoChr.fa.gz
mv hg19ancNoChr.fa.gz chimpHg19.fa.gz
zcat chimpHg19.fa.gz > chimpHg19.fa
samtools faidx chimpHg19.fa

Following the "SFS Estimation" page, I could generate the saf file:

angsd -bam bam.filelist -doSaf 1 -out small -anc chimpHg19.fa -GL 2 -P 4 -minMapQ 1 -minQ 20

But when using realSFS (v.0.931) to estimate the SFS, I get the following error:

realSFS small.saf.idx -maxIter 100 -P 4 >small.sfs
        -> Version of fname:small.saf.idx is:2
Segmentation fault (core dumped)

If I use realSFS from v.0.921, it works and I get these SFS values:

envs/angsd_0.921/bin/realSFS small.saf.idx -maxIter 100 -P 4 > small.sfs
cat small.sfs
1640236.291830 1202.814443 477.210914 509.836543 210.541355 282.550734 116.124024 542.369843 145.487639 255.228979 100.115206 236.408566 104.750803 208.727824 161.616455 123.483607 94.114589 68.274684 140.397595 119.197015 20935.457352

I plotted these values using the R commands (scan + barplot), but the SFS profile looks different from the one shown on the webpage. Could that have something to do with the realSFS version?
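For comparison with the 2n-1 profile on the website, the 21 values above can be reduced to the 19 variable-site categories and rescaled to proportions before plotting. The sketch below assumes the two end values are the invariable categories described earlier in the thread; sfs_proportions is a hypothetical helper, not an ANGSD command, and this addresses only the shape of the plot, not the segfault.

```shell
# SFS line copied from the realSFS v.0.921 output above (21 entries).
sfs="1640236.291830 1202.814443 477.210914 509.836543 210.541355 282.550734 116.124024 542.369843 145.487639 255.228979 100.115206 236.408566 104.750803 208.727824 161.616455 123.483607 94.114589 68.274684 140.397595 119.197015 20935.457352"

# Hypothetical helper: skip the first and last entries, then print each
# remaining category index with its share of all variable sites.
sfs_proportions() {
    awk '{
        total = 0
        for (i = 2; i < NF; i++) total += $i
        for (i = 2; i < NF; i++) printf "%d %.4f\n", i - 1, $i / total
        exit
    }'
}

# Prints one "category proportion" pair per line, 19 lines for this input.
echo "$sfs" | sfs_proportions
```

Plotting all 21 values instead would let the first entry, which is several orders of magnitude larger than the rest, dominate the bar chart, which may account for part of the visual difference.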

I am attaching the log files from angsd -doSaf and realSFS: angsd_dosaf.log realSFS.log

Thanks again for your help!

Best, Stella

NicMAlexandre commented 4 years ago

Hello,

I am also running into this error using the most up-to-date version of angsd. For example, when I run the following:

realSFS ipo1.saf.idx ipo0.saf.idx -P 20 > ipo.ml

I get a segmentation fault error as follows:

-> Version of fname:ipo1.saf.idx is:2

/var/spool/slurmd/job5577940/slurm_script: line 14: 34660 Segmentation fault realSFS ipo1.saf.idx ipo0.saf.idx -P 20 > ipo1.ipo0.ml

My pos.gz file from the previous step is large and I have a nonzero idx file, so I'm not sure why this command wouldn't work. Any help would be appreciated.
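A quick way to rule out missing or empty inputs before calling realSFS is to test all three files that -doSaf writes for a given -out prefix (prefix.saf.idx, prefix.saf.gz and prefix.saf.pos.gz). The function below is a minimal sketch; check_saf is a hypothetical helper, not an ANGSD command, and it only checks that the files exist and are non-empty, not that their contents are valid.

```shell
# Hypothetical helper (not an ANGSD command): report whether the three
# -doSaf output files for a given prefix exist and are non-empty.
# Returns 0 if all three are present, 1 otherwise.
check_saf() {
    prefix=$1
    rc=0
    for ext in saf.idx saf.gz saf.pos.gz; do
        f="$prefix.$ext"
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f (absent or empty)"
            rc=1
        fi
    done
    return "$rc"
}
```

For the command above it could be used as check_saf ipo1 && check_saf ipo0 before running realSFS. A passing check does not exclude a truncated or otherwise corrupted file.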

ANGSD commented 2 years ago

Hi, saf and realSFS have been updated in more recent versions, so I assume this has been resolved. Also, based on Stella's log files, it looks like the program is not crashing.

Feel free to reopen if still relevant.

Morgan-McCarthy commented 1 year ago

Hey! Just to follow up, this issue still exists for version 0.940. I also tried with version 0.925 and the same segmentation fault error came up. Using version 0.921 still works.

isinaltinkaya commented 1 year ago

Hi @Morgan-McCarthy ,

Thanks for the update! I am reopening the issue, as the problem still persists in 0.940. We'll look into it and keep you posted.

Thanks for your patience and contribution!

Best, Isin

ThibauldMichel commented 1 year ago

Hello, unfortunately I have encountered the same issue with angsd version 0.933.

angsd -GL -vcf-pl "$path"/data/GVCF_SNPs_subset.vcf.gz -out "$path"/data/png_genotype_likelihood_subset

    -> angsd version: 0.933 (htslib: 1.9) build(May  6 2020 21:25:11)
    -> VCF still beta. Remember that
       1. indels are are discarded
       2. will use chrom, pos PL columns
       3. GL tags are interpreted as log10 and are scaled to ln (NOT USED)
       4. GP tags are interpreted directly as unscaled post probs (spec says phredscaled...) (NOT USED)
       5. FILTER column is currently NOT used (not sure what concensus is)
       6. -sites does NOT work with vcf input but -r does
       7. vcffilereading is still BETA, please report strange behaviour

    -> No indel tag in vcf/bcf file, will therefore not be able to filter out indels

/var/spool/slurm/job6226462/slurm_script: line 6: 2296313 Segmentation fault (core dumped) angsd -GL -vcf-pl "$path"/data/GVCF_SNPs_subset.vcf.gz -out "$path"/data/png_genotype_likelihood_subset

GVCF_SNPs_subset.vcf.gz
