arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License

Segmentation fault #181

Open Sebastian-D opened 7 years ago

Sebastian-D commented 7 years ago

Hi,

I am using lumpy 0.2.12 on 17 samples and getting a segmentation fault, with core dumps, on all of them. The output looks reasonable for chromosomes 1-22, X, and Y, so it is not a huge problem, but I thought perhaps you would know what is wrong. They all fail at the same stage.

My output: issues.lumpy.txt https://github.com/arq5x/lumpy-sv/files/867569/issues.lumpy.txt

Do you know what is causing this? Anything I can try? Can I trust the output for chromosomes 1-22, X, and Y?

/Sebastian

ryanlayer commented 7 years ago

Could you be running out of disk space? Lumpy puts all inter-chrom events into a temp bam file that is then processed after all of the intra-chrom events. The first step in the process is to sort that bam, which can take a lot of disk space. From the output it appears that things are getting through the intra-chrom stuff just fine, then tripping up in the inter-chrom stage. Does your VCF have any inter-chrom events?
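A minimal sketch of ruling out disk space before a run, assuming Python is available on the machine. The function name `has_free_space`, the example path, and the threshold are illustrative and not part of lumpy; point it at whichever directory lumpy uses for its temporary files.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_bytes` free. `path` must exist."""
    return shutil.disk_usage(path).free >= required_bytes


# Example (hypothetical path and threshold):
#   has_free_space("/scratch", 500 * 1024**3)   # at least 500 GiB free?
```

This only reports the free space at the moment of the call, so it cannot catch a directory that fills up midway through the sort.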


Sebastian-D commented 7 years ago

Thanks for your swift reply Ryan.

Are those temp files written to the temporary lumpy directory where it is executed by default? In that case it cannot be disk space as I have 2 TB to spare.

I checked one of my output VCFs: after all the intra-chromosomal events, it contained inter-chromosomal events all the way into the GL### named contigs. Is this how it should be? Is the VCF file sorted by the ID column?

Sebastian-D commented 7 years ago

Some additional thoughts:

After googling around a bit I found that you recommend running BWA MEM without the -M option. Is this a dealbreaker for running lumpy on the bam file? How much does this typically influence the output? Could this be a part of the error message I am getting?

Secondly, do you recommend running GATK pre-processing, such as RealignerTargetCreator and IndelRealigner, before using lumpy? Does this influence results? If it does, have you checked how much, and do you recommend for or against it?

ryanlayer commented 7 years ago

My next guess is that you are running out of RAM. How many samples are you considering at once?

On Tue, Mar 28, 2017 at 3:55 AM, Sebastian-D notifications@github.com wrote:

Some additional thoughts:

After googling around a bit I found that you recommend running BWA MEM without the -M option. Is this a dealbreaker for running lumpy on the bam file? How much does this typically influence the output? Could this be a part of the error message I am getting?

-M changes the way split reads are marked. It is not a deal breaker, but it can affect the results if you are using flags to identify split reads. Either way, that will not lead to a fault.
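To make the flag difference concrete: with -M, BWA-MEM marks shorter split hits as secondary (SAM flag bit 0x100) instead of supplementary (bit 0x800). The sketch below classifies a SAM flag accordingly; the function name `split_hit_kind` is illustrative, and how any particular splitter-extraction step uses these bits is not shown here.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def split_hit_kind(flag: int) -> str:
    """Classify an alignment record by its SAM flag bits."""
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"
```

A filter that selects split reads by testing only the supplementary bit would therefore skip the shorter hits in a BAM aligned with -M.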

Secondly, do you recommend running GATK pre-processing, such as RealignerTargetCreator and IndelRealigner, before using lumpy? Does this influence results? If it does, have you checked how much, and do you recommend for or against it?

I don't think running those steps will change the results, but I have not tested it to be sure.


Sebastian-D commented 7 years ago

I am only running one sample at a time, for 17 samples. None of them went above 10 GB of RAM (each had a 16 GB allocation), and the RAM usage statistics show usage dipping about 10-20 minutes before the jobs failed.

Here is a plot from one of the jobs as an example; they all look much the same: [image: image] https://cloud.githubusercontent.com/assets/1298588/24861656/d527c0ce-1df9-11e7-8c79-2d17448860c1.png

ryanlayer commented 7 years ago

The inter-chrom file is a .bam, and sorting it creates possibly hundreds of temp files. Do you see any of these?


Sebastian-D commented 7 years ago

I don't see those files. Where are they written by default?

If it is in the -T DIR temp directory [./output_prefix.XXXXXXXXXXXX], I don't see that folder.

If it is in /scratch it is long gone by now.

My command is generally like this:

lumpyexpress -B /samples/$1/BAM/$1.bam -v -o /output/lumpy.$1.vcf

ryanlayer commented 7 years ago

lumpyexpress will tell you the lumpy command it is running. What does that look like?


Sebastian-D commented 7 years ago

From issues.lumpy.txt above:

/sw/apps/bioinfo/LUMPY/0.2.12/milou//bin/lumpy  \
    -t lumpy.sample10.vcf.txqoX92PMMkk/lumpy.sample10.vcf \
    -msw 4 \
    -tt 0 \
     -pe bam_file:lumpy.sample10.vcf.txqoX92PMMkk/lumpy.sample10.vcf.sample1.discordants.bam,histo_file:lumpy.sample10.vcf.txqoX92PMMkk/lumpy.sample10.vcf.sample1.lib1.x4.histo,mean:368.884114533,stdev:98.3032655246,read_length:151,min_non_overlap:151,discordant_z:5,back_distance:10,weight:1,id:sample10,min_mapping_threshold:20,read_group:AH3FW7ALXX.sample10.1,read_group:AH3FW7ALXX.sample10.2,read_group:AH3FW7ALXX.sample10.3,read_group:AH3FW7ALXX.sample10.4,read_group:AH3FW7ALXX.sample10.5,read_group:AH3FW7ALXX.sample10.6,read_group:AH3FW7ALXX.sample10.7,read_group:AH3FW7ALXX.sample10.8,read_group:AH5L2YALXX.sample10.8 \
     -sr bam_file:lumpy.sample10.vcf.txqoX92PMMkk/lumpy.sample10.vcf.sample1.splitters.bam,back_distance:10,min_mapping_threshold:20,weight:1,id:sample10,min_clip:20,read_group:AH3FW7ALXX.sample10.1,read_group:AH3FW7ALXX.sample10.2,read_group:AH3FW7ALXX.sample10.3,read_group:AH3FW7ALXX.sample10.4,read_group:AH3FW7ALXX.sample10.5,read_group:AH3FW7ALXX.sample10.6,read_group:AH3FW7ALXX.sample10.7,read_group:AH3FW7ALXX.sample10.8,read_group:AH5L2YALXX.sample10.8 \
    > /output/lumpy.sample10.vcf

mej54 commented 6 years ago

Hi,

I'm wondering if this segmentation fault issue was ever resolved? I've run into the same error and have noticed a few issues posted with the same problem. I'm running Lumpy on multiple tumor-normal pairs that were BWA mem aligned to GRCh38 with decoy. It seems to have run through to completion on some of the samples, but the rest are running into the segmentation fault error.

I have tried increasing the memory (up to 200G) and using an exclusion regions bed file. Neither option has resolved the issue.

Thanks, Molly

ryanlayer commented 6 years ago

Are the samples that fail getting through chr X and Y?


mej54 commented 6 years ago

Hi Ryan,

According to the lumpy error file, it looks like it has made it past chr X and Y (screenshot attached). However, I'm worried about missing the interchromosomal events you mentioned previously in this issue. Is there a way to check for that? There seem to be a few translocations in the vcf, but not as many as I was expecting.
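One rough way to check is to count the breakend (BND) records whose mate lies on a different contig. The sketch below assumes the VCF writes inter-chromosomal events in standard VCF breakend notation in the ALT column (for example `N]chr5:200]`); the function name `count_interchrom` is illustrative and not part of lumpy.

```python
import re
from typing import Iterable

# Mate position inside a VCF breakend ALT, e.g. "N]chr5:200]" or "[chr5:200[N".
# The greedy group lets contig names that contain ':' still parse.
_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def count_interchrom(vcf_lines: Iterable[str]) -> int:
    """Count VCF records whose breakend mate is on another contig."""
    count = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete record
        chrom, alt = fields[0], fields[4]
        mate = _MATE.search(alt)
        if mate and mate.group(1) != chrom:
            count += 1
    return count
```

Pass it an open file handle for the VCF. Each breakend is normally written as two mated records, so the number of events is about half the returned count.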

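[image: screen shot 2018-05-31 at 12 26 01 pm] https://user-images.githubusercontent.com/10553651/40794856-66936646-64ce-11e8-9440-358ebb2c3a8d.png

ryanlayer commented 6 years ago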

Ah! That screenshot was VERY helpful. I think the issue is all of those extra contigs. Can you try the exclude file from the link below? NOTE: you will need to decompress the file before giving it to lumpy.

http://layerlabweb.s3.amazonaws.com/lumpy/hg38_lcr_rand.bed.gz


mej54 commented 6 years ago

Hi Ryan,

I tried running Lumpy with the bed file you suggested, and still ran into the same segmentation fault error. I found that the bed file doesn't contain all of the extra contigs in the reference I'm using. However, I made a bed file listing the full lengths of all the extra contigs specific to the reference file I'm using, and that seems to work! Lumpy is now finishing without any errors.

Thank you for your help!
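A BED file of this kind can be sketched from the reference's FASTA index (the `.fai` file written by `samtools faidx`, whose first two columns are contig name and length). This is not the script used for the file posted in this thread; the name `extra_contigs_bed` and the choice of which contigs count as primary (1-22, X, Y, M/MT, with or without a `chr` prefix) are assumptions to adapt to your reference.

```python
import re
from typing import Iterable, Iterator

# Assumption: contigs matching this pattern are "primary" and kept for calling.
PRIMARY = re.compile(r"^(chr)?([0-9]{1,2}|X|Y|M|MT)$")


def extra_contigs_bed(fai_lines: Iterable[str]) -> Iterator[str]:
    """Yield one BED line (name, 0, length) per non-primary contig."""
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        name, length = line.rstrip("\n").split("\t")[:2]
        if PRIMARY.match(name):
            continue
        yield f"{name}\t0\t{int(length)}"
```

Write the yielded lines to a plain-text file, one per line, and pass that uncompressed file to lumpy as the exclude file.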

ryanlayer commented 6 years ago

Great! Can you share that file?


mej54 commented 6 years ago

Yes, posting it here: GRCh38_contigs.bed.gz https://github.com/arq5x/lumpy-sv/files/2073434/GRCh38_contigs.bed.gz

ryanlayer commented 6 years ago

Thanks!
