arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License

no result with multiple libraries #278

Open asylvz opened 5 years ago

asylvz commented 5 years ago

Hello, I'm running lumpyexpress with CHM13, which has 23 libraries. It seems to be running fine but no output is generated. It works fine with other bams.

lumpyexpress -B /mnt/storage1/projects/chm13/ilmn/CHM13.GRCh37.bam -o chm13_grc37_new_latest.vcf -v

Sourcing executables from /mnt/compgen/inhouse/bin/lumpyexpress.config ...

Checking for required python modules (/usr/bin/python)...

create temporary directory

        /usr/bin/python /mnt/compgen/inhouse/src/lumpy/lumpy-sv//scripts/bamkit/bamgroupreads.py --fix_flags -i /mnt/storage1/projects/chm13/ilmn/CHM13.GRCh37.bam -r HGT55.2,HGT55.3,HGT55.4,HGT55.5,HGT55.6,HGT55.7,HGT55.8,HHYFM.1,HHYFM.2,HHYFM.3,HHYFM.4,HHYFM.5,HHYFM.6,HHYFM.7,HJ2HJ.1,HJ2HJ.2,HJ2HJ.3,HJ2HJ.4,HJ2HJ.5,HJ2HJ.6,HJ2HJ.7,HJ2HJ.8,HKK7N.7 \
            | /mnt/compgen/inhouse/bin/samblaster --acceptDupMarks --excludeDups --addMateTags --maxSplitCount 2 --minNonOverlap 20 \
            --splitterFile chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/spl_pipe --discordantFile chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/disc_pipe \
            | /mnt/compgen/inhouse/bin/samtools view -S /dev/stdin \
            | gawk '{ if (NR<=1000000) print > "/dev/stdout" ; else print > "/dev/null" }' \
            | /usr/bin/python /mnt/compgen/inhouse/src/lumpy/lumpy-sv//scripts/pairend_distro.py -r 151 -X 4 -N 1000000 -o chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/chm13_grc37_new_latest.vcf.sample1.lib1.x4.histo \
            > chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/chm13_grc37_new_latest.vcf.sample1.lib1.insert.stats

        /mnt/compgen/inhouse/bin/samtools view -S -u chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/spl_pipe \
            | /mnt/compgen/inhouse/bin/samtools sort -m 1G -T  chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/spl -o chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/chm13_grc37_new_latest.vcf.sample1.lib1.splitters.bam /dev/stdin
        /mnt/compgen/inhouse/bin/samtools view -S -u chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/disc_pipe \
            | /mnt/compgen/inhouse/bin/samtools sort -m 1G -T  chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/disc -o chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/chm13_grc37_new_latest.vcf.sample1.lib1.discordants.bam /dev/stdin

samblaster: Version 0.1.24
samblaster: Inputting from stdin
samblaster: Outputting to stdout
samblaster: Opening chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/disc_pipe for write.
samblaster: Opening chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/spl_pipe for write.
samblaster: Loaded 85 header sequence entries.
Warning: 6078801 unmatched name groups
samblaster: Output 5214709 discordant read pairs to chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/disc_pipe
samblaster: Output 0 split reads to chm13_grc37_new_latest.vcf.1QkwI2BAqSiC/spl_pipe
samblaster: Marked 0 of 426801819 (0.00%) read ids as duplicates using 2728k memory in 54M23S(3262.620S) CPU seconds and 5H1M29S(18089S) wall time.
[bam_sort_core] merging from 4 files and 1 in-memory blocks...
Removed 881 outliers with isize >= 1301
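For anyone triaging a similar run, the summary counts in the samblaster log above can be pulled out programmatically. The following is only a minimal sketch: the function name `samblaster_counts` is made up for illustration, the regular expressions assume the message wording shown in this log (other samblaster versions may word them differently), and the pipe paths in the sample text are abbreviated.

```python
import re


def samblaster_counts(log_text):
    """Extract summary counts from samblaster stderr text (wording as in the log above)."""
    patterns = {
        "unmatched_name_groups": r"Warning: (\d+) unmatched name groups",
        "discordant_pairs": r"Output (\d+) discordant read pairs",
        "split_reads": r"Output (\d+) split reads",
        "read_ids": r"Marked \d+ of (\d+) ",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        if match:  # keys whose message is absent are simply left out
            counts[key] = int(match.group(1))
    return counts


# Sample text copied from the log above, with the temporary paths shortened.
log = (
    "samblaster: Loaded 85 header sequence entries. "
    "Warning: 6078801 unmatched name groups "
    "samblaster: Output 5214709 discordant read pairs to disc_pipe "
    "samblaster: Output 0 split reads to spl_pipe "
    "samblaster: Marked 0 of 426801819 (0.00%) read ids as duplicates"
)
counts = samblaster_counts(log)
print(counts)
```

In this run the split-read count is 0 while more than five million discordant pairs were written, which is the kind of imbalance such a check would make easy to spot.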

ryanlayer commented 5 years ago

I am guessing that there are no named read groups in the bam. Can you confirm?

On Nov 1, 2018, at 12:21 PM, Arda Soylev notifications@github.com wrote:


asylvz commented 5 years ago

I think they all have names, as shown below:

@RG ID:HGT55.2 SM:CHM13 LB:Pond-482617 PL:illumina PU:HGT55CCXX160108.2.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HGT55.3 SM:CHM13 LB:Pond-482617 PL:illumina PU:HGT55CCXX160108.3.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HGT55.4 SM:CHM13 LB:Pond-482617 PL:illumina PU:HGT55CCXX160108.4.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HGT55.5 SM:CHM13 LB:Pond-482617 PL:illumina PU:HGT55CCXX160108.5.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HGT55.6 SM:CHM13 LB:Pond-482617 PL:illumina PU:HGT55CCXX160108.6.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HGT55.7 SM:CHM13 LB:Pond-482617 PL:illumina PU:HGT55CCXX160108.7.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HGT55.8 SM:CHM13 LB:Pond-482617 PL:illumina PU:HGT55CCXX160108.8.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HHYFM.1 SM:CHM13 LB:Pond-482617 PL:illumina PU:HHYFMCCXX160108.1.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HHYFM.2 SM:CHM13 LB:Pond-482617 PL:illumina PU:HHYFMCCXX160108.2.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HHYFM.3 SM:CHM13 LB:Pond-482617 PL:illumina PU:HHYFMCCXX160108.3.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HHYFM.4 SM:CHM13 LB:Pond-482617 PL:illumina PU:HHYFMCCXX160108.4.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HHYFM.5 SM:CHM13 LB:Pond-482617 PL:illumina PU:HHYFMCCXX160108.5.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HHYFM.6 SM:CHM13 LB:Pond-482617 PL:illumina PU:HHYFMCCXX160108.6.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HHYFM.7 SM:CHM13 LB:Pond-482617 PL:illumina PU:HHYFMCCXX160108.7.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HJ2HJ.1 SM:CHM13 LB:Pond-482617 PL:illumina PU:HJ2HJCCXX160108.1.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HJ2HJ.2 SM:CHM13 LB:Pond-482617 PL:illumina PU:HJ2HJCCXX160108.2.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HJ2HJ.3 SM:CHM13 LB:Pond-482617 PL:illumina PU:HJ2HJCCXX160108.3.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HJ2HJ.4 SM:CHM13 LB:Pond-482617 PL:illumina PU:HJ2HJCCXX160108.4.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HJ2HJ.5 SM:CHM13 LB:Pond-482617 PL:illumina PU:HJ2HJCCXX160108.5.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HJ2HJ.6 SM:CHM13 LB:Pond-482617 PL:illumina PU:HJ2HJCCXX160108.6.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HJ2HJ.7 SM:CHM13 LB:Pond-482617 PL:illumina PU:HJ2HJCCXX160108.7.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HJ2HJ.8 SM:CHM13 LB:Pond-482617 PL:illumina PU:HJ2HJCCXX160108.8.AGGTCGCA CN:BI DT:2016-01-08T00:00:00-0500 PI:0
@RG ID:HKK7N.7 SM:CHM13 LB:Pond-482617 PL:illumina PU:HKK7NCCXX151222.7.AGGTCGCA CN:BI DT:2015-12-22T00:00:00-0500 PI:0
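
To double-check how the read groups map onto libraries, the header can be summarized by its LB field. This is a rough sketch, not part of lumpy: it assumes the header text was dumped with `samtools view -H` (tab-separated fields, as in the SAM format), the function name `read_groups_by_library` is hypothetical, and only three of the 23 @RG lines above are included, with the trailing fields dropped for brevity.

```python
from collections import defaultdict


def read_groups_by_library(header_text):
    """Map each LB value to the list of @RG IDs that carry it."""
    libraries = defaultdict(list)
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue  # skip @HD, @SQ, @PG and any other header records
        # Fields look like TAG:value; split on the first colon only,
        # because values such as DT timestamps contain colons themselves.
        fields = dict(
            field.split(":", 1)
            for field in line.split("\t")[1:]
            if ":" in field
        )
        library = fields.get("LB", "<no LB>")
        libraries[library].append(fields.get("ID", "<no ID>"))
    return dict(libraries)


# Three of the @RG records from the header above (assumed tab-separated).
header = "\n".join([
    "@RG\tID:HGT55.2\tSM:CHM13\tLB:Pond-482617\tPL:illumina",
    "@RG\tID:HGT55.3\tSM:CHM13\tLB:Pond-482617\tPL:illumina",
    "@RG\tID:HKK7N.7\tSM:CHM13\tLB:Pond-482617\tPL:illumina",
])
groups = read_groups_by_library(header)
for library, ids in groups.items():
    print(library, len(ids), ",".join(ids))
```

In the header posted above, every @RG line carries the same LB value (Pond-482617) and the same SM value (CHM13), so a summary like this would report 23 read-group IDs under a single library.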