Illumina / manta

Structural variant and indel caller for mapped sequencing data
GNU General Public License v3.0

multi-sample generate pipeline #165

Open huangl07 opened 5 years ago

huangl07 commented 5 years ago

Dear Manta support,

I configured the analysis with:

configManta.py --bam=SRR5739119.mkdup.bam --bam=SRR5739120.mkdup.bam --bam=SRR5739121.mkdup.bam --bam=SRR5739122.mkdup.bam --bam=SRR5739123.mkdup.bam --referenceFasta=../Genome/ref.fa --runDir=SV

and the run fails with an error like this:


[2018-12-17T13:55:49.944212Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] Failed to complete command task: 'generateCandidateSV_0062' launched from master workflow, error code: 1, command: '/home/huangl/.env/manta/libexec/GenerateSVCandidates --align-stats /home/huangl/ncbi/BSA/SV/workspace/alignmentStats.xml --graph-file /home/huangl/ncbi/BSA/SV/workspace/svLocusGraph.bin --bin-index 62 --bin-count 256 --max-edge-count 10 --min-candidate-sv-size 8 --min-candidate-spanning-count 3 --min-scored-sv-size 50 --ref /home/huangl/ncbi/Genome/ref.fa --candidate-output-file /home/huangl/ncbi/BSA/SV/workspace/svHyGen/candidateSV.0062.vcf --diploid-output-file /home/huangl/ncbi/BSA/SV/workspace/svHyGen/diploidSV.0062.vcf --min-qual-score 10 --min-pass-qual-score 20 --min-pass-gt-score 15 --enable-remote-read-retrieval --chrom-depth /home/huangl/ncbi/BSA/SV/workspace/chromDepth.txt --edge-runtime-log /home/huangl/ncbi/BSA/SV/workspace/svHyGen/edgeRuntimeLog.0062.txt --edge-stats-log /home/huangl/ncbi/BSA/SV/workspace/svHyGen/edgeStats.0062.xml --align-file /home/huangl/ncbi/BSA/SRR5739119.mkdup.bam --align-file /home/huangl/ncbi/BSA/SRR5739120.mkdup.bam --align-file /home/huangl/ncbi/BSA/SRR5739121.mkdup.bam --align-file /home/huangl/ncbi/BSA/SRR5739122.mkdup.bam --align-file /home/huangl/ncbi/BSA/SRR5739123.mkdup.bam'
[2018-12-17T13:55:50.009872Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [generateCandidateSV_0062] Error Message:
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [generateCandidateSV_0062] Last 37 stderr lines from task (of 37 total lines):
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.649123Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062] FATAL_ERROR: 2018-Dec-17 21:55:47 /home/huangl/biosoft/manta/src/c++/lib/applications/GenerateSVCandidates/SVScorePairAltProcessor.cpp(208): Throw in function bool SVScorePairAltProcessor::realignPairedRead(const bam_header_info&, const string&, bool, const string&, int, pos_t, int&)
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.718242Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062] Dynamic exception type: boost::exception_detail::clone_impl<illumina::common::GeneralException>
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.759873Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062] std::exception::what: Empty read attributed to sequence fragment with QNAME: 'SRR5739122.82403596' anchored by mate alignment starting at (1-indexed) position: 'NC_029263.1:19661908'
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.809892Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062] [runGSC(GSCOptions const&, char const*, char const*)::current_edge_info*] = Exception caught while processing graph edge: edgeinfo locus:node1:node2: 0:42863:42863
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.859893Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062]    node1:LocusNode: GenomeInterval: 7:[19661613,19662345) n_edges: 10 out_count: 420 evidence: [19661506,19662422)
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.943715Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062]    EdgeTo: 24504 out_count: 9
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.985346Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062]    EdgeTo: 42848 out_count: 33

x-chen commented 5 years ago

Please check the read highlighted in the error message:

std::exception::what: Empty read attributed to sequence fragment with QNAME: 'SRR5739122.82403596' anchored by mate alignment starting at (1-indexed) position: 'NC_029263.1:19661908'

BTW, is this ticket a duplicate of #164?

huangl07 commented 5 years ago

Why? How should I generate the BAM file?

I sort it with samtools sort and index it, and I use Picard to mark duplicates.

Should I use only the sorted BAM file?

huangl07 commented 5 years ago

Here is my mapping script:

bwa mem ../Genome/ref.fa SRR5739123_1.fastq.gz SRR5739123_2.fastq.gz -t 4 -a -M -R "@RG\tID:SRR5739123\tLG:SRR5739123\tLB:SRR5739123\tPL:illumina\tSM:SRR5739123\tPU:run_barcode\tCN:MajorBio\tDS:reseq" | samtools view -bS > SRR5739123.bam
samtools sort SRR5739121.bam -o SRR5739121.sort.bam
samtools markdup --reference ../Genome/ref.fa SRR5739121.sort.bam SRR5739121.mkdup.bam

huangl07 commented 5 years ago

Well, I think the error may be caused by reads with low mapping quality or by multi-mapping results:

SRR5739119.25811182 401 NC_029256.1 1529566 0 125M NC_029259.1 5775201 0 * * NM:i:9 MD:Z:5G61C2A0C3A0A5G9G2A29 MC:Z:113M12AS:i:86 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 2009992 0 58H67M NC_029259.1 5775201 0 * * NM:i:1 MD:Z:61C5 MC:Z:113M12H AS:i:65 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 2198987 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C10G42 MC:Z:113M12H AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 2199076 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C7T45 MC:Z:113M12H AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 2199165 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C10G42 MC:Z:113M12H AS:i:111 RG:Z:SRR5739119
SRR5739119.8064323 401 NC_029256.1 6533440 0 69H56M NC_029264.1 7870164 0 * * NM:i:3 MD:Z:39G3T9C2 MC:Z:125M AS:i:43 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 8236472 0 18M1D107M NC_029259.1 5775201 0 * * NM:i:10 MD:Z:9T8^A5T8C9C5T3T0G3A61C5 MC:Z:113M12H AS:i:79 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 8606257 0 125M NC_029259.1 5775201 0 * * NM:i:5 MD:Z:42T10G0T2A61C5 MC:Z:113M12H AS:i:106 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 8606435 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:42T10G3A61C5 MC:Z:113M12H AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 10576656 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C10A42 MC:Z:113M12AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 11886598 0 6H119M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:36T10G3A61C5 MC:Z:113M12AS:i:105 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 11886681 0 125M NC_029259.1 5775201 0 * * NM:i:3 MD:Z:53G3A61C5 MC:Z:113M12H AS:i:116 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 12723735 0 125M NC_029259.1 5775201 0 * * NM:i:3 MD:Z:5G61T3C53 MC:Z:113M12H AS:i:116 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 12723824 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C11C41 MC:Z:113M12AS:i:111 RG:Z:SRR5739119
SRR5739119.8064323 401 NC_029256.1 13943026 0 40H15M4I66M NC_029264.1 7870164 0 * * NM:i:8 MD:Z:18G45G0G2G12 MC:Z:125M AS:i:51 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603395 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C29A23 MC:Z:113M12AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603573 0 125M NC_029259.1 5775201 0 * * NM:i:3 MD:Z:5G61T3C53 MC:Z:113M12H AS:i:116 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603662 0 125M NC_029259.1 5775201 0 * * NM:i:3 MD:Z:5G61T3C53 MC:Z:113M12H AS:i:116 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603751 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C10G42 MC:Z:113M12AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603840 0 125M NC_029259.1 5775201 0 * * NM:i:5 MD:Z:5G61T3C10A19A22 MC:Z:113M12AS:i:106 RG:Z:SRR5739119

huangl07 commented 5 years ago

Is there anyone who could help solve this?

x-chen commented 5 years ago

Among the reads you posted, I didn't see the one that has QNAME of 'SRR5739122.82403596' and is aligned to NC_029263.1 at 19661908. Pulling out that particular read will help understand why it triggered the error.

std::exception::what: Empty read attributed to sequence fragment with QNAME: 'SRR5739122.82403596' anchored by mate alignment starting at (1-indexed) position: 'NC_029263.1:19661908'
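
A minimal Python sketch of the lookup suggested above, assuming the alignments are available as SAM text (for example from `samtools view` on the indexed BAM). The function names are hypothetical and not part of Manta; the message text is copied from the log in this thread.

```python
import re

# Error text copied from the Manta log earlier in this thread.
MESSAGE = (
    "Empty read attributed to sequence fragment with QNAME: "
    "'SRR5739122.82403596' anchored by mate alignment starting at "
    "(1-indexed) position: 'NC_029263.1:19661908'"
)

# Captures the QNAME, the contig name, and the 1-indexed position.
PATTERN = re.compile(r"QNAME: '([^']+)'.*position: '(.+):(\d+)'")


def parse_empty_read_error(message):
    """Return (qname, contig, pos) from the error text, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


def records_for_qname(sam_lines, qname):
    """Yield SAM alignment lines (header lines skipped) with the given QNAME."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        if line.split("\t", 1)[0] == qname:
            yield line.rstrip("\n")


print(parse_empty_read_error(MESSAGE))
```

The position in the message is where the mate aligns, so the empty record itself may sit elsewhere in the file; matching on QNAME across all records is the safer search.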

huangl07 commented 5 years ago

Sorry to bother you, but these reads also show up in the error messages.

huangl07 commented 5 years ago

Dear x-chen,

I have reanalyzed other data with a script like this.

1st step:

bwa mem -M -a -t 8 -R "@RG\tID:1\tLG:A\tLB:1\tPL:illumina\tSM:A\tPU:run_barcode\tCN:MajorBio DS:reseq" /mnt/ilustre/centos7users/dna/SV/02.reference/ref.fa /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171065:MJ20181118001:ECL171065:A.clean.1.fastq.gz /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171065:MJ20181118001:ECL171065:A.clean.2.fastq.gz| samtools view -bS - > /mnt/ilustre/centos7users/dna/SV/03.mapping/A.b1.bam
bwa mem -M -a -t 8 -R "@RG\tID:2\tLG:B\tLB:1\tPL:illumina\tSM:B\tPU:run_barcode\tCN:MajorBio DS:reseq" /mnt/ilustre/centos7users/dna/SV/02.reference/ref.fa /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171066:MJ20181118001:ECL171066:B.clean.1.fastq.gz /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171066:MJ20181118001:ECL171066:B.clean.2.fastq.gz| samtools view -bS - > /mnt/ilustre/centos7users/dna/SV/03.mapping/B.b1.bam

2nd step:

samtools merge -f -p -@ 8 --output-fmt BAM /mnt/ilustre/centos7users/dna/SV/04.sort/B.merged.bam /mnt/ilustre/centos7users/dna/SV/03.mapping/B.b1.bam && samtools sort -o /mnt/ilustre/centos7users/dna/SV/04.sort/B.sort.bam --output-fmt BAM -@ 8 /mnt/ilustre/centos7users/dna/SV/04.sort/B.merged.bam &&samtools index /mnt/ilustre/centos7users/dna/SV/04.sort/B.sort.bam
samtools merge -f -p -@ 8 --output-fmt BAM /mnt/ilustre/centos7users/dna/SV/04.sort/A.merged.bam /mnt/ilustre/centos7users/dna/SV/03.mapping/A.b1.bam && samtools sort -o /mnt/ilustre/centos7users/dna/SV/04.sort/A.sort.bam --output-fmt BAM -@ 8 /mnt/ilustre/centos7users/dna/SV/04.sort/A.merged.bam &&samtools index /mnt/ilustre/centos7users/dna/SV/04.sort/A.sort.bam

Then I configured Manta with:

configManta.py --bam=04.sort/A.sort.bam --bam=04.sort/B.sort.bam --runDir=04.SV --referenceFasta=02.reference/ref.fa

The run fails with the error shown in the file workflow.error.log.txt.

The BAM records for the first read named in the error are:

A00184:284:H7K5GDSXX:1:2624:31132:5650 385 chr1 20274924 0 77M74H chr10 4299198 0 * NM:i:7 MD:Z:2T11G4C2A9G25A3T14 MC:Z:151M AS:i:44 RG:Z:1 A00184:284:H7K5GDSXX:1:2624:31132:5650 337 chr1 31656906 0 40H111M = 32586388 929373 NM:i:8 MD:Z:13C5T6T10A1C4A1C16T47 MC:Z:86M65H AS:i:71 RG:Z:1 A00184:284:H7K5GDSXX:1:2624:31132:5650 401 chr1 32586301 0 65H69M17H chr10 4299198 0 NM:i:5 MD:Z:8A14A3T25C9T5 MC:Z:151M AS:i:44 RG:Z:1 A00184:284:H7K5GDSXX:1:2624:31132:5650 129 chr1 32586388 22 86M65S chr10 4299198 0 ACGTTTGTTTGAATAGAGAGTGTGGCTGACATATGGGCCCGGGTGGCATTTGGGAATGTAAAATTGGGAGAGTGGCAGTTGAGCACGGGGATGTTGAGTGAGTGGCTCTTCGTCAGTTGTCCCTCTGAGAGAAGATAATCCTTCGAGGGAG ,F,:FFFF:FF,FFF:FFF,FF,::::,F,FFFF,:FFFF::F::FFF,:FFFFFF:F,F:F,:FFFFF:FF,FFF:F::FFF,F:,F,,,:FFFF,F,F,F:FF:,,,F,F,,:,FF,:,,,F::F,,,:F,F,,,:,,F:,:,,:,FFF NM:i:5 MD:Z:2T19A35A3T14T8 MC:Z:151M AS:i:63 XS:i:53 RG:Z:1

Thank you!

x-chen commented 5 years ago

I am not sure whether the read you printed above is a single record or multiple records in the BAM file. It looks like the QNAME "A00184:284:H7K5GDSXX:1:2624:31132:5650" occurs 4 times, with all occurrences concatenated into a single line?

Looking at the first occurrence (copied below), it has TLEN=0, SEQ=*, and a missing QUAL:

A00184:284:H7K5GDSXX:1:2624:31132:5650 385 chr1 20274924 0 77M74H chr10 4299198 0 * NM:i:7 MD:Z:2T11G4C2A9G25A3T14 MC:Z:151M AS:i:44 RG:Z:1

I think that is what Manta complained about.

Please make sure the input bam follows the spec https://samtools.github.io/hts-specs/SAMv1.pdf
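
A rough way to screen SAM text for records like the one discussed above is sketched below in Python. It only counts the 11 mandatory columns named in the SAM specification and reports a SEQ of `*` or a SEQ/QUAL column that looks like an optional tag. The function name and the sample lines are hypothetical, and treating SEQ `*` as the trigger for Manta's "Empty read" error is an inference from this thread, not confirmed Manta behavior.

```python
import re

# The 11 mandatory SAM columns, in order, as listed in the SAM specification.
MANDATORY = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

# Optional fields have the shape TAG:TYPE:VALUE, e.g. NM:i:7.
TAG_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZHB]:")


def suspect_fields(line):
    """Return a list of human-readable problems for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MANDATORY):
        return ["only %d of 11 mandatory fields present" % len(fields)]
    rec = dict(zip(MANDATORY, fields))
    problems = []
    # A tag sitting in the SEQ or QUAL column suggests a column went missing.
    for name in ("SEQ", "QUAL"):
        if TAG_RE.match(rec[name]):
            problems.append("%s holds what looks like an optional tag: %s"
                            % (name, rec[name]))
    if rec["SEQ"] == "*":
        problems.append("SEQ is '*' (no sequence stored)")
    elif (rec["QUAL"] != "*" and not TAG_RE.match(rec["QUAL"])
          and len(rec["QUAL"]) != len(rec["SEQ"])):
        problems.append("SEQ and QUAL lengths differ")
    return problems


# Hypothetical records for illustration only (not taken from the real BAM).
ok_line = "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tNM:i:0"
no_qual = "r1\t385\tchr1\t100\t0\t4M\tchr2\t200\t0\t*\tNM:i:0\tAS:i:4"

for sam_line in (ok_line, no_qual):
    print(suspect_fields(sam_line))
```

The tag check is a heuristic: a real quality string could in principle match the tag pattern, so a hit is a reason to inspect the record by eye rather than proof of a malformed file.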

huangl07 commented 5 years ago

The BAM was generated by:

bwa mem -M -a -t 8 -R "@RG\tID:1\tLG:A\tLB:1\tPL:illumina\tSM:A\tPU:run_barcode\tCN:MajorBio DS:reseq" /mnt/ilustre/centos7users/dna/SV/02.reference/ref.fa /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171065:MJ20181118001:ECL171065:A.clean.1.fastq.gz /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171065:MJ20181118001:ECL171065:A.clean.2.fastq.gz| samtools view -bS - > /mnt/ilustre/centos7users/dna/SV/03.mapping/A.b1.bam
samtools sort -o /mnt/ilustre/centos7users/dna/SV/04.sort/A.sort.bam --output-fmt BAM -@ 8 /mnt/ilustre/centos7users/dna/SV/04.sort/A.merged.bam

I don't know how to fix it. Could you please help me figure this out? GATK and Strelka are both able to produce results.

thank you!

huangl07 commented 5 years ago

The BAM contains multiple records for that read.

Is it caused by the bwa mem -a parameter?

I will check it!
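
The FLAG values in the records posted above can be decoded bit by bit. The sketch below uses the flag bits defined in the SAM specification; the short names and the `decode_flag` helper are illustrative choices. It shows that 385, 401 and 337 all carry the secondary-alignment bit (0x100), while 129 does not. According to the bwa manual, alignments reported because of `-a` are flagged as secondary; whether those records are the ones Manta rejects is an assumption at this point in the thread.

```python
# SAM FLAG bits as defined in the SAM specification; names are informal.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# FLAG values that appear in the records quoted in this thread.
for flag in (385, 401, 337, 129):
    print(flag, decode_flag(flag))
```

In the quoted records, only the line with FLAG 129 carries a full SEQ and QUAL; the lines with the secondary bit set show `*` or nothing in those columns.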

huangl07 commented 5 years ago

Hi Chen, the good news is that it works after I remapped the reads to generate the BAM file without the bwa mem -a parameter.

But I don't understand why. Could you suggest a method to apply after mapping is done?
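
One possible post-mapping approach, sketched here in Python on SAM text, is to drop records whose FLAG has the secondary bit (0x100) set, since those are the records `bwa mem -a` adds. This is only a sketch under that assumption: the thread does not confirm that filtering an existing BAM satisfies Manta, and `drop_secondary` and the sample lines are hypothetical.

```python
SECONDARY = 0x100  # SAM FLAG bit for a secondary alignment


def drop_secondary(sam_lines):
    """Return (kept_lines, dropped_count), keeping headers and non-secondary records."""
    kept = []
    dropped = 0
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & SECONDARY:
            dropped += 1
        else:
            kept.append(line)
    return kept, dropped


# Hypothetical SAM text for illustration only.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t129\tchr1\t100\t22\t4M\tchr2\t200\t0\tACGT\tFFFF",
    "r1\t385\tchr1\t300\t0\t4M\tchr2\t200\t0\t*\t*",
]
kept_lines, n_dropped = drop_secondary(example)
print(len(kept_lines), n_dropped)
```

On the command line, `samtools view -b -F 0x100 in.bam > out.bam` applies the same FLAG filter directly to a BAM file (`in.bam` and `out.bam` are placeholder names).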

ricsethi commented 5 years ago

I face the same problem when I create the config with the --bam argument (a single sample in my case). But the following command works fine:

configManta.py --tumorBam=sample.sort.bam --runDir=. --referenceFasta=ref.fa

@huangl07: Could you please comment on how to tackle this problem for diploid samples, and on how different the diploid pipeline is from processing a tumor sample without a normal sample?