multi-sample generate pipeline

huangl07 commented 5 years ago

Dear Manta support,

I ran the following command:

configManta.py --bam=SRR5739119.mkdup.bam --bam=SRR5739120.mkdup.bam --bam=SRR5739121.mkdup.bam --bam=SRR5739122.mkdup.bam --bam=SRR5739123.mkdup.bam --referenceFasta=../Genome/ref.fa --runDir=SV

It failed with the following error:


[2018-12-17T13:55:49.944212Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] Failed to complete command task: 'generateCandidateSV_0062' launched from master workflow, error code: 1, command: '/home/huangl/.env/manta/libexec/GenerateSVCandidates --align-stats /home/huangl/ncbi/BSA/SV/workspace/alignmentStats.xml --graph-file /home/huangl/ncbi/BSA/SV/workspace/svLocusGraph.bin --bin-index 62 --bin-count 256 --max-edge-count 10 --min-candidate-sv-size 8 --min-candidate-spanning-count 3 --min-scored-sv-size 50 --ref /home/huangl/ncbi/Genome/ref.fa --candidate-output-file /home/huangl/ncbi/BSA/SV/workspace/svHyGen/candidateSV.0062.vcf --diploid-output-file /home/huangl/ncbi/BSA/SV/workspace/svHyGen/diploidSV.0062.vcf --min-qual-score 10 --min-pass-qual-score 20 --min-pass-gt-score 15 --enable-remote-read-retrieval --chrom-depth /home/huangl/ncbi/BSA/SV/workspace/chromDepth.txt --edge-runtime-log /home/huangl/ncbi/BSA/SV/workspace/svHyGen/edgeRuntimeLog.0062.txt --edge-stats-log /home/huangl/ncbi/BSA/SV/workspace/svHyGen/edgeStats.0062.xml --align-file /home/huangl/ncbi/BSA/SRR5739119.mkdup.bam --align-file /home/huangl/ncbi/BSA/SRR5739120.mkdup.bam --align-file /home/huangl/ncbi/BSA/SRR5739121.mkdup.bam --align-file /home/huangl/ncbi/BSA/SRR5739122.mkdup.bam --align-file /home/huangl/ncbi/BSA/SRR5739123.mkdup.bam'
[2018-12-17T13:55:50.009872Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [generateCandidateSV_0062] Error Message:
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [generateCandidateSV_0062] Last 37 stderr lines from task (of 37 total lines):
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.649123Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062] FATAL_ERROR: 2018-Dec-17 21:55:47 /home/huangl/biosoft/manta/src/c++/lib/applications/GenerateSVCandidates/SVScorePairAltProcessor.cpp(208): Throw in function bool SVScorePairAltProcessor::realignPairedRead(const bam_header_info&, const string&, bool, const string&, int, pos_t, int&)
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.718242Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062] Dynamic exception type: boost::exception_detail::clone_impl<illumina::common::GeneralException>
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.759873Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062] std::exception::what: Empty read attributed to sequence fragment with QNAME: 'SRR5739122.82403596' anchored by mate alignment starting at (1-indexed) position: 'NC_029263.1:19661908'
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.809892Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062] [runGSC(GSCOptions const&, char const*, char const*)::current_edge_info*] = Exception caught while processing graph edge: edgeinfo locus:node1:node2: 0:42863:42863
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.859893Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062]    node1:LocusNode: GenomeInterval: 7:[19661613,19662345) n_edges: 10 out_count: 420 evidence: [19661506,19662422)
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.943715Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062]    EdgeTo: 24504 out_count: 9
[2018-12-17T13:55:50.068164Z] [localhost.localdomain] [28400_1] [TaskManager] [ERROR] [2018-12-17T13:55:47.985346Z] [localhost.localdomain] [28400_1] [generateCandidateSV_0062]    EdgeTo: 42848 out_count: 33

x-chen commented 5 years ago

Please check the read highlighted in the error message: std::exception::what: Empty read attributed to sequence fragment with QNAME: 'SRR5739122.82403596' anchored by mate alignment starting at (1-indexed) position: 'NC_029263.1:19661908'

BTW, is this ticket a duplicate of #164?

huangl07 commented 5 years ago

Why? How should I generate the BAM file?

I used samtools to sort and index it,

and Picard MarkDuplicates to mark duplicates.

Should I use only the sorted BAM file?

huangl07 commented 5 years ago

Here is my mapping script:

bwa mem ../Genome/ref.fa SRR5739123_1.fastq.gz SRR5739123_2.fastq.gz -t 4 -a -M -R "@RG\tID:SRR5739123\tLG:SRR5739123\tLB:SRR5739123\tPL:illumina\tSM:SRR5739123\tPU:run_barcode\tCN:MajorBio\tDS:reseq" | samtools view -bS > SRR5739123.bam
samtools sort SRR5739121.bam -o SRR5739121.sort.bam
samtools markdup --reference ../Genome/ref.fa SRR5739121.sort.bam SRR5739121.mkdup.bam

huangl07 commented 5 years ago

I think the error may be caused by reads with low mapping quality or by multi-mapped reads:

SRR5739119.25811182 401 NC_029256.1 1529566 0 125M NC_029259.1 5775201 0 * * NM:i:9 MD:Z:5G61C2A0C3A0A5G9G2A29 MC:Z:113M12AS:i:86 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 2009992 0 58H67M NC_029259.1 5775201 0 * * NM:i:1 MD:Z:61C5 MC:Z:113M12H AS:i:65 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 2198987 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C10G42 MC:Z:113M12H AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 2199076 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C7T45 MC:Z:113M12H AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 2199165 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C10G42 MC:Z:113M12H AS:i:111 RG:Z:SRR5739119
SRR5739119.8064323 401 NC_029256.1 6533440 0 69H56M NC_029264.1 7870164 0 * * NM:i:3 MD:Z:39G3T9C2 MC:Z:125M AS:i:43 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 8236472 0 18M1D107M NC_029259.1 5775201 0 * * NM:i:10 MD:Z:9T8^A5T8C9C5T3T0G3A61C5 MC:Z:113M12H AS:i:79 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 8606257 0 125M NC_029259.1 5775201 0 * * NM:i:5 MD:Z:42T10G0T2A61C5 MC:Z:113M12H AS:i:106 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 8606435 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:42T10G3A61C5 MC:Z:113M12H AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 10576656 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C10A42 MC:Z:113M12AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 11886598 0 6H119M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:36T10G3A61C5 MC:Z:113M12AS:i:105 RG:Z:SRR5739119
SRR5739119.25811182 385 NC_029256.1 11886681 0 125M NC_029259.1 5775201 0 * * NM:i:3 MD:Z:53G3A61C5 MC:Z:113M12H AS:i:116 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 12723735 0 125M NC_029259.1 5775201 0 * * NM:i:3 MD:Z:5G61T3C53 MC:Z:113M12H AS:i:116 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 12723824 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C11C41 MC:Z:113M12AS:i:111 RG:Z:SRR5739119
SRR5739119.8064323 401 NC_029256.1 13943026 0 40H15M4I66M NC_029264.1 7870164 0 * * NM:i:8 MD:Z:18G45G0G2G12 MC:Z:125M AS:i:51 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603395 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C29A23 MC:Z:113M12AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603573 0 125M NC_029259.1 5775201 0 * * NM:i:3 MD:Z:5G61T3C53 MC:Z:113M12H AS:i:116 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603662 0 125M NC_029259.1 5775201 0 * * NM:i:3 MD:Z:5G61T3C53 MC:Z:113M12H AS:i:116 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603751 0 125M NC_029259.1 5775201 0 * * NM:i:4 MD:Z:5G61T3C10G42 MC:Z:113M12AS:i:111 RG:Z:SRR5739119
SRR5739119.25811182 401 NC_029256.1 15603840 0 125M NC_029259.1 5775201 0 * * NM:i:5 MD:Z:5G61T3C10A19A22 MC:Z:113M12AS:i:106 RG:Z:SRR5739119

huangl07 commented 5 years ago

Is there anyone who could solve this?

x-chen commented 5 years ago

Among the reads you posted, I didn't see the one that has QNAME of 'SRR5739122.82403596' and is aligned to NC_029263.1 at 19661908. Pulling out that particular read will help understand why it triggered the error.

std::exception::what: Empty read attributed to sequence fragment with QNAME: 'SRR5739122.82403596' anchored by mate alignment starting at (1-indexed) position: 'NC_029263.1:19661908'
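One way to pull out every record for the read named in the error message is to stream the BAM through `samtools view` and filter on the first column. The sketch below is a minimal illustration, not part of Manta; the function name `records_for_qname` is hypothetical, and it assumes the input is tab-separated SAM text.

```python
def records_for_qname(sam_lines, qname):
    """Yield SAM alignment lines whose QNAME (first column) equals qname."""
    for line in sam_lines:
        # Header lines start with '@'; a QNAME cannot start with '@' per the SAM spec.
        if line.startswith("@"):
            continue
        if line.split("\t", 1)[0] == qname:
            yield line.rstrip("\n")


# Hypothetical usage, with the output of
#   samtools view SRR5739122.mkdup.bam
# passed in as an iterable of lines (for example sys.stdin):
#   for rec in records_for_qname(sys.stdin, "SRR5739122.82403596"):
#       print(rec)
```

Scanning the whole file rather than only the region around NC_029263.1:19661908 is deliberate here: the error message gives the position of the mate, and the record with the empty sequence may be aligned elsewhere.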

huangl07 commented 5 years ago

Sorry to bother you, but these reads also produce an error message.

huangl07 commented 5 years ago

Dear x-chen,

I have already reanalyzed other data with a script like this.

1st step:

bwa mem -M -a -t 8 -R "@RG\tID:1\tLG:A\tLB:1\tPL:illumina\tSM:A\tPU:run_barcode\tCN:MajorBio DS:reseq" /mnt/ilustre/centos7users/dna/SV/02.reference/ref.fa /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171065:MJ20181118001:ECL171065:A.clean.1.fastq.gz /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171065:MJ20181118001:ECL171065:A.clean.2.fastq.gz | samtools view -bS - > /mnt/ilustre/centos7users/dna/SV/03.mapping/A.b1.bam
bwa mem -M -a -t 8 -R "@RG\tID:2\tLG:B\tLB:1\tPL:illumina\tSM:B\tPU:run_barcode\tCN:MajorBio DS:reseq" /mnt/ilustre/centos7users/dna/SV/02.reference/ref.fa /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171066:MJ20181118001:ECL171066:B.clean.1.fastq.gz /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171066:MJ20181118001:ECL171066:B.clean.2.fastq.gz | samtools view -bS - > /mnt/ilustre/centos7users/dna/SV/03.mapping/B.b1.bam

2nd step:

samtools merge -f -p -@ 8 --output-fmt BAM /mnt/ilustre/centos7users/dna/SV/04.sort/B.merged.bam /mnt/ilustre/centos7users/dna/SV/03.mapping/B.b1.bam && samtools sort -o /mnt/ilustre/centos7users/dna/SV/04.sort/B.sort.bam --output-fmt BAM -@ 8 /mnt/ilustre/centos7users/dna/SV/04.sort/B.merged.bam && samtools index /mnt/ilustre/centos7users/dna/SV/04.sort/B.sort.bam
samtools merge -f -p -@ 8 --output-fmt BAM /mnt/ilustre/centos7users/dna/SV/04.sort/A.merged.bam /mnt/ilustre/centos7users/dna/SV/03.mapping/A.b1.bam && samtools sort -o /mnt/ilustre/centos7users/dna/SV/04.sort/A.sort.bam --output-fmt BAM -@ 8 /mnt/ilustre/centos7users/dna/SV/04.sort/A.merged.bam && samtools index /mnt/ilustre/centos7users/dna/SV/04.sort/A.sort.bam

Then I ran:

configManta.py --bam=04.sort/A.sort.bam --bam=04.sort/B.sort.bam --runDir=04.SV --referenceFasta=02.reference/ref.fa

The resulting error is in the attached file workflow.error.log.txt.

The BAM records for the first read named in the error are:

A00184:284:H7K5GDSXX:1:2624:31132:5650 385 chr1 20274924 0 77M74H chr10 4299198 0 * NM:i:7 MD:Z:2T11G4C2A9G25A3T14 MC:Z:151M AS:i:44 RG:Z:1 A00184:284:H7K5GDSXX:1:2624:31132:5650 337 chr1 31656906 0 40H111M = 32586388 929373 NM:i:8 MD:Z:13C5T6T10A1C4A1C16T47 MC:Z:86M65H AS:i:71 RG:Z:1 A00184:284:H7K5GDSXX:1:2624:31132:5650 401 chr1 32586301 0 65H69M17H chr10 4299198 0 NM:i:5 MD:Z:8A14A3T25C9T5 MC:Z:151M AS:i:44 RG:Z:1 A00184:284:H7K5GDSXX:1:2624:31132:5650 129 chr1 32586388 22 86M65S chr10 4299198 0 ACGTTTGTTTGAATAGAGAGTGTGGCTGACATATGGGCCCGGGTGGCATTTGGGAATGTAAAATTGGGAGAGTGGCAGTTGAGCACGGGGATGTTGAGTGAGTGGCTCTTCGTCAGTTGTCCCTCTGAGAGAAGATAATCCTTCGAGGGAG ,F,:FFFF:FF,FFF:FFF,FF,::::,F,FFFF,:FFFF::F::FFF,:FFFFFF:F,F:F,:FFFFF:FF,FFF:F::FFF,F:,F,,,:FFFF,F,F,F:FF:,,,F,F,,:,FF,:,,,F::F,,,:F,F,,,:,,F:,:,,:,FFF NM:i:5 MD:Z:2T19A35A3T14T8 MC:Z:151M AS:i:63 XS:i:53 RG:Z:1

Thank you!

x-chen commented 5 years ago

I am not sure whether the read you printed above is a single record or multiple records in the BAM file. It looks like the QNAME "A00184:284:H7K5GDSXX:1:2624:31132:5650" occurs 4 times, with all occurrences concatenated into a single line?

Looking at the first occurrence (copied below), it has TLEN=0, an empty SEQ, and a missing QUAL:

A00184:284:H7K5GDSXX:1:2624:31132:5650 385 chr1 20274924 0 77M74H chr10 4299198 0 NM:i:7 MD:Z:2T11G4C2A9G25A3T14 MC:Z:151M AS:i:44 RG:Z:1

I think that's what Manta complained about.

Please make sure the input bam follows the spec https://samtools.github.io/hts-specs/SAMv1.pdf
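A quick way to see how many records carry no sequence is to scan the SAM text for alignment lines whose SEQ column is `*`. The sketch below is an illustration under stated assumptions: the function name `empty_seq_records` is hypothetical, input is tab-separated SAM text (for example from `samtools view`), and it only reports records; it does not reproduce any check Manta itself performs.

```python
def empty_seq_records(sam_lines):
    """Return (qname, flag, rname, pos) for alignment lines whose SEQ field is '*'."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # The SAM format defines 11 mandatory columns before any optional tags.
        if len(fields) < 11:
            raise ValueError("fewer than 11 mandatory SAM fields: %r" % line)
        if fields[9] == "*":  # column 10 is SEQ
            hits.append((fields[0], int(fields[1]), fields[2], int(fields[3])))
    return hits
```

Note that the SAM specification itself allows `*` in SEQ and QUAL when the sequence is not stored, so a record flagged by this check can still be valid SAM; the check only locates the kind of record the "Empty read" message describes.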

huangl07 commented 5 years ago

Well, the BAM was generated by:

bwa mem -M -a -t 8 -R "@RG\tID:1\tLG:A\tLB:1\tPL:illumina\tSM:A\tPU:run_barcode\tCN:MajorBio DS:reseq" /mnt/ilustre/centos7users/dna/SV/02.reference/ref.fa /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171065:MJ20181118001:ECL171065:A.clean.1.fastq.gz /mnt/ilustre/centos7users/dna/SV/data/B1228nova5:L1ECL171065:MJ20181118001:ECL171065:A.clean.2.fastq.gz | samtools view -bS - > /mnt/ilustre/centos7users/dna/SV/03.mapping/A.b1.bam
samtools sort -o /mnt/ilustre/centos7users/dna/SV/04.sort/A.sort.bam --output-fmt BAM -@ 8 /mnt/ilustre/centos7users/dna/SV/04.sort/A.merged.bam

I don't know how to fix it. Could you please help me figure this out? GATK and Strelka both produce results from this BAM.

thank you!

huangl07 commented 5 years ago

The BAM contains multiple records for that read.

Is it caused by the bwa mem -a parameter?

I will check it!

huangl07 commented 5 years ago

Hi Chen, good news: it works after I remapped the reads and regenerated the BAM file without the bwa mem -a parameter.

But I can't understand why. Could you suggest a way to handle this after mapping is done?
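A possible explanation can be read off the FLAG column of the records posted earlier in this thread. According to the bwa manual, `bwa mem -a` outputs all found alignments and flags the extra ones as secondary. Decoding the flags seen above (385, 401, 337, 129) shows that the records without a sequence all have the secondary bit (0x100) set, while the flag-129 record, which carries a sequence, does not. The sketch below uses the bit definitions from the SAM specification; the names in `FLAG_BITS` and the function `decode_flag` are illustrative, and nothing here establishes how Manta treats such records internally.

```python
# SAM FLAG bits as defined in the SAM specification; the short names are illustrative.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_secondary(flag):
    """True if the secondary-alignment bit (0x100) is set."""
    return bool(flag & 0x100)


for flag in (385, 401, 337, 129):
    print(flag, decode_flag(flag))
```

Whether filtering such records out of an existing BAM is an acceptable alternative to remapping without `-a` is not confirmed anywhere in this thread.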

ricsethi commented 5 years ago

I hit the same problem when I create the configuration with the --bam argument (a single sample in my case). But the following command works fine: configManta.py --tumorBam=sample.sort.bam --runDir=. --referenceFasta=ref.fa

@huangl07: Could you please comment on how to tackle this problem for diploid samples, and on how much the pipeline for diploid samples differs from the one for a tumor sample without a matched normal?

Illumina / manta

multi-sample generate pipeline #165