Open mmterpstra opened 6 years ago
Thanks for providing some example reads. The failure you've encountered is in the first step of the SV caller, where it attempts to estimate the sequenced fragment length distribution. This is a particularly tricky operation for RNA.
It is not possible to definitively diagnose the problem without the full data, but I can spot two issues that could be aggravating the operation:
1) More than half of the reads shown above are marked as PCR-duplicates. These are all filtered out from fragment length estimation and variant calling early on. Please check if the fraction of PCR-duplicate labeled reads in this file reflects your intention.
2) I think the hard-clipped reads are all being removed from consideration by manta upfront, and this might be overly conservative. Currently, in addition to other filtration criteria, manta will throw out any read when the CIGAR string contains an indel, more than one REFSKIP (N) segment, any hard clipping, and any soft-clip segment on the outside of the read pair. It seems reasonable that manta could accept hard-clip segment(s) to account for primer removal scenarios such as the above case if we carefully review this case -- I will raise this conversation with other RNA users.
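To make the two points above concrete, here is a minimal Python sketch of the filtration criteria as described in this comment. It is not manta's actual code: the function names are hypothetical, and the reading of "soft-clip on the outside of the read pair" (the leftmost segment of a forward-strand read, the rightmost segment of a reverse-strand read) is an assumption about a normally oriented pair.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate
REVERSE_FLAG = 0x10     # SAM flag bit: read mapped to the reverse strand


def cigar_ops(cigar):
    """Return the list of operation letters in a SAM CIGAR string."""
    return [op for _, op in CIGAR_OP.findall(cigar)]


def rejection_reason(flag, cigar):
    """Return why a read would be skipped under the described rules, or None."""
    if flag & DUPLICATE_FLAG:
        return "duplicate"
    ops = cigar_ops(cigar)
    if not ops:
        return "no cigar"  # e.g. '*' for an unaligned read
    if "I" in ops or "D" in ops:
        return "indel"
    if ops.count("N") > 1:
        return "multiple refskip"
    if "H" in ops:
        return "hard clip"
    # Assumption: the outer end of the pair is the left end of a forward
    # read and the right end of a reverse read.
    outer = ops[-1] if flag & REVERSE_FLAG else ops[0]
    if outer == "S":
        return "outer soft clip"
    return None
```

Under these assumptions, `rejection_reason(99, "20H55M")` returns `"hard clip"`, which matches the primer-clipped reads in this report, and any read carrying flag bit 1024 is rejected before its CIGAR is even inspected.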
Thanks, this explains a lot:
more than one REFSKIP (N)
This would happen more often as reads get longer. I think the original tophat0 and older alignment software also had issues with this, reducing alignment rates by ~10% when reads are 100 bp vs 75 bp. For this purpose it wouldn't be a problem, though some of the primers were designed to capture fusions. Your answer solves my problem. Maybe consider skipping this step when a PE dist is set on the manta command line. Another tool could then be used to calculate it: rseqc (best) or picard (more consistent with how manta will see it).
temp solution:
samtools view -F 1024 -h some.bam | perl -F'\t' -wlane 'if(not(m/^[@]/)&& defined($F[5])){$F[5] =~ s/\d+H//g;print join("\t",@F);}else{print $_;}' | samtools view -Sb - > some.clean.bam
samtools index some.clean.bam
This is an n=1 test and it seems to work... (It was tested on another version, 1.1.0, though.)
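For readers less familiar with perl, the CIGAR edit in the one-liner above can be sketched in Python as follows. The function name is hypothetical and this is only an illustration of the same idea: header lines pass through unchanged, and every hard-clip segment is dropped from field 6 (CIGAR) of each alignment record. Because the SAM spec already excludes hard-clipped bases from SEQ and QUAL, removing the `H` segments leaves the record internally consistent.

```python
import re

HARD_CLIP = re.compile(r"\d+H")


def strip_hard_clips(sam_line):
    """Drop hard-clip segments from the CIGAR field of one SAM line."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header line: leave untouched
    fields = line.split("\t")  # SAM records are tab-delimited
    if len(fields) > 5:
        fields[5] = HARD_CLIP.sub("", fields[5])
    return "\t".join(fields)
```

A record with CIGAR `20H55M` comes out with CIGAR `55M`; all other fields, including the optional tags, are passed through as they were.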
For a project I work with the single primer enrichment protocol for RNA-seq, and this crashed manta. See below for (1) the error and (2) example reads from the BAM file. The alignment was done with hisat2, and the TLEN was recalculated [picard] because, according to the SAM spec, it should be in reference genome space, not transcript space. The other operations were the calculation of MC tags [picard] and the hard clipping of the primers used in the enrichment [custom script].
Please see if you can find the problem / solution with the dataset / manta.
(PS I removed some things by filling in 'some' so the read group is actually empty)