Open brigranger opened 4 years ago
Hi Brian, thanks for the report.
Do you know the coverage of this input bam?
I suspect this BAM has low coverage, so Sniffles cannot estimate the error rate and other parameters from so few reads.
@SHuang-Broad I think you're right.
I wasn't able to find any specific coverage report for this BAM, so I ran samtools idxstats on it, and it appears to have one read aligned on chr2, and that's it. So it's not terribly surprising it failed.
I guess it would be nice if the pipeline failed a little more gracefully and maybe was able to continue on? But there's really nothing here to do much with...
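For anyone repeating the check above, here is a minimal sketch of turning `samtools idxstats` output into per-contig mapped read counts. It assumes the standard four-column, tab-separated layout (reference name, sequence length, mapped count, unmapped count). The function name and the example input are made up for illustration; the input mirrors the one-read-on-chr2 situation and is not the real output for this BAM.

```python
def mapped_reads_per_contig(idxstats_text):
    """Parse `samtools idxstats` text into {contig: mapped_read_count}."""
    counts = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # the "*" row holds reads with no reference assigned
        counts[name] = int(mapped)
    return counts


# Illustrative input only (hypothetical, not the actual idxstats output).
example = (
    "chr1\t248956422\t0\t0\n"
    "chr2\t242193529\t1\t0\n"
    "*\t0\t0\t0\n"
)

counts = mapped_reads_per_contig(example)
total_mapped = sum(counts.values())
print(counts, total_mapped)
```

In practice the text would come from running `samtools idxstats` on an indexed BAM and capturing its standard output.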
Wow, one read? Is this a test BAM file or real data?
@brigranger Hmm... Typically variant calling is the last step of the pipelines now, so if this one particular task fails, no downstream tasks are seriously affected in practice.
But I do think there could be more QC done at the beginning (or maybe middle) of the workflow, i.e. quit early when things are really suspicious.
What do you think, @kvg ?
This is an interesting one. Thinking ahead to amplicon sequencing, there will certainly be cases where we parallelize over chromosomes and some will have no data. So it looks like we'll have to figure out a way to protect ourselves from those kinds of failures when the tools we're running don't have those protections built in themselves.
How to do that robustly will require some thought. @brigranger can you send us a link to the original input BAM?
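One possible shape for such a guard, sketched under assumptions: given per-contig mapped read counts (for example, from `samtools idxstats`), decide which contigs are worth sending to the caller and which to skip. The function name and the `min_reads` threshold are hypothetical. The default of 1000 is an arbitrary placeholder, not a documented Sniffles requirement, and this is not how the pipeline currently behaves.

```python
def split_contigs_by_depth(mapped_counts, min_reads=1000):
    """Split contigs into (to_call, to_skip) by mapped read count.

    `min_reads` is an arbitrary illustrative threshold, not a value
    taken from Sniffles or from the pipeline.
    """
    to_call, to_skip = [], []
    for contig, n_reads in mapped_counts.items():
        if n_reads >= min_reads:
            to_call.append(contig)
        else:
            to_skip.append(contig)
    return to_call, to_skip


# Hypothetical counts: one well-covered contig, one with a single read,
# and one with no data at all.
to_call, to_skip = split_contigs_by_depth(
    {"chr1": 52000, "chr2": 1, "chr3": 0}
)
print("call:", to_call)
print("skip:", to_skip)
```

A scatter step could then launch the caller only for `to_call` and report `to_skip` in the QC output, so an empty shard does not fail the whole run.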
@kvg You should be able to find it here: gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/
I looked at the subreads BAM; it seems to be a low-yield CCS flowcell.
gsutil du -sh gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/m64020_200118_025318.subreads.bam
151.93 GiB gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/m64020_200118_025318.subreads.bam
So it looks like something in CCS produced the ultralow depth.
@kvg, yes, I believe this ultimately comes down to what QC we put in each step of the pipeline, which by nature has to be an implement-as-we-encounter issue, considering that there could be many places where things go wrong.
@brigranger is this the only occasion where you see the error? If so, I'm going to close this and deal with #113 instead as a solution.
I've only tried one other sample, but that seemed to get past this. Fine with me to switch priorities.
In the PBCCSWholeGenomeSingleFlowcell workflow, the Sniffles task in CallSVs failed with an error that sounds like there are too few reads in the BAM to estimate some parameter it needs. Log file excerpt follows:
This is using Terra with the method imported from Dockstore: github.com/broadinstitute/long-read-pipelines/PBCCSWholeGenomeSingleFlowcell
Version: 2.0-dockstore-test-2
Let me know if any other details are needed.