broadinstitute / viral-pipelines

viral-ngs: complete pipelines

[fastq_to_ubam] Cannot determine candidate qualities: no qualities found. #415

Closed shandu-m closed 2 years ago

shandu-m commented 2 years ago

(1) I downloaded the workflow files locally with

wget https://github.com/broadinstitute/viral-pipelines/archive/v2.1.0.2.tar.gz

(2) I ran fastq_to_ubam with the provided test data:

miniwdl run ../pipeline/viral-pipelines-2.1.0.2/pipes/WDL/workflows/fastq_to_ubam.wdl \
   FastqToUBAM.fastq_1=../pipeline/viral-pipelines-2.1.0.2/test/input/in1.fastq \
   FastqToUBAM.fastq_2=../pipeline/viral-pipelines-2.1.0.2/test/input/in2.fastq \
   FastqToUBAM.sample_name=in \
   FastqToUBAM.library_name=mylib

(3) I get the following stderr file, which seems to suggest that the qualities cannot be read in properly. The same error occurs when I use different input fastq files. Attached: stderr.txt

tomkinsc commented 2 years ago

Ultimately this is an error from Picard rather than viral-pipelines, but I'm curious about what may be different about your input files. Would it be possible for you to post them or an excerpt?

dpark01 commented 2 years ago

Well I think what's weird is that she's using the same input files from the repo's test/input/ directory that our CI test already does and passes regularly on GHActions on both miniwdl and Cromwell. I mean she's using an older release than usual, but I'm pretty sure all these tests passed back then too…

I mean this exact miniwdl invocation passed with the exact same input a few days ago in GHActions. It is almost as if the fastqs got corrupted after tarring or something.

shandu-m commented 2 years ago

The input files are the provided ones which look like this:

@myseq/1
TCAATAAAAAAAAAAAAGAAAGAAAAAAAAATTCTCCTCATTTTTGTTGT
+
""""""""""""""""""""""""""""""""""""""""""""""""""

and

@myseq/2
AATTATATTATTTCTTTGATAATTTCCTCTCCTCTTGTTTCTTTGTTTCT
+
"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#

I get the same error whether I run the script as in my post above, as well as when I run the script (from a newer version of viral-pipelines) like this:

miniwdl run https://raw.githubusercontent.com/broadinstitute/viral-pipelines/v2.1.8.0/pipes/WDL/workflows/fastq_to_ubam.wdl \
   FastqToUBAM.fastq_1=../pipeline/viral-pipelines-2.1.0.2/test/input/in1.fastq \
   FastqToUBAM.fastq_2=../pipeline/viral-pipelines-2.1.0.2/test/input/in2.fastq \
   FastqToUBAM.sample_name=in \
   FastqToUBAM.library_name=mylib

I also get the same error using the fastq file attached here: fq_18.fastq.zip
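
One way to rule out the input files themselves is to count the records and decode the quality strings on the host, outside any container. The following is a minimal sketch, assuming Phred+33 encoding and strict four-line FASTQ records; the function name `summarize_fastq_qualities` is hypothetical and not part of viral-pipelines or Picard.

```python
from pathlib import Path


def summarize_fastq_qualities(path, offset=33):
    """Return (record_count, min_phred, max_phred) for a four-line-per-record FASTQ.

    Assumes Phred+33 by default. An empty file yields (0, None, None).
    """
    lines = Path(path).read_text().splitlines()
    # Ignore trailing blank lines so they do not break the record count.
    while lines and not lines[-1].strip():
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError(f"{path}: line count {len(lines)} is not a multiple of 4")

    count = 0
    lowest = None
    highest = None
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"{path}: malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"{path}: sequence/quality length mismatch at line {i + 1}")
        count += 1
        for ch in qual:
            q = ord(ch) - offset
            lowest = q if lowest is None else min(lowest, q)
            highest = q if highest is None else max(highest, q)
    return count, lowest, highest
```

Under Phred+33, the `"` character decodes to quality 1 and `#` to quality 2, so records like the ones above do carry (very low) quality values. If this check reports a non-zero record count on the host while the tool inside the container reports no qualities, the files themselves are probably not the problem.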

dpark01 commented 2 years ago

Wait, Shandu are you running on a Mac? Is your Docker using "gRPC FUSE" or "legacy osxfs" for file mounts? I wonder if this is the miniwdl/docker/mac bug where all the input files look empty to everything....
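
A rough way to probe for this symptom is to compare a file's size on the host with the size reported from inside a container that bind-mounts the same directory. The sketch below is only an approximation: it is not how miniwdl itself mounts inputs, the `ubuntu:22.04` image is an assumption (any image with GNU `stat` should do), and both function names are hypothetical.

```python
import os
import subprocess


def container_stat_command(host_path, image="ubuntu:22.04"):
    """Build a `docker run` argv that prints the file size as seen in a container."""
    host_path = os.path.abspath(host_path)
    host_dir, name = os.path.split(host_path)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/data:ro",    # read-only bind mount of the parent directory
        image,
        "stat", "-c", "%s", f"/data/{name}",  # GNU stat: print size in bytes
    ]


def sizes_on_host_and_in_container(host_path, image="ubuntu:22.04"):
    """Return (host_size, container_size) in bytes; requires a running Docker daemon."""
    host_size = os.path.getsize(host_path)
    result = subprocess.run(
        container_stat_command(host_path, image),
        capture_output=True, text=True, check=True,
    )
    return host_size, int(result.stdout.strip())
```

For example, `sizes_on_host_and_in_container("test/input/in1.fastq")` returning a non-zero host size and a container size of 0 would be consistent with the file-sharing problem described here.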

shandu-m commented 2 years ago

I am running on a Mac and appear to be using gRPC FUSE for file sharing. Should I be using legacy osxfs?

dpark01 commented 2 years ago

Yeah, try that (you'll need to restart your Docker environment)

https://github.com/chanzuckerberg/miniwdl/issues/145#issuecomment-733435644

shandu-m commented 2 years ago

Hmm, tried that but now I get a DockerException

docker.errors.DockerException: Error while fetching server API version: 500 Server Error for http+docker://localhost/version: Internal Server Error ("b'dial unix /Users/shandu/Library/Containers/com.docker.docker/Data/docker.raw.sock: connect: connection refused'")

Am I supposed to be explicitly running a docker container before trying to run the WDL file, or is that handled for me when I run the WDL?

tomkinsc commented 2 years ago

miniwdl will start (and stop) the docker containers as appropriate when working through workflow tasks. The socket refusing the connection makes me think the full docker engine process is not running. You may need to update/restart the docker engine and try again? As long as gRPC FUSE is disabled the file system mounting seems fairly reliable on macOS.
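
Before re-running the workflow, it can help to confirm that the Docker engine is actually accepting connections. A minimal sketch, assuming the `docker` CLI is on the PATH; the function name `docker_daemon_reachable` is hypothetical, and the check relies on `docker info` exiting with a non-zero status when it cannot reach the daemon.

```python
import subprocess


def docker_daemon_reachable(timeout=20):
    """Return True if `docker info` succeeds, i.e. the CLI can talk to the daemon."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # CLI missing or not executable, or the daemon did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if docker_daemon_reachable():
        print("Docker daemon is reachable")
    else:
        print("Docker daemon is not reachable; try restarting Docker")
```

If this returns False right after changing the file-sharing setting, the engine has likely not finished restarting yet.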

shandu-m commented 2 years ago

Restarted docker again, as well as updated from 4.7.0 to 4.7.1, and it works (so the issue was file system mounting and/or a version issue)! Thank you both for your help!