Hi @genec1 — It looks like multiple jobs are failing, which makes me think it's an issue with the inputs. May I see the contents of /home/ec2-user/manifest.tsv and /home/ec2-user/toil_config.yaml, please? I see the tools are being run in paired mode — are you submitting single-end samples?
Below are the config files. In this case I was running a paired-end sample.
> cat ~/toil_config.yaml
output-dir: /data/work
star-index: file:///home/ec2-user/indices/starIndex_hg38_no_alt.tar.gz
rsem-ref: file:///home/ec2-user/indices/rsem_ref_hg38_no_alt.tar.gz
kallisto-index: file:///home/ec2-user/indices/kallisto_hg38.idx
hera-index:
max-sample-size: 100G
cutadapt: true
fwd-3pr-adapter: AGATCGGAAGAG
rev-3pr-adapter: AGATCGGAAGAG
fastqc: true
bamqc:
ssec:
gdc-token:
wiggle:
save-bam:
ci-test:
> cat ~/manifest.tsv
fq paired SRR1303078 file:///data/samples/SRR1303078_1.fastq,file:///data/samples/SRR1303078_2.fastq
Hi @genec1 — I can't see anything wrong with the config or manifest. I looked for the samples to see if I could replicate the issue, but it looks like I need privileges to download the files from SRA.
After looking through the logs, I don't believe this is actually a pipeline issue. Every step is failing on the samples, which indicates that there is some issue with them. For example, FASTQC:
Failed to process file R1.fastq
Kallisto:
[quant] running in paired-end mode
[quant] will process pair 1: /data/R1.fastq
                             /data/R2.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 0 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
You can test this out by downloading one of the tools, such as FASTQC, and running it on your sample directly:
docker run -v $(pwd):/data quay.io/ucsc_cgl/fastqc:0.11.5--be13567d00cd4c586edf8ae47d991815c8c72a49 /data/SRR1303078_1.fastq
Is there a chance the fastq files are actually gzipped but the extension is wrong? I can't really think of anything else.
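A quick way to check is to look at the first couple of bytes of the file (paths taken from your manifest; adjust if the files live somewhere else):
file /data/samples/SRR1303078_1.fastq
# or check for the gzip magic bytes (1f 8b) directly
head -c 2 /data/samples/SRR1303078_1.fastq | od -An -tx1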
Best, John
I've discovered that there are some non-ASCII characters in the fastq files being generated by fasterq-dump. I believe that is the source of the problem. I'm now adding an additional step in the pipeline between fasterq-dump and toil-rnaseq to filter out these corrupted reads.
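Roughly, the filter just drops any FASTQ record that contains a byte outside the ASCII range. A minimal sketch for a single file (the real step needs to filter R1/R2 together so the pairs stay in sync, and it assumes GNU grep with PCRE support):
# join each 4-line FASTQ record onto one tab-separated line,
# drop records containing non-ASCII bytes, then restore the 4-line layout
paste - - - - < SRR1303078_1.fastq \
  | LC_ALL=C grep -avP '[^\x00-\x7F]' \
  | tr '\t' '\n' > SRR1303078_1.clean.fastq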
I'm getting further with getting toil-rnaseq to run, but it's still not completing. Now it seems to be dying with:
OSError: [Errno 2] No such file or directory: '/data/work/toil-59fec6ca-1a65-4ae0-b9db-da5c9f650c22-f81cf66e29f975035d1375c55f43d968/tmpEhraLA/6369919f-68ec-4bfb-9d9e-5704975041b5/tMqc9eF/R1_fastqc.html'
Here is the full run:
If I look at the directory where the missing file should be, I see this: