Open MichealVo opened 2 years ago
Hi. I'm sorry that you're having problems with the workflow.
Could you post the nextflow.log and the output of nextflow info epi2me-labs/wf-isoforms, please?
Here is the information I have got after typing the command: ./nextflow info epi2me-labs/wf-isoforms
(base) ubuntu@kingspeak:~/wf-isoforms$ ./nextflow info epi2me-labs/wf-isoforms
project name: epi2me-labs/wf-isoforms
repository : https://github.com/epi2me-labs/wf-isoforms
local path : /home/ubuntu/.nextflow/assets/epi2me-labs/wf-isoforms
main script : main.nf
description : RNA/cDNA isoform analysis workflow
author : Oxford Nanopore Technologies
revisions :
I could not even generate the Nextflow log file again because I ran into a new error, shown below:
(base) ubuntu@kingspeak:~/wf-isoforms$ ./nextflow run . --fastq test_data/fastq --denovo --ref_genome test_data/SIRV_150601a.fasta -profile local --out_dir ${OUTPUT} -w ${OUTPUT}/workspace --sample sample_id -resume
N E X T F L O W ~ version 21.10.6
Launching ./main.nf
[awesome_roentgen] - revision: dc2e1a1ca5
Can't open cache DB: /home/ubuntu/wf-isoforms/.nextflow/cache/b46118db-3681-4553-8b88-e7bf9a1966e1/db
Nextflow needs to be executed in a shared file system that supports file locks.
Alternatively you can run it in a local directory and specify the shared work
directory by using by -w
command line option.
Did you run it in the same location as the previous run? This is an issue with the file system not being able to create file locks.
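A rough way to test the file-lock explanation is to try taking a lock in the directory that holds the .nextflow cache. The sketch below is not part of Nextflow: check_lock_support is a hypothetical helper, and it assumes the flock utility from util-linux is installed. It only approximates the locking Nextflow performs, so treat a passing result as a hint rather than proof.

```shell
# Hypothetical helper (not part of Nextflow): try to take an exclusive,
# non-blocking lock on a probe file inside the given directory.
# Assumes the flock utility from util-linux is available.
check_lock_support() {
    dir="$1"
    probe="$dir/.lock_probe.$$"
    if flock -n -x "$probe" true 2>/dev/null; then
        status=0
        echo "file locks work in $dir"
    else
        status=1
        echo "could not take a file lock in $dir"
    fi
    rm -f "$probe"
    return "$status"
}
```

For example, check_lock_support "$HOME/wf-isoforms" tests the launch directory from the run above. If it fails there but succeeds in a local directory such as /tmp, launching from the local directory and pointing -w at the shared one, as the error message suggests, is the usual workaround.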
Thank you for your advice. I have attached the Nextflow log file here. Oddly, the error only occurred when I used my .fastq file generated on my MinION mini-computer.
The above error was fixed after I changed the reference genome file, but I got a new error, shown in the following log:
executor > local (1)
[84/b009fc] process > start_ping:pingMessage (1) [100%] 1 of 1, cached: 1 ✔
[4d/28c305] process > pipeline:summariseConcatReads (1) [100%] 1 of 1, cached: 1 ✔
[83/96ce6d] process > pipeline:getVersions [100%] 1 of 1, cached: 1 ✔
[0d/b078c4] process > pipeline:getParams [100%] 1 of 1, cached: 1 ✔
[48/418105] process > pipeline:preprocess_reads (1) [100%] 1 of 1, cached: 1 ✔
[cd/21bc52] process > pipeline:build_minimap_index [100%] 1 of 1, cached: 1 ✔
[49/9a502e] process > pipeline:reference_assembly:map_reads (1) [100%] 1 of 1, cached: 1 ✔
[82/8b83bf] process > pipeline:split_bam (1) [100%] 1 of 1, cached: 1 ✔
[d4/ded4d2] process > pipeline:assemble_transcripts (2) [100%] 6 of 6, cached: 6 ✔
[d6/1146df] process > pipeline:merge_gff_bundles (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > pipeline:run_gffcompare -
[- ] process > pipeline:makeReport -
[- ] process > pipeline:get_transcriptome -
[e6/4ad703] process > output (1) [ 50%] 1 of 2, cached: 1
[64/6874c2] process > end_ping:pingMessage [100%] 1 of 1, cached: 1 ✔
Checking fastq input.
Single directory input detected.
Doing reference based transcript analysis
WARN: Input tuple does not match input set cardinality declared by process pipeline:makeReport
-- offending value: [[]]
Error executing process > 'pipeline:merge_gff_bundles (1)'
Caused by:
Process pipeline:merge_gff_bundles (1)
terminated with an error exit status (1)
Command executed:
echo '##gff-version 2' >> transcripts_fastq.gff; echo '#pipeline-nanopore-isoforms: stringtie' >> transcripts_fastq.gff;
for fn in 000000005_ENSMUST00000044011.12-125-1654_bundle_fastq.gff 000000000_ENSMUST00000237752.2-1-2253_bundle_fastq.gff 000000004_ENSMUST00000152247.8-8143-9341_bundle_fastq.gff 000000002_ENSMUST00000152415.2-238-1102_bundle_fastq.gff 000000001_ENSMUST00000034848.14-0-891_bundle_fastq.gff 000000003_ENSMUST00000239436.2-1620-1897_bundle_fastq.gff; do grep -v '#' $fn >> transcripts_fastq.gff
done
Command exit status: 1
Command output: (empty)
Work dir: /home/ubuntu/wf-isoforms/work_space7/d6/1146dfc6a033685adce14d808890b2
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
Here is the information in the .gff files:
transcripts_fastq.gff :
000000000_ENSMUST00000044011.12-125-1654_bundle_fastq.gff:
stringtie --rf -G Mus_musculus.GRCm39.105.gtf -L -v -A gene_abund.tab -p 4 --conservative -o 000000000_ENSMUST00000044011.12-125-1654_bundle_fastq.gff -l 0 000000000_ENSMUST00000044011.12-125-1654_bundle.bam
StringTie version 2.1.1
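One likely reading of the exit status 1, given that the bundle file shown above holds only StringTie's header lines (which StringTie writes as '#' comments): grep -v '#' exits with status 1 when it selects no lines, and Nextflow runs process scripts with bash -ue by default, so a single header-only bundle is enough to stop the task. The sketch below reproduces the pattern with two hypothetical helpers, merge_strict and merge_tolerant. Neither is the workflow's actual code.

```shell
# Hypothetical helpers, for illustration only.

# Mirrors the failing loop: stops at the first file where grep selects nothing.
merge_strict() {
    out="$1"; shift
    for fn in "$@"; do
        grep -v '#' "$fn" >> "$out" || return $?
    done
}

# Tolerant variant: grep exit status 1 ("no lines selected") is accepted,
# any higher status (for example an unreadable file) is still an error.
merge_tolerant() {
    out="$1"; shift
    for fn in "$@"; do
        grep -v '#' "$fn" >> "$out" || [ $? -eq 1 ] || return 2
    done
}
```

The tolerant variant only hides the symptom. Bundles with no assembled transcripts usually point to an upstream problem, such as an annotation that does not match the reference.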
nextflow -log custom7.log run wf-isoforms/ --fastq test_data/fastq --ref_genome test_data/Mus_musculus.GRCm39.cdna.all.fa --ref_annotation test_data/SIRV_isofroms.gtf --minimap2_opts '-uf --splice-flank=no' --out_dir outdir7 -w work_space7 -profile conda -resume
You need to ensure the reference annotation (--ref_annotation) file matches the genome assembly file. You will need a reference annotation for the mouse genome, possibly this one: http://ftp.ensembl.org/pub/release-105/gtf/mus_musculus/Mus_musculus.GRCm39.105.gtf.gz
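A mismatch of this kind can be spotted before launching the workflow by comparing the sequence names in the two files. The sketch below assumes uncompressed FASTA and GTF files and takes each FASTA name as the header text up to the first whitespace. missing_seqnames is a hypothetical helper, not something the workflow provides.

```shell
# Hypothetical helper: print each sequence name used in the annotation
# (column 1 of the GTF) that has no matching record in the reference FASTA.
missing_seqnames() {
    fasta="$1"; gtf="$2"
    awk -F'\t' '
        # First file (FASTA): remember every record name.
        NR == FNR {
            if (/^>/) { name = substr($0, 2); sub(/[ \t].*/, "", name); seen[name] = 1 }
            next
        }
        # Second file (GTF): skip comments and malformed lines.
        /^#/ || NF < 9 { next }
        # Report each unknown sequence name once.
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gtf"
}
```

Run as missing_seqnames reference.fa annotation.gtf (both file names are placeholders). Empty output means every annotated sequence name exists in the reference. Any names printed are a sign that the two files do not belong together.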
Thank you so much for all your recommendations; that error is now solved. I have almost got the whole pipeline running. However, one error remains, hopefully the last, as shown below. It looks like the code tries to assign a multi-index data frame to a single column: in this case, 8 items were passed to a column that takes one. I am also attaching the log file so that you can see the error in more detail: custom12.log
Traceback (most recent call last):
File "/home/ubuntu/.nextflow/assets/epi2me-labs/wf-isoforms/bin/report.py", line 901, in
Hi,
Sorry for the delay getting back to you. I'm not able to recreate your last issue at the moment. If the data you are using is in the public domain, could you send it to me to test, including links to the reference genome and annotations used?
Also, the option --minimap2_opts '-uf --splice-flank=no' is only required for the SIRV test dataset.
Looking at this a bit more, I think the problem is due to there not being enough coverage amongst the transcripts and they get filtered out. I will get a fix out for this soon.
In the meantime, try setting the following: transcript_table_cov_thresh = 1
Hi,
There's a new version of the workflow, if you'd like to try it out.
nextflow run -r v0.1.2 epi2me-labs/wf-isoforms
Error executing process > 'pipeline:split_bam (1)'
Caused by:
Process pipeline:split_bam (1)
terminated with an error exit status (1)
Command executed:
seqkit bam -j 4 -N 10 fastq_reads_aln_sorted.bam -o bam_bundles/
mv bam_bundles/* .
for f in *:*; do mv -v "$f" $(echo "$f" | tr ':' '-'); done
Command exit status: 1
Command output: (empty)
Command error:
[INFO] Creating BAM bundles from file: fastq_reads_aln_sorted.bam
[INFO] Minimum reads per bundle: 10
[INFO] Output directory: bam_bundles/
Bundle Chrom Start End NrRecs NrLoci
[INFO] Written 0 BAM records to 0 loci and 1 bundles.
mv: cannot stat 'bam_bundles/*': No such file or directory