MichealVo commented 2 years ago

Error executing process > 'pipeline:split_bam (1)'

Caused by: Process pipeline:split_bam (1) terminated with an error exit status (1)

Command executed:

seqkit bam -j 4 -N 10 fastq_reads_aln_sorted.bam -o bam_bundles/
mv bam_bundles/* .
for f in *:*; do mv -v "$f" $(echo "$f" | tr ':' '-'); done

Command exit status: 1

Command output: (empty)

Command error:
[INFO] Creating BAM bundles from file: fastq_reads_aln_sorted.bam
[INFO] Minimum reads per bundle: 10
[INFO] Output directory: bam_bundles/
Bundle Chrom Start End NrRecs NrLoci
[INFO] Written 0 BAM records to 0 loci and 1 bundles.
mv: cannot stat 'bam_bundles/*': No such file or directory
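The log above reports 0 BAM records written, so there was nothing for the mv step to move and the process failed on the glob. As a rough illustration only, the move-and-rename step could be guarded as in the Python sketch below. The helper name move_bundles and the example file name are hypothetical; this is not code from the workflow, and the suggestion that the BAM holds no aligned reads is a guess based on the log.

```python
import shutil
from pathlib import Path


def move_bundles(bundle_dir, dest_dir):
    """Move every file from bundle_dir into dest_dir, replacing ':' with '-'
    in the file names. Raise a descriptive error when there is nothing to move.

    Hypothetical helper mirroring the shell step in the failing process.
    """
    src = Path(bundle_dir)
    bundles = sorted(p for p in src.glob("*") if p.is_file()) if src.is_dir() else []
    if not bundles:
        # Fail with a clearer message than "mv: cannot stat".
        raise FileNotFoundError(
            f"No bundles found in {bundle_dir!r}; "
            "the input BAM may contain no aligned reads"
        )
    moved = []
    for bundle in bundles:
        target = Path(dest_dir) / bundle.name.replace(":", "-")
        shutil.move(str(bundle), str(target))
        moved.append(target.name)
    return moved
```

Calling move_bundles("bam_bundles", ".") on an empty bam_bundles directory raises FileNotFoundError with the message above instead of the bare mv failure.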

nrhorner commented 2 years ago

Hi. I'm sorry that you're having problems with the workflow.

Could you post the nextflow.log and the output of nextflow info epi2me-labs/wf-isoforms please?

MichealVo commented 2 years ago

Here is the information I have got after typing the command: ./nextflow info epi2me-labs/wf-isoforms

(base) ubuntu@kingspeak:~/wf-isoforms$ ./nextflow info epi2me-labs/wf-isoforms
project name: epi2me-labs/wf-isoforms
repository : https://github.com/epi2me-labs/wf-isoforms
local path : /home/ubuntu/.nextflow/assets/epi2me-labs/wf-isoforms
main script : main.nf
description : RNA/cDNA isoform analysis workflow
author : Oxford Nanopore Technologies
revisions :
master (default)
prerelease
v0.1.0 [t]
v0.1.1 [t]

MichealVo commented 2 years ago

I could not generate the Nextflow log file again because I ran into a new error, shown below:

(base) ubuntu@kingspeak:~/wf-isoforms$ ./nextflow run . --fastq test_data/fastq --denovo --ref_genome test_data/SIRV_150601a.fasta -profile local --out_dir ${OUTPUT} -w ${OUTPUT}/workspace --sample sample_id -resume
N E X T F L O W ~ version 21.10.6
Launching ./main.nf [awesome_roentgen] - revision: dc2e1a1ca5
Can't open cache DB: /home/ubuntu/wf-isoforms/.nextflow/cache/b46118db-3681-4553-8b88-e7bf9a1966e1/db

Nextflow needs to be executed in a shared file system that supports file locks. Alternatively you can run it in a local directory and specify the shared work directory by using by -w command line option.

nrhorner commented 2 years ago

Did you run it in the same location as the previous run? This error means the file system was not able to create file locks.
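A quick way to probe whether a directory sits on a file system that supports locks is sketched below in Python. It tries to take a non-blocking POSIX lock on a temporary file with fcntl.lockf (Unix only). The function name supports_file_locks is made up for this example, and Nextflow's own locking mechanism may differ, so treat a True result as a hint rather than a guarantee.

```python
import fcntl
import os
import tempfile


def supports_file_locks(directory):
    """Return True if an exclusive, non-blocking POSIX lock can be taken
    on a temporary file created inside `directory` (hypothetical helper)."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="locktest_")
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # The file system refused the lock request.
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Always clean up the probe file.
        os.close(fd)
        os.remove(path)
```

For example, supports_file_locks("/home/ubuntu/wf-isoforms") would probe the launch directory from the failing run. If it returns False, launching from a local directory and pointing -w at the shared work directory, as the Nextflow message suggests, is the documented workaround.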

MichealVo commented 2 years ago

custom.log

Thank you for your advice. I have attached the Nextflow log file here. Oddly, the error only occurred when I used the .fastq file generated by my MinION mini-computer.

MichealVo commented 2 years ago

The above error was fixed by changing the reference genome file, but I then got a new error, shown in the following log:

executor > local (1)
[84/b009fc] process > start_ping:pingMessage (1) [100%] 1 of 1, cached: 1 ✔
[4d/28c305] process > pipeline:summariseConcatReads (1) [100%] 1 of 1, cached: 1 ✔
[83/96ce6d] process > pipeline:getVersions [100%] 1 of 1, cached: 1 ✔
[0d/b078c4] process > pipeline:getParams [100%] 1 of 1, cached: 1 ✔
[48/418105] process > pipeline:preprocess_reads (1) [100%] 1 of 1, cached: 1 ✔
[cd/21bc52] process > pipeline:build_minimap_index [100%] 1 of 1, cached: 1 ✔
[49/9a502e] process > pipeline:reference_assembly:map_reads (1) [100%] 1 of 1, cached: 1 ✔
[82/8b83bf] process > pipeline:split_bam (1) [100%] 1 of 1, cached: 1 ✔
[d4/ded4d2] process > pipeline:assemble_transcripts (2) [100%] 6 of 6, cached: 6 ✔
[d6/1146df] process > pipeline:merge_gff_bundles (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > pipeline:run_gffcompare -
[- ] process > pipeline:makeReport -
[- ] process > pipeline:get_transcriptome -
[e6/4ad703] process > output (1) [ 50%] 1 of 2, cached: 1
[64/6874c2] process > end_ping:pingMessage [100%] 1 of 1, cached: 1 ✔
Checking fastq input.
Single directory input detected.
Doing reference based transcript analysis
WARN: Input tuple does not match input set cardinality declared by process pipeline:makeReport -- offending value: [[]]
Error executing process > 'pipeline:merge_gff_bundles (1)'

Caused by: Process pipeline:merge_gff_bundles (1) terminated with an error exit status (1)

Command executed:

echo '##gff-version 2' >> transcripts_fastq.gff;
echo '#pipeline-nanopore-isoforms: stringtie' >> transcripts_fastq.gff;

for fn in 000000005_ENSMUST00000044011.12-125-1654_bundle_fastq.gff 000000000_ENSMUST00000237752.2-1-2253_bundle_fastq.gff 000000004_ENSMUST00000152247.8-8143-9341_bundle_fastq.gff 000000002_ENSMUST00000152415.2-238-1102_bundle_fastq.gff 000000001_ENSMUST00000034848.14-0-891_bundle_fastq.gff 000000003_ENSMUST00000239436.2-1620-1897_bundle_fastq.gff; do
    grep -v '#' $fn >> transcripts_fastq.gff
done

Command exit status: 1

Command output: (empty)

Work dir: /home/ubuntu/wf-isoforms/work_space7/d6/1146dfc6a033685adce14d808890b2

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

custom7.log

MichealVo commented 2 years ago

Here is the information in the .gff files:

transcripts_fastq.gff :

##gff-version 2

#pipeline-nanopore-isoforms: stringtie

000000000_ENSMUST00000044011.12-125-1654_bundle_fastq.gff:

# stringtie --rf -G Mus_musculus.GRCm39.105.gtf -L -v -A gene_abund.tab -p 4 --conservative -o 000000000_ENSMUST00000044011.12-125-1654_bundle_fastq.gff -l 0 000000000_ENSMUST00000044011.12-125-1654_bundle.bam
# StringTie version 2.1.1
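One possible reading of the exit status 1 in merge_gff_bundles: grep exits with status 1 when it selects no lines, and with -v the selected lines are those that do not contain '#'. The bundle file quoted above appears to hold only StringTie's header lines, so if the last bundle in the loop looks like this, the loop (and the process) would end with status 1. This is an interpretation, not something confirmed in the thread. The Python sketch below counts the lines that grep -v '#' would keep; the function name is made up for illustration.

```python
def lines_kept_by_grep_v_hash(path):
    """Count the lines in `path` that contain no '#' character,
    i.e. the lines that `grep -v '#'` would print (hypothetical helper).

    A result of 0 corresponds to grep exiting with status 1.
    """
    kept = 0
    with open(path) as handle:
        for line in handle:
            if "#" not in line:
                kept += 1
    return kept
```

Running it over each *_bundle_fastq.gff in the process work directory would show which bundles contain no transcript records at all.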

nrhorner commented 2 years ago

nextflow -log custom7.log run wf-isoforms/ --fastq test_data/fastq --ref_genome test_data/Mus_musculus.GRCm39.cdna.all.fa --ref_annotation test_data/SIRV_isofroms.gtf --minimap2_opts '-uf --splice-flank=no' --out_dir outdir7 -w work_space7 -profile conda -resume

You need to ensure that the reference annotation file (--ref_annotation) matches the genome assembly file. You will need a reference annotation for the mouse genome, possibly this one: http://ftp.ensembl.org/pub/release-105/gtf/mus_musculus/Mus_musculus.GRCm39.105.gtf.gz
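A simple consistency check is to compare the sequence names in the reference FASTA with the first column of the annotation. The Python sketch below assumes uncompressed, plain-text files; the function names are illustrative and not part of the workflow.

```python
def fasta_names(path):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_seqnames(path):
    """Collect the values of column 1 of a GTF/GFF file, skipping comments."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_path, gtf_path):
    """Return (names shared by both files, names found only in the annotation)."""
    ref = fasta_names(fasta_path)
    ann = gtf_seqnames(gtf_path)
    return ref & ann, ann - ref
```

If the first set returned by compare_names is empty, the annotation and the reference do not describe the same sequences, and a matching pair of files is needed.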

MichealVo commented 2 years ago

Thank you so much for all your recommendations; that error is now resolved. The pipeline now runs almost to completion, but one error remains, shown below. It looks like the code is trying to assign a data frame with several columns to a single column: in this case, 8 items are being placed where one is expected. I have also attached the log file so that you can see the error in more detail. custom12.log

Traceback (most recent call last):
  File "/home/ubuntu/.nextflow/assets/epi2me-labs/wf-isoforms/bin/report.py", line 901, in <module>
    main()
  File "/home/ubuntu/.nextflow/assets/epi2me-labs/wf-isoforms/bin/report.py", line 886, in main
    transcript_table(report, df_tmaps, args.transcript_table_cov_thresh)
  File "/home/ubuntu/.nextflow/assets/epi2me-labs/wf-isoforms/bin/report.py", line 667, in transcript_table
    df['parent gene iso num'] = df.apply(
  File "/home/ubuntu/wf-isoforms/work_space12/conda/epi2melabs-wf-isoforms-7d2573abb6ef506fbc61e109fcbeb065/lib/python3.8/site-packages/pandas/core/frame.py", line 3602, in __setitem__
    self._set_item_frame_value(key, value)
  File "/home/ubuntu/wf-isoforms/work_space12/conda/epi2melabs-wf-isoforms-7d2573abb6ef506fbc61e109fcbeb065/lib/python3.8/site-packages/pandas/core/frame.py", line 3742, in _set_item_frame_value
    self._set_item_mgr(key, arraylike)
  File "/home/ubuntu/wf-isoforms/work_space12/conda/epi2melabs-wf-isoforms-7d2573abb6ef506fbc61e109fcbeb065/lib/python3.8/site-packages/pandas/core/frame.py", line 3754, in _set_item_mgr
    self._mgr.insert(len(self._info_axis), key, value)
  File "/home/ubuntu/wf-isoforms/work_space12/conda/epi2melabs-wf-isoforms-7d2573abb6ef506fbc61e109fcbeb065/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1162, in insert
    block = new_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
  File "/home/ubuntu/wf-isoforms/work_space12/conda/epi2melabs-wf-isoforms-7d2573abb6ef506fbc61e109fcbeb065/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 1937, in new_block
    check_ndim(values, placement, ndim)
  File "/home/ubuntu/wf-isoforms/work_space12/conda/epi2melabs-wf-isoforms-7d2573abb6ef506fbc61e109fcbeb065/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 1979, in check_ndim
    raise ValueError(
ValueError: Wrong number of items passed 8, placement implies 1

nrhorner commented 2 years ago

Hi,

Sorry for the delay getting back to you. I'm not able to recreate your last issue at the moment. If the data you are using is in the public domain, could you send it to me so I can test it, including links to the reference genome and annotations used?

nrhorner commented 2 years ago

Also, the option '-uf --splice-flank=no' is only required for the SIRV test dataset.

nrhorner commented 2 years ago

Looking at this a bit more, I think the problem is due to there not being enough coverage amongst the transcripts and they get filtered out. I will get a fix out for this soon.

In the meantime, try setting the following: transcript_table_cov_thresh = 1

nrhorner commented 2 years ago

Hi,

There's a new version of the workflow, if you'd like to try it out.

nextflow run -r v0.1.2 epi2me-labs/wf-isoforms


I got this error: mv: cannot stat 'bam_bundles/*': No such file or directory #8
