Closed. Johnsonzcode closed this issue 1 year ago.
Dear @Johnsonzcode,
many thanks for your comments under my issue. I am wondering what command you used here?
Best, CW
Getting the same error while running the Nextflow pipeline at the CLI in an AWS Cloud9 environment (Amazon Linux 2). Looks like a Python issue with escaped backslashes.
Caused by:
Process pipeline:denovo_assembly:clustering (1)
terminated with an error exit status (2)
Command executed:
workflow-glue run_isonclust2 batches
Command exit status: 2
Command output: (empty)
Command error:
/home/epi2melabs/conda/lib/python3.8/site-packages/gffutils/parser.py:19: DeprecationWarning: invalid escape sequence \w
  gff3_kw_pat = re.compile('\w+=')
usage: wf-glue [-h] [--debug | --quiet] [-v] {check_sample_sheet,compute_cluster_quality,generate_pychopper_stats,generate_tracking_summary,merge_count_tsvs,report,run_isonclust2} ...
wf-glue: error: unrecognized arguments: batches
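As an aside, the gffutils DeprecationWarning is harmless and separate from the exit-status-2 failure: it comes from `\w` appearing in a non-raw string literal. A minimal sketch of the distinction (the pattern itself is taken from the traceback):

```python
import re

# '\w' inside a plain string is an invalid escape sequence and triggers a
# DeprecationWarning on Python 3.8; a raw string makes the regex explicit
# and silences the warning without changing behaviour.
gff3_kw_pat = re.compile(r'\w+=')

# The pattern matches a GFF3 'key=' prefix such as 'ID=' or 'Parent='.
print(bool(gff3_kw_pat.match("ID=gene0")))
```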
Thank you all for reporting this error. We'll pick it up as soon as we can. I'd like to first understand why this wasn't picked up in our automated testing 🤔
Hi, This should be fixed in the latest version v0.1.13. Thanks again for reporting.
Dear @sarahjeeeze,
many thanks for the update. I have tried to run v0.1.13 and am getting a new error: it gets stuck at the make_batches stage, with Error executing process > 'output (1)'
Caused by:
Cannot run program "/bin/bash" (in directory "PromethION_RNA/epi2me_denovo/workspace/e7/0a59ec4cbf2ced583fbdb25723030d"): error=2, No such file or directory
I checked and the directory does exist; the Slurm error is:
[WARN] Task failed
java.io.IOException: Cannot run program "sh": error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
at java.base/java.lang.Runtime.exec(Runtime.java:594)
at java.base/java.lang.Runtime.exec(Runtime.java:453)
at jline.internal.TerminalLineSettings.exec(TerminalLineSettings.java:183)
at jline.internal.TerminalLineSettings.exec(TerminalLineSettings.java:173)
at jline.internal.TerminalLineSettings.stty(TerminalLineSettings.java:168)
at jline.internal.TerminalLineSettings.set(TerminalLineSettings.java:76)
at jline.internal.TerminalLineSettings.restore(TerminalLineSettings.java:68)
at jline.UnixTerminal.restore(UnixTerminal.java:65)
at jline.TerminalSupport$1.run(TerminalSupport.java:49)
at jline.internal.ShutdownHooks.runTasks(ShutdownHooks.java:66)
at jline.internal.ShutdownHooks.access$000(ShutdownHooks.java:22)
at jline.internal.ShutdownHooks$1.run(ShutdownHooks.java:47)
Caused by: java.io.IOException: error=2, No such file or directory
at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
at java.base/java.lang.ProcessImpl.
Could you please help me on this?
Thank you very much!
Best, CW
Hi @sarahjeeeze I am replicating the same "make batches" error as @CWYuan08 on an internal dataset processing it on AWS Cloud9 (Amazon Linux 2), profile=docker.
Nextflow log:
Error executing process > 'pipeline:denovo_assembly:clustering (1)'
Caused by:
Process pipeline:denovo_assembly:clustering (1)
terminated with an error exit status (2)
Command executed:
workflow-glue run_isonclust2 batches
Command exit status: 2
Command output: (empty)
Command error:
/home/epi2melabs/conda/lib/python3.8/site-packages/gffutils/parser.py:19: DeprecationWarning: invalid escape sequence \w
  gff3_kw_pat = re.compile('\w+=')
usage: wf-glue [-h] [--debug | --quiet] [-v] {check_sample_sheet,compute_cluster_quality,generate_pychopper_stats,generate_tracking_summary,merge_count_tsvs,report,run_isonclust2} ...
wf-glue: error: unrecognized arguments: batches
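For context, "unrecognized arguments: batches" is the generic message argparse prints (with exit status 2) when a subcommand is given a positional argument it never declared. A minimal, hypothetical reproduction of the mechanism (not the actual wf-glue code):

```python
import argparse

parser = argparse.ArgumentParser(prog="wf-glue")
subparsers = parser.add_subparsers(dest="command")
# Hypothetical: the subcommand is registered without its 'batches' positional.
subparsers.add_parser("run_isonclust2")

try:
    parser.parse_args(["run_isonclust2", "batches"])
except SystemExit as e:
    # argparse prints "wf-glue: error: unrecognized arguments: batches"
    # to stderr and raises SystemExit with code 2.
    print(e.code)
```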
ETA: the pipeline runs for close to 2-3 hours at the make_batches step before crashing.
Hi @Caffeinated-Code, which version of the workflow are you using? v0.1.13? That error looks like one from an older version.
Hi @CWYuan08, I am not sure exactly what is causing that Java error, but it looks like it's related to your environment. If you are running it on a device, does it have enough space? If not, perhaps set up a mount point for the work directory if possible, and clean up any unused work directories with nextflow clean.
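If moving the work directory onto a larger mount helps, it can be set once in the config rather than per run; a sketch, assuming a mount at /data (hypothetical path, adjust to your environment):

```groovy
// nextflow.config -- hypothetical path; point workDir at a mount
// with enough free space for the pipeline's intermediate files.
workDir = '/data/nextflow-work'
```

Leftover work directories from earlier runs can then be removed with `nextflow clean -f`.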
Hi Sarah, I have updated to the recent v0.1.13 and am getting another workflow-glue error.
Core Nextflow options
  revision        : v0.1.13
  runName         : cranky_lumiere
  containerEngine : docker
  launchDir       : /home/ec2-user/environment/scripts
  workDir         : /data/workdir
  projectDir      : /home/ec2-user/.nextflow/assets/epi2me-labs/wf-transcriptomes
  userName        : ec2-user
  profile         : standard
  configFiles     : /home/ec2-user/.nextflow/config, /home/ec2-user/.nextflow/assets/epi2me-labs/wf-transcriptomes/nextflow.config

Input Options
  fastq                : /data/Downstream_Analaysis/xxx/RawData
  transcriptome_source : denovo
  ref_genome           : /home/ec2-user/environment/annotations/minimap2/hg38as.fa
  ref_transcriptome    : null

Output Options
  out_dir : /data/TEST_wf_transcriptomes.DeNovo.xxx.202304

Options for reference-based workflow
  plot_gffcmp_stats : true
  minimap2_opts     : -y -p 0.99 -k14 --MD -ax splice

Options for de novo-based workflow
  isOnClust2_sort_options : --batch-size -1 --kmer-size 11 --window-size 15 --min-shared 5 --min-qual 7.0 --mapped-threshold 0.65 --aligned-threshold 0.2 --min-fraction 0.8 --min-prob-no-hits 0.0 -M -1 -P 500 -g 50 -c 150 -F 2

Differential Expression Options
  condition_sheet : test_data/condition_sheet.tsv

Advanced Options
  threads          : 4
  pychopper_opts   : -m phmm -k PCS111 -U -y
  bundle_min_reads : 50000
executor > local (1362)
[97/f5cb92] process > fastcat (1)                                       [100%] 1 of 1 ✔
[c4/9f9a16] process > pipeline:preprocess_ref_annotation                [100%] 1 of 1 ✔
[95/50cd41] process > pipeline:collectFastqIngressResultsInDir (1)      [100%] 1 of 1 ✔
[53/d79776] process > pipeline:getVersions                              [100%] 1 of 1 ✔
[64/3cd7ad] process > pipeline:getParams                                [100%] 1 of 1 ✔
[5f/0734f3] process > pipeline:preprocess_reads (1)                     [100%] 1 of 1 ✔
[c3/1dfad6] process > pipeline:denovo_assembly:make_batches (1)         [100%] 1 of 1 ✔
[90/bb8e68] process > pipeline:denovo_assembly:clustering (1)           [100%] 1 of 1 ✔
[54/fcf0e8] process > pipeline:denovo_assembly:dump_clusters (1)        [100%] 1 of 1 ✔
[b0/b427e4] process > pipeline:denovo_assembly:build_backbones (1256)   [  3%] 1347 of 41501
[-        ] process > pipeline:denovo_assembly:merge_cds                -
[-        ] process > pipeline:denovo_assembly:cds_align                -
[7e/b0bf1c] process > pipeline:denovo_assembly:cluster_quality (1)      [  0%] 0 of 1
[-        ] process > pipeline:split_bam                                -
[-        ] process > pipeline:assemble_transcripts                     -
[-        ] process > pipeline:merge_gff_bundles                        -
[-        ] process > pipeline:run_gffcompare                           -
[-        ] process > pipeline:get_transcriptome                        -
[-        ] process > pipeline:makeReport                               -
[bd/77c0b3] process > output (1)                                        [100%] 1 of 1
Error executing process > 'pipeline:denovo_assembly:cluster_quality (1)'
Caused by:
Process pipeline:denovo_assembly:cluster_quality (1)
terminated with an error exit status (1)
Command executed:
mkdir RawData_cluster_qc
mkdir RawData_cluster_qc_raw
minimap2 -ax splice -t 2 hg38as.fa RawData_full_length_reads.fastq \
  | samtools view -q 2 -F 2304 -b - \
  | samtools sort - -o RawData_cluster_qc/ref_aln.bam
samtools index RawData_cluster_qc/ref_aln.bam
workflow-glue compute_cluster_quality \
  --sizes final_clusters/clusters_info.tsv \
  --outfile RawData_cluster_qc/cluster_quality.csv \
  --ont \
  --clusters final_clusters/clusters.tsv \
  --classes RawData_cluster_qc/ref_aln.bam \
  --report RawData_cluster_qc/cluster_quality.pdf \
  --raw_data_out RawData_cluster_qc_raw
Command exit status: 1
Command output: (empty)
Command error:
[M::mm_idx_gen::133.807*0.97] collected minimizers
[M::mm_idx_gen::209.955*0.97] sorted minimizers
[M::main::209.956*0.97] loaded/built the index for 2677 target sequence(s)
[M::mm_mapopt_update::213.706*0.97] mid_occ = 753
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 2677
[M::mm_idx_stat::215.902*0.97] distinct minimizers: 167346621 (35.44% are singletons); average occurrences: 6.015; average spacing: 3.085; total length: 3105806778
[M::worker_pipeline::1508.465*0.98] mapped 4376076 sequences
[M::worker_pipeline::2188.351*0.98] mapped 4046703 sequences
[M::worker_pipeline::2209.684*0.98] mapped 181916 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -ax splice -t 2 hg38as.fa RawData_full_length_reads.fastq
[M::main] Real time: 2211.288 sec; CPU: 2173.047 sec; Peak RSS: 18.386 GB
/home/epi2melabs/conda/lib/python3.8/site-packages/gffutils/parser.py:19: DeprecationWarning: invalid escape sequence \w
gff3_kw_pat = re.compile('\w+=')
[01:08:08 - workflow_glue] Starting entrypoint.
Traceback (most recent call last):
File "/home/ec2-user/.nextflow/assets/epi2me-labs/wf-transcriptomes/bin/workflow-glue", line 7, in
Hi, thanks for the feedback. I'll try to recreate this error and get back to you.
Dear @sarahjeeeze, I am stuck at the exact same step as @Caffeinated-Code, pipeline:denovo_assembly:build_backbones; any help on this will be greatly appreciated! Best, CW
Hi @sarahjeeeze, Were you able to recreate the most recent error on your end? @CWYuan08
Hi, I am able to recreate the cluster_quality error when there are no alignments between the clusters and the supplied ref_genome. I will try to improve the error messages. If you run the denovo workflow without the ref_genome it should at least complete, though you will be missing the cluster quality data. Maybe ensure the ref_genome you are using is relevant to the input data.
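For anyone hitting this before the error messages improve: the failure mode is an empty set of reference alignments feeding the quality computation. A hypothetical guard illustrating the clearer failure (this is illustrative only, not the wf-glue implementation):

```python
def check_alignments(aligned_read_ids):
    """aligned_read_ids: set of read names recovered from ref_aln.bam.

    Hypothetical sketch: fail loudly with an actionable message instead of
    crashing downstream when nothing aligned to the reference.
    """
    if not aligned_read_ids:
        raise ValueError(
            "No alignments between the clustered reads and ref_genome; "
            "check that ref_genome matches the input data, or rerun the "
            "denovo workflow without ref_genome to skip cluster QC."
        )
    return len(aligned_read_ids)

print(check_alignments({"read1", "read2"}))  # → 2
```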
Dear @sarahjeeeze,
thank you for clarifying this. I wish to use our reference for this analysis (it is used in other analyses without problems); is there any way to separate out the clusters if there are no alignments? I am rerunning the pipeline now.
By the way, I am also wondering if there is an upper limit on the data size. It got stuck at cluster_quality when using MinION data (~6M reads), but when I try our PromethION data, it gets stuck again at making batches. Many thanks again!
Best, CW
Hi, sorry for the late response. We would have to add something to the pipeline to separate out the clusters if there are no alignments. There is no upper limit on the data size, but if you are using large samples you may want to adjust the computational resources available to the workflow and its processes by adjusting the config. What is the error you get from the make_batches process?
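Adjusting resources per the advice above is typically done through a custom Nextflow config; a hedged sketch (the process name here is taken from the log output above, but verify it against the workflow's own nextflow.config before use):

```groovy
// custom.config -- sketch only; match selectors to the actual process
// definitions in wf-transcriptomes.
process {
    withName: 'pipeline:denovo_assembly:make_batches' {
        cpus   = 8
        memory = '32 GB'
    }
}
```

It can then be passed on the command line with `-c custom.config`.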
Closing due to lack of response.
What happened?
wf-glue: error: unrecognized arguments: batches (in the de novo assembly clustering step)
My server is CentOS 7. I tried both the GitHub branch version and the version pulled by Nextflow; same error.
Operating System
ubuntu 20.04
Workflow Execution
Command line
Workflow Execution - EPI2ME Labs Versions
Using command line not EPI2ME Labs.
Workflow Execution - CLI Execution Profile
Docker
Workflow Version
wf-transcriptomes v0.1.10
Relevant log output