biocore / deblur

Deblur is a greedy deconvolution algorithm based on known read error profiles.
BSD 3-Clause "New" or "Revised" License

indexdb_rna not found #209

Open Jigyasa3 opened 2 years ago

Jigyasa3 commented 2 years ago

Dear all,

I am running deblur standalone from qiime, and sortmerna is unable to generate the index for the RNA file. I am using python 3.5.2 and sortmerna 4.3.4

Code-

DB_DIR="/flash/BourguignonU/Tool/deblur/deblur/support_files"

/flash/BourguignonU/Tool/deblur/scripts/deblur workflow --seqs-fp ${IN_DIR}/ --output-dir ${OUT_DIR}/deblur_output --trim-length -1 --pos-ref-fp ${DB_DIR}/88_otus.fasta --neg-ref-fp ${DB_DIR}/artifacts.fa --threads-per-sample 30

Error-

Traceback (most recent call last):
  File "/flash/BourguignonU/Tool/deblur/scripts/deblur", line 684, in <module>
    deblur_cmds()
  File "/apps/free81/python/3.5.2/lib/python3.5/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/apps/free81/python/3.5.2/lib/python3.5/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/apps/free81/python/3.5.2/lib/python3.5/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/apps/free81/python/3.5.2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/apps/free81/python/3.5.2/lib/python3.5/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/flash/BourguignonU/Tool/deblur/scripts/deblur", line 610, in workflow
    working_dir=working_dir)
  File "/home/j/jigyasa-arora/.local/lib/python3.5/site-packages/deblur-1.1.0.dev0-py3.5.egg/deblur/workflow.py", line 222, in build_index_sortmerna
    sout, serr, res = _system_call(params)
  File "/home/j/jigyasa-arora/.local/lib/python3.5/site-packages/deblur-1.1.0.dev0-py3.5.egg/deblur/workflow.py", line 959, in _system_call
    stderr=subprocess.PIPE)
  File "/apps/free81/python/3.5.2/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/apps/free81/python/3.5.2/lib/python3.5/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'indexdb_rna'

Log file-

INFO(139806258345792)2022-04-21 18:28:53,241:*************************
INFO(139806258345792)2022-04-21 18:28:53,242:deblurring started
WARNING(139806258345792)2022-04-21 18:28:53,242:deblur version 1.1.0-dev workflow started on /flash/BourguignonU/Jigs/16S_meta/16S_rawreads/16S_raw_PRJEB27458_11721
WARNING(139806258345792)2022-04-21 18:28:53,242:parameters: {'left_trim_length': 0, 'log_level': 2, 'output_dir': '/flash/BourguignonU/Jigs/16S_meta/16S_rawreads/16S_raw_PRJEB27458_11721/deblur_output', 'min_reads': 10, 'indel_max': 3, 'neg_ref_fp': ('/flash/BourguignonU/Tool/deblur/deblur/support_files/artifacts.fa',), 'logger': <logging.Logger object at 0x7f270f50fcf8>, 'log_file': '/flash/BourguignonU/Jigs/16S_meta/16S_rawreads/deblur.log', 'seqs_fp': '/flash/BourguignonU/Jigs/16S_meta/16S_rawreads/16S_raw_PRJEB27458_11721', 'pos_ref_db_fp': (), 'error_dist': [1, 0.06, 0.02, 0.02, 0.01, 0.005, 0.005, 0.005, 0.001, 0.001, 0.001, 0.0005], 'overwrite': False, 'keep_tmp_files': False, 'threads_per_sample': 30, 'is_worker_thread': False, 'indel_prob': 0.01, 'min_size': 2, 'jobs_to_start': 1, 'neg_ref_db_fp': (), 'mean_error': 0.005, 'trim_length': -1, 'pos_ref_fp': ('/flash/BourguignonU/Tool/deblur/deblur/support_files/88_otus.fasta',)}
INFO(139806258345792)2022-04-21 18:28:53,243:error_dist is : [1, 0.06, 0.02, 0.02, 0.01, 0.005, 0.005, 0.005, 0.001, 0.001, 0.001, 0.0005]
INFO(139806258345792)2022-04-21 18:28:53,243:deblur main program started
INFO(139806258345792)2022-04-21 18:28:53,243:processing directory /flash/BourguignonU/Jigs/16S_meta/16S_rawreads/16S_raw_PRJEB27458_11721
INFO(139806258345792)2022-04-21 18:28:53,247:building negative db sortmerna index files
INFO(139806258345792)2022-04-21 18:28:53,247:build_index_sortmerna files ('/flash/BourguignonU/Tool/deblur/deblur/support_files/artifacts.fa',) to dir /flash/BourguignonU/Jigs/16S_meta/16S_rawreads/16S_raw_PRJEB27458_11721/deblur_output/deblur_working_dir

wasade commented 2 years ago

Hi @Jigyasa3, how was sortmerna installed? indexdb_rna is one of its programs, and the traceback reports that it could not be found.
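One quick way to confirm what the traceback reports is to check whether the executables are visible on PATH from the same environment that runs deblur. The sketch below uses only Python's standard `shutil.which`; the helper name `missing_executables` is made up for illustration, and the program names are the ones mentioned in this thread.

```python
import shutil


def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# indexdb_rna is the program named in the traceback; sortmerna is the
# main program of the same package.
missing = missing_executables(["indexdb_rna", "sortmerna"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required executables were found.")
```

If `indexdb_rna` is listed as missing, the installed sortmerna either does not provide a program under that name or is not on PATH for this environment.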

Jigyasa3 commented 2 years ago

Hi @wasade ,

Thank you for replying! I was using the most recent version of sortmerna in which the indexdb_rna is replaced with indexdb. But as recommended in the deblur dependencies, I reinstalled the sortmerna 2.0 version. The indexing is working now, but I am running into new errors.

error- ValueError: Not all sequence have the same length. Aligned lengths: 3860, sequence lengths: 128, 130, 131, 132, 133, 134, 135, 136, 122, 123, 126, 127

log file deblur.log

Deblur is generating the following files-

8334162 Jake_Mackeral_1_3_dadafilt_R1.fastq.gz.trim
142342 Jake_Mackeral_1_3_dadafilt_R1.fastq.gz.trim.derep
142342 Jake_Mackeral_1_3_dadafilt_R1.fastq.gz.trim.derep.no_artifacts
1209907 Jake_Mackeral_1_3_dadafilt_R1.fastq.gz.trim.derep.no_artifacts.msa
37148 Jake_Mackeral_1_3_dadafilt_R1.fastq.gz.trim.derep.no_artifacts.msa.deblur
36364 Jake_Mackeral_1_3_dadafilt_R1.fastq.gz.trim.derep.no_artifacts.msa.deblur.no_chimeras
71171 Jake_Mackeral_1_3_dadafilt_R1.fastq.gz.trim.derep.sortmerna.blast

I trimmed the raw reads using cutadapt, and manual checking of the filtered raw reads shows 136 bp reads.

Any suggestions?

wasade commented 2 years ago

Could you try specifying the trim parameter when running deblur? The sequences are not a consistent length, but the deblur algorithm needs them to be. Note that if a 5' trim was performed with cutadapt, there will likely be oddities with deblur. For 16S V4, we typically do not run cutadapt and just do the 3' trim with deblur. See, for instance, the qiita workflows.

Jigyasa3 commented 2 years ago

Hi @wasade,

Thank you for replying and sharing the qiita workflow link. I have a follow-up question. If I understand correctly, Qiita suggests using fastp for trimming, followed by deblur. I tested two workflows-

A

Remove the first 15 bp, drop reads below quality 19, and trim all reads to 150 bp-

fastp -i ${IN_DIR}/${file1} -o ${OUT_DIR}/${file2} -f 15 -q 19 -L -b 150

run deblur-

/flash/BourguignonU/Tool/deblur/scripts/deblur workflow --seqs-fp ${IN_DIR}/ --output-dir ${OUT_DIR}/deblur_output --trim-length -1 --pos-ref-fp ${DB_DIR}/88_otus.fasta --neg-ref-fp ${DB_DIR}/artifacts.fa --threads-per-sample 30

But I am getting the same error- ValueError: Not all sequence have the same length. Aligned lengths: 4069, sequence lengths: 128, 129, 130, 131, 132, 133, 134, 135, 136, 122, 123, 124, 125, 126, 127

B

trimming using deblur-

/flash/BourguignonU/Tool/deblur/scripts/deblur workflow --seqs-fp ${IN_DIR}/ --output-dir ${OUT_DIR}/deblur_output --left-trim-length 15 --trim-length 150 --pos-ref-fp ${DB_DIR}/88_otus.fasta --neg-ref-fp ${DB_DIR}/artifacts.fa --threads-per-sample 30

Error-

/home/j/jigyasa-arora/.local/lib/python3.5/site-packages/deblur-1.1.0.dev0-py3.5.egg/deblur/workflow.py:147: UserWarning: Vast majority of sequences (8902878 / 8902878) are shorter than the trim length (150). Are you using the correct -t trim length?
  warnings.warn(errmsg, UserWarning)
/home/j/jigyasa-arora/.local/lib/python3.5/site-packages/deblur-1.1.0.dev0-py3.5.egg/deblur/workflow.py:851: UserWarning: Problem removing artifacts from file /flash/BourguignonU/Jigs/16S_meta/16S_rawreads/16S_raw_PRJEB27458_11721/fastp/Jake_Mackeral_2_4_fastp_R1_001.fastq.gz
  seqs_fp, UserWarning)

wasade commented 2 years ago

(A) we do not typically do fastp for 16S in Qiita, right @antgonza?

(B) it is saying the trim length is too large; it seems your sequences may be shorter than that?
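To see how long the reads actually are before choosing a trim length, one option is to tabulate the sequence lengths in the input file. The following is a minimal sketch, not part of deblur; the function name `read_length_counts` is hypothetical, and it assumes the common four-line-per-record FASTQ layout.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path):
    """Count reads per sequence length in a FASTQ file (plain or .gz).

    Assumes every record occupies exactly four lines.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # the second line of each record is the sequence
                counts[len(line.strip())] += 1
    return counts
```

Printing `sorted(read_length_counts(path).items())` for one of the input files shows at a glance whether any reads reach the requested trim length.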

antgonza commented 2 years ago

(A) that sounds about right.

@Jigyasa3, note that deblur needs to have all the sequences be the same length, which you can do with fastp but your current parameters are not doing that; suggest taking a look at the fastp documentation.
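As an illustration of what "all the sequences the same length" means in practice, here is a minimal Python sketch that cuts every read to a fixed length and drops reads that are too short. It is not how deblur or fastp implement trimming; the function names are hypothetical, and four-line FASTQ records are assumed.

```python
import gzip


def _open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode."""
    opener = gzip.open if str(path).endswith(".gz") else open
    return opener(path, mode + "t")


def trim_to_fixed_length(in_path, out_path, length):
    """Write reads cut to exactly `length` bases; drop shorter reads.

    Returns a (kept, dropped) tuple. Assumes four-line FASTQ records.
    """
    kept = dropped = 0
    with _open_text(in_path, "r") as src, _open_text(out_path, "w") as dst:
        while True:
            header = src.readline().rstrip("\r\n")
            if not header:  # end of file
                break
            seq = src.readline().rstrip("\r\n")
            plus = src.readline().rstrip("\r\n")
            qual = src.readline().rstrip("\r\n")
            if len(seq) < length:
                dropped += 1
                continue
            # Trim sequence and quality string to the same length.
            record = [header, seq[:length], plus, qual[:length]]
            dst.write("\n".join(record) + "\n")
            kept += 1
    return kept, dropped
```

The returned counts make it easy to see how many reads a given length would discard before committing to it.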