Closed by yosei-yung 1 year ago
Hi Qingwei,
I think the error comes from a typo in the fourth line of your command. It should be "main.py -f fastq -t 20 ${PATH1}/input_dir/M055_F_mRNA_fwd_0.fq", not {PATH}.
Best, Xubo
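A path typo like this can be caught before a long run starts. The following is a minimal sketch, not part of RdRpBin: `resolve_input` is a hypothetical helper that expands `${VAR}` placeholders from a given environment and checks that the resulting file exists. It assumes the path is written with shell-style `${...}` variables, as in the command above.

```python
import os
from string import Template


def resolve_input(template, env=None):
    """Expand ${VAR} placeholders and confirm the resulting file exists.

    Hypothetical helper for illustration; it is not part of RdRpBin.
    """
    env = os.environ if env is None else env
    try:
        # substitute() raises KeyError for any variable missing from env
        path = Template(template).substitute(env)
    except KeyError as exc:
        raise ValueError(f"undefined variable in path: {exc}") from None
    if not os.path.isfile(path):
        raise FileNotFoundError(f"input file not found: {path}")
    return path
```

Calling `resolve_input("${PATH1}/input_dir/M055_F_mRNA_fwd_0.fq")` before launching `main.py` would raise immediately if `PATH1` is undefined or the file is missing, instead of failing later inside the pipeline.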
Dear Xubo,
Oh, I see. I am so sorry for such a careless mistake. Thank you so much for your kind reply. By the way, how should I handle paired-end fastq files? Do I need to merge the paired-end files into one? And can I build my own training database, I mean my own reference database?
Best, Qingwei
Hi Qingwei,
For paired-end reads, you can merge them and input them into the model. Custom databases are not currently supported. I will add this feature in the coming days if needed, and I'll let you know when it's done.
Best, Xubo
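The thread does not say what "merge" means here. One possible reading is plain concatenation of the forward and reverse files into a single FASTQ file, as opposed to overlap-based merging of read pairs. The sketch below assumes that reading and assumes well-formed 4-line FASTQ records. `concat_paired_fastq` is a hypothetical helper, and the `/1` and `/2` suffixes are an illustrative choice to keep mate names distinct, not something RdRpBin is known to require.

```python
def concat_paired_fastq(fwd_path, rev_path, out_path):
    """Write all forward reads, then all reverse reads, into one FASTQ file.

    Assumes plain 4-line FASTQ records. A /1 or /2 suffix is appended to each
    read ID (unless already present) so mates keep distinct names.
    Returns the number of reads written.
    """
    n_reads = 0
    with open(out_path, "w") as out:
        for path, suffix in ((fwd_path, "/1"), (rev_path, "/2")):
            with open(path) as handle:
                for i, line in enumerate(handle):
                    if i % 4 == 0:  # header line of a 4-line record
                        fields = line.rstrip("\n").split(None, 1)
                        read_id = fields[0]
                        if not read_id.endswith(suffix):
                            read_id += suffix
                        rest = " " + fields[1] if len(fields) > 1 else ""
                        line = f"{read_id}{rest}\n"
                        n_reads += 1
                    elif not line.endswith("\n"):
                        # Guard against a missing newline at the end of a file
                        line += "\n"
                    out.write(line)
    return n_reads
```

The combined file could then be passed to `main.py` in place of a single-end file, for example `concat_paired_fastq("M055_F_mRNA_fwd_0.fq", "M055_F_mRNA_rev_0.fq", "M055_F_mRNA_all.fq")`, where the output file name is made up for illustration.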
Dear Xubo,
Excellent! I am looking forward to the new version of RdRpBin with the option to create a custom database.
Unfortunately, I have now run into a new error.
Reads parsed: 8022
Reads kept: 8022 (1)
Reads failed primer screen: 0 (0)
Bases parsed: 1194801
Bases kept: 1194801 (1)
Number of incorrectly paired reads that were discarded: 0
[timer - sga preprocess] wall clock: 0.08s CPU: 0.03s
[timer - sga index] wall clock: 0.62s CPU: 4.72s
[timer - sga::overlap] wall clock: 2854.30s CPU: 3549.90s
/usr/appli/freeware/miniconda/3.6/envs/RdRpBin/lib/python3.8/site-packages/Bio/Seq.py:2979: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
warnings.warn(
Traceback (most recent call last):
File "/usr/appli/freeware/RdRpBin/main.py", line 299, in <module>
run_on_real(args = rdrpbin_args,
File "/usr/appli/freeware/RdRpBin/main.py", line 194, in run_on_real
graph_utils.run_prc(train_csv=f"{database_name}/data/train.csv",
File "/usr/appli/freeware/RdRpBin/graph_utils.py", line 55, in run_prc
temp_arr[train_idx_to_label[node]] = 1.0
IndexError: index 12 is out of bounds for axis 0 with size 0
My fastq file was already trimmed with fastp, with no N bases allowed and a minimum length of 50 bp.
I am not sure whether my error is the same as the one reported by biofuture, who opened the first issue here. According to your reply, it seems you already fixed that error, and I installed RdRpBin on 16 June.
I am just using RdRpBin's default parameters, so I have no idea what is causing this error.
Best,
Qingwei
Hi Qingwei,
Have you changed the name of the default database ("RdRpBin_db")? If so, please change it back to "RdRpBin_db". Sorry, the "--database" parameter misleadingly suggests that a custom database can be used. Currently, only the default database with the default name is supported. Later I will add support for custom databases to make the program more flexible.
Best, Xubo
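The traceback above reads `{database_name}/data/train.csv`, and the reply says only a database named `RdRpBin_db` is supported, so both can be checked before a run. The sketch below is a hypothetical pre-flight check, not part of RdRpBin: only the directory name and the `data/train.csv` path come from this thread, and nothing is assumed about the columns of `train.csv`. One possible, unconfirmed reading of `size 0` in the IndexError is that the training labels were not loaded, which a missing or empty `train.csv` could cause.

```python
import csv
import os


def check_database(db_dir="RdRpBin_db"):
    """Return a list of problems found with the database directory (empty if none).

    Hypothetical check for illustration; it does not validate file contents
    beyond counting rows.
    """
    problems = []
    if os.path.basename(os.path.normpath(db_dir)) != "RdRpBin_db":
        problems.append(f"database directory is not named 'RdRpBin_db': {db_dir}")
    train_csv = os.path.join(db_dir, "data", "train.csv")
    if not os.path.isfile(train_csv):
        problems.append(f"missing file: {train_csv}")
        return problems
    with open(train_csv, newline="") as handle:
        n_rows = sum(1 for _ in csv.reader(handle))
    if n_rows == 0:
        problems.append(f"empty file: {train_csv}")
    return problems
```

An empty list from `check_database()` only rules out these two causes; it does not show that the database is otherwise intact.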
Dear Xubo, I don't think I have changed the default database name. My command is shown below:
main.py -f fastq -t 20 ${PATH1}/input_dir/M055_F_mRNA_rev_0.fq
The generated files stop here:
6721877 Jun 28 09:32 output.blastx
836484 Jun 28 09:36 output.blastx.prot
1638511 Jun 28 09:36 output.blastx.nucl
13847422617 Jun 28 09:39 test_rdrp_sim.csv.index.fasta.sga.prep
1898531434 Jun 28 09:53 test_rdrp_sim.csv.index.fasta.sga.bwt
737388886 Jun 28 09:57 test_rdrp_sim.csv.index.fasta.sga.sai
882424211 Jun 28 10:11 test_rdrp_sim.csv.index.fasta.sga.rbwt
737388886 Jun 28 10:15 test_rdrp_sim.csv.index.fasta.sga.rsai
1638511 Jun 28 10:15 output.blastx.nucl.sga.prep
55060 Jun 28 10:15 output.blastx.nucl.sga.sai
55060 Jun 28 10:15 output.blastx.nucl.sga.rsai
194295 Jun 28 10:15 output.blastx.nucl.sga.rbwt
195200 Jun 28 10:15 output.blastx.nucl.sga.bwt
15056559774 Jun 28 11:03 test_rdrp_sim.csv.index.fasta.sga.output.blastx.nucl.sga.asqg
6721877 Jun 28 11:04 log
Inside the log directory, blastx_mv contains one file, prediction.csv, while prc is empty.
Best, Qingwei
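To report which pipeline stage stopped producing output, the contents of each subdirectory of `log` can be listed in one step. This is a minimal sketch with a hypothetical helper name, `summarize_subdirs`. It assumes only that stage outputs live in immediate subdirectories of the log directory, as with `blastx_mv` and `prc` above.

```python
import os


def summarize_subdirs(log_dir):
    """Map each immediate subdirectory of log_dir to the sorted list of its entries."""
    summary = {}
    for entry in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, entry)
        if os.path.isdir(path):
            summary[entry] = sorted(os.listdir(path))
    return summary
```

Printing the result, for example with `for name, files in summarize_subdirs("log").items(): print(name, files or "EMPTY")`, gives a compact summary to paste into a bug report.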
Hi Qingwei,
I checked the code but haven't found any other cause for this problem. Here I attach a test file on which the program should run fine. Could you please try RdRpBin on this data and see if the error still occurs?
test.txt
With thanks, Xubo
Dear Xubo,
I tested again with the fasta file you provided and got the same error:
Reads parsed: 281
Reads kept: 281 (1)
Reads failed primer screen: 0 (0)
Bases parsed: 41191
Bases kept: 41191 (1)
Number of incorrectly paired reads that were discarded: 0
[timer - sga preprocess] wall clock: 0.00s CPU: 0.00s
[timer - sga index] wall clock: 0.64s CPU: 4.41s
[timer - sga::overlap] wall clock: 0.12s CPU: 0.10s
/usr/appli/freeware/miniconda/3.6/envs/RdRpBin/lib/python3.8/site-packages/Bio/Seq.py:2979: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
warnings.warn(
Traceback (most recent call last):
File "/usr/appli/freeware/RdRpBin/main.py", line 299, in <module>
run_on_real(args = rdrpbin_args,
File "/usr/appli/freeware/RdRpBin/main.py", line 194, in run_on_real
graph_utils.run_prc(train_csv=f"{database_name}/data/train.csv",
File "/usr/appli/freeware/RdRpBin/graph_utils.py", line 55, in run_prc
temp_arr[train_idx_to_label[node]] = 1.0
IndexError: index 9 is out of bounds for axis 0 with size 0
And the `prc` directory is empty.
Qingwei
Thanks for testing, Qingwei. I'm still working on it but haven't found the bug yet.
Have you modified the code, especially 'main.py'? If so, could you please re-download the code and run RdRpBin on test.txt again?
With thanks, Xubo
Dear Xubo,
With the help of an engineer in my lab, the RdRpBin error has been fixed. I re-tested the command with the test.txt you provided. It ran very well and I got nice results.
He hasn't told me the details of the fix. But he recommended that I use a graphics processing unit (GPU), which should be about 3 times faster than a central processing unit (CPU). I think that is reasonable because RdRpBin contains CNN and GCN machine learning steps.
Anyway, thank you so much for your kind replies and for your nice tool, RdRpBin.
Best,
Qingwei
Glad to hear that, and thank you for the feedback! Best, Xubo
Hi, Hubert,
Thanks for developing this nice tool. I want to use it to detect RdRp in my metatranscriptome data at the read level. Following the README.md file, I created an input_dir and put my reads inside it. Then I ran RdRpBin with just the default parameters, but it always gives me an error.
I can't figure out what's wrong with my directory. Can you help me deal with this error? Looking forward to hearing from you soon.
Best regards, Qingwei