HubertTang / RdRpBin


FileNotFoundError: [Errno 2] #2

Closed yosei-yung closed 1 year ago

yosei-yung commented 2 years ago

Hi Hubert, thanks for developing this nice tool. I want to use it to detect RdRp in my metatranscriptome data at the read level. Following the README.md file, I created an input_dir and put my reads inside it, then ran RdRpBin with the default parameters like this:

source /etc/profile.d/modules.sh
module load RdRpBin
PATH1=/lustre/aptmp/ideas2/qingwei/KS-21/RNA-seq/RdRpBin
main.py -f fastq -t 20 ${PATH}/input_dir/M055_F_mRNA_fwd_0.fq 

But it always gives me an error like this:

TERM: Undefined variable.
Loading RdRpBin
  Loading requirement: Miniconda/3.6 cuda/10.2
Traceback (most recent call last):
  File "/usr/appli/freeware/RdRpBin/main.py", line 299, in <module>
    run_on_real(args = rdrpbin_args,
  File "/usr/appli/freeware/RdRpBin/main.py", line 168, in run_on_real
    seq_utils.fasta2csv_pre(fasta_file=sim_reads_path,
  File "/usr/appli/freeware/RdRpBin/seq_utils.py", line 67, in fasta2csv_pre
    with open(f"{fasta_dir}/test_rdrp_sim.csv", 'w') as csv:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/appli/freeware/miniconda/3.6/envs/RdRpBin/bin:/usr/appli/freeware/RdRpBin:/usr/appli/freeware/miniconda/3.6/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/lib64/qt-3.3/bin:/usr/appli/freeware/Modules/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/c3/bin:/usr/appli/freeware/bin:/opt/ibutils/bin:/usr/appli/pbs/default/bin:/sbin:/bin:/bio/user/ideas2/bin/input_dir/test_rdrp_sim.csv'

I can't figure out what's wrong with my directory. Can you help me deal with this error? Looking forward to hearing from you soon. Best regards, Qingwei

HubertTang commented 2 years ago

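Hi Qingwei,

I think the error comes from a typo in the fourth line of your commands. It uses ${PATH}, which the shell expands to the system search path (that is why the whole colon-separated list shows up in the error message), instead of your own variable ${PATH1}.

It should be "main.py -f fastq -t 20 ${PATH1}/input_dir/M055_F_mRNA_fwd_0.fq", not ${PATH}.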

Best, Xubo
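
Note for readers: the mix-up between the two variables can be illustrated with a minimal Python sketch. It uses `string.Template` to mimic simple `${NAME}` expansion; this is only an illustration, not how the shell or RdRpBin works internally. The `PATH` value below is a shortened, hypothetical search path, and `expand` is a helper name made up for this example.

```python
from string import Template


def expand(command_template, env):
    """Expand ${NAME} placeholders from a dict, similar to simple shell expansion."""
    return Template(command_template).substitute(env)


# Hypothetical environment: PATH is the usual colon-separated search path,
# PATH1 is the user's own variable pointing at the working directory.
env = {
    "PATH": "/usr/local/bin:/usr/bin:/bin",
    "PATH1": "/lustre/aptmp/ideas2/qingwei/KS-21/RNA-seq/RdRpBin",
}

wrong = expand("${PATH}/input_dir/M055_F_mRNA_fwd_0.fq", env)
right = expand("${PATH1}/input_dir/M055_F_mRNA_fwd_0.fq", env)

print(wrong)  # the whole search path, followed by /input_dir/...
print(right)  # the intended file under the working directory
```

Using a variable name that does not resemble a built-in one (for example `WORKDIR` instead of `PATH1`) makes this kind of typo harder to introduce.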

yosei-yung commented 2 years ago

Dear Xubo, oh, I see. I am so sorry for such a careless mistake. Thank you so much for your kind reply. By the way, how can I deal with paired-end fastq files? Do I need to merge the paired-end files into one? And can I build my own training database, I mean my own reference database? Best, Qingwei

HubertTang commented 2 years ago

Hi Qingwei,

For paired-end reads, you can merge them into one file and input that into the model. Custom databases are not currently supported. I will add this feature in the coming days if needed, and I'll let you know when it's done.

Best, Xubo
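
Note for readers: the thread does not say how the paired-end files should be merged. A minimal sketch, assuming that "merge" means plain concatenation of the forward and reverse files into a single FASTQ file (and not overlap-based merging of read pairs), could look like this. The function name `concat_fastq` and the file names in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def concat_fastq(inputs, output):
    """Concatenate FASTQ files, in the given order, into one output file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # If an input does not end with a newline, add one so the
                # next file's first record starts on its own line.
                if src.tell() > 0:
                    src.seek(-1, 2)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return Path(output)


# Hypothetical usage:
# concat_fastq(["sample_fwd.fq", "sample_rev.fq"], "input_dir/sample_merged.fq")
```

If the forward and reverse reads of a pair share the same read name, it may be worth checking whether duplicate names cause trouble downstream; the thread does not address this.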

yosei-yung commented 2 years ago

Dear Xubo, excellent! I am looking forward to a new version of RdRpBin with an option to create a custom database. Unfortunately, I have now run into a new error:

Reads parsed:   8022
Reads kept:     8022 (1)
Reads failed primer screen:     0 (0)
Bases parsed:   1194801
Bases kept:     1194801 (1)
Number of incorrectly paired reads that were discarded: 0
[timer - sga preprocess] wall clock: 0.08s CPU: 0.03s
[timer - sga index] wall clock: 0.62s CPU: 4.72s
[timer - sga::overlap] wall clock: 2854.30s CPU: 3549.90s
/usr/appli/freeware/miniconda/3.6/envs/RdRpBin/lib/python3.8/site-packages/Bio/Seq.py:2979: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/appli/freeware/RdRpBin/main.py", line 299, in <module>
    run_on_real(args = rdrpbin_args,
  File "/usr/appli/freeware/RdRpBin/main.py", line 194, in run_on_real
    graph_utils.run_prc(train_csv=f"{database_name}/data/train.csv", 
  File "/usr/appli/freeware/RdRpBin/graph_utils.py", line 55, in run_prc
    temp_arr[train_idx_to_label[node]] = 1.0
IndexError: index 12 is out of bounds for axis 0 with size 0

My fastq file was already trimmed using fastp, with no N allowed and a minimum length of 50 bp. I am not sure whether my error is the same as the one reported by biofuture, who opened the first issue here. According to your reply there, it seems you already fixed that error. I installed RdRpBin on 16 June and used only the default parameters, so I have no idea what causes this error. Best, Qingwei

HubertTang commented 2 years ago

Hi Qingwei,

Have you changed the name of the default database ("RdRpBin_db")? If so, please change it back to "RdRpBin_db". Sorry, the "--database" parameter misleadingly suggests that a custom database can be used. Currently, only the default database with the default name is supported. Later I will add the option to customize the database to make the program more flexible.

Best, Xubo
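
Note for readers: the traceback above shows `run_prc` being given `{database_name}/data/train.csv`, so one thing to rule out is a missing or empty training file under the default database name. The sketch below is a hypothetical pre-flight check, not part of RdRpBin. Where the database directory lives (`parent_dir`) is an assumption you need to adapt, and the thread never confirms that this was the cause of the IndexError.

```python
from pathlib import Path


def check_database(parent_dir, db_name="RdRpBin_db"):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    db_dir = Path(parent_dir) / db_name
    # Relative path taken from the traceback: {database_name}/data/train.csv
    train_csv = db_dir / "data" / "train.csv"
    if not db_dir.is_dir():
        problems.append(f"database directory not found: {db_dir}")
    elif not train_csv.is_file():
        problems.append(f"missing file: {train_csv}")
    elif train_csv.stat().st_size == 0:
        problems.append(f"empty file: {train_csv}")
    return problems


# Hypothetical usage, assuming the database sits in the current directory:
# for problem in check_database("."):
#     print(problem)
```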

yosei-yung commented 2 years ago

Dear Xubo, I don't think I have changed the default database name. My command is shown below:

main.py -f fastq -t 20 ${PATH1}/input_dir/M055_F_mRNA_rev_0.fq

And the generated files stop here:

6721877 Jun 28 09:32 output.blastx
836484 Jun 28 09:36 output.blastx.prot
1638511 Jun 28 09:36 output.blastx.nucl
13847422617 Jun 28 09:39 test_rdrp_sim.csv.index.fasta.sga.prep
1898531434 Jun 28 09:53 test_rdrp_sim.csv.index.fasta.sga.bwt
737388886 Jun 28 09:57 test_rdrp_sim.csv.index.fasta.sga.sai
882424211 Jun 28 10:11 test_rdrp_sim.csv.index.fasta.sga.rbwt
737388886 Jun 28 10:15 test_rdrp_sim.csv.index.fasta.sga.rsai
1638511 Jun 28 10:15 output.blastx.nucl.sga.prep
55060 Jun 28 10:15 output.blastx.nucl.sga.sai
55060 Jun 28 10:15 output.blastx.nucl.sga.rsai
194295 Jun 28 10:15 output.blastx.nucl.sga.rbwt
195200 Jun 28 10:15 output.blastx.nucl.sga.bwt
15056559774 Jun 28 11:03 test_rdrp_sim.csv.index.fasta.sga.output.blastx.nucl.sga.asqg
6721877 Jun 28 11:04 log

Inside the log directory, blastx_mv contains one file, prediction.csv, while prc is empty.

Best, Qingwei

HubertTang commented 2 years ago

Hi Qingwei,

I checked the code but haven't found any other cause for this problem. I have attached a test file on which the program should run fine. Could you please try RdRpBin on this data and see if the error still occurs?

Attachment: test.txt

With thanks, Xubo

yosei-yung commented 2 years ago

Dear Xubo,

I tested again with the FASTA file you provided and got the same error:


Reads parsed:   281
Reads kept:     281 (1)
Reads failed primer screen:     0 (0)
Bases parsed:   41191
Bases kept:     41191 (1)
Number of incorrectly paired reads that were discarded: 0
[timer - sga preprocess] wall clock: 0.00s CPU: 0.00s
[timer - sga index] wall clock: 0.64s CPU: 4.41s
[timer - sga::overlap] wall clock: 0.12s CPU: 0.10s
/usr/appli/freeware/miniconda/3.6/envs/RdRpBin/lib/python3.8/site-packages/Bio/Seq.py:2979: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/appli/freeware/RdRpBin/main.py", line 299, in <module>
    run_on_real(args = rdrpbin_args,
  File "/usr/appli/freeware/RdRpBin/main.py", line 194, in run_on_real
    graph_utils.run_prc(train_csv=f"{database_name}/data/train.csv", 
  File "/usr/appli/freeware/RdRpBin/graph_utils.py", line 55, in run_prc
    temp_arr[train_idx_to_label[node]] = 1.0
IndexError: index 9 is out of bounds for axis 0 with size 0

And the `prc` directory is empty.

Qingwei

HubertTang commented 2 years ago

Thanks for testing, Qingwei. I'm still working on it but haven't found the bug.

Have you ever modified the code, especially 'main.py'? If so, could you please re-download the code and run RdRpBin on test.txt again?

With thanks, Xubo

yosei-yung commented 2 years ago

Dear Xubo, with the help of an engineer in my lab, the RdRpBin error has been fixed. I re-ran the command on the test.txt you provided; it ran very well and I got nice results. He hasn't told me the details of the error, but he recommended that I use a graphics processing unit (GPU), which should be about 3 times faster than a central processing unit (CPU). I think that is reasonable because RdRpBin includes CNN and GCN machine learning steps. Anyway, thank you so much for your kind replies and your nice tool RdRpBin. Best, Qingwei
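
Note for readers: the thread does not say how RdRpBin selects a GPU. As a rough sketch that uses only the Python standard library, the snippet below reports two generic hints about whether an NVIDIA GPU is likely to be usable in the current session. The function name `gpu_hints` is made up for this example, and a positive result does not guarantee that RdRpBin will actually run on the GPU.

```python
import os
import shutil


def gpu_hints():
    """Collect simple hints about GPU availability in the current environment."""
    return {
        # True if the NVIDIA management tool can be found on the search path.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # None if unset; an empty string hides all GPUs from CUDA programs.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }


print(gpu_hints())
```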

HubertTang commented 2 years ago

Glad to hear that, and thank you for the feedback! Best, Xubo