AnantharamanLab / ViWrap

A wrapper to identify, bin, classify, and predict host-viral relationship for viruses

Error in the step "Run vRhyme to bin viral scaffolds. In processing... " #6

Open WoCer2019 opened 1 year ago

WoCer2019 commented 1 year ago

Hi, Thanks for your nice pipeline!

ViWrap 1.2.1 encounters an error when running this command:

./ViWrap/ViWrap run --input_metagenome /P/release/C/1-5_assembly/final_assembly.fasta --input_reads /P/release/paired/1-5.paired_1.fastq,/P/release/paired/1-5.paired_2.fastq --out_dir ./ViWrap_1-5_outdir --db_di /databases/ViWrap_db --identify_method vb-vs --conda_env_dir /miniconda3/envs --threads 90 --input_length_limit 2000

Error Message

[2023-02-17 00:09:05] | Run vRhyme to bin viral scaffolds. In processing...
Traceback (most recent call last):
  File "/home/LJ/software/ViWrap/ViWrap", line 172, in <module>
    output = cli()
  File "/home/LJ/software/ViWrap/ViWrap", line 166, in cli
    args["func"](args)
  File "/home/LJ/software/ViWrap/scripts/master_run.py", line 420, in main
    scripts.module.parse_checkv_result(vRhyme_best_bin_CheckV_result, CheckV_quality_summary)
  File "/home/LJ/software/ViWrap/scripts/module.py", line 480, in parse_checkv_result
    f = open(outfile, "w")
FileNotFoundError: [Errno 2] No such file or directory: './ViWrap_1-5_outdir/02_vRhyme_outdir/vRhyme_best_bins_fasta_CheckV_result/CheckV_quality_summary.txt'

Thanks!

plant272 commented 1 year ago

Thank you for your nice pipeline. I encountered the same error message using test data.

ChaoLab commented 1 year ago

Hi @plant272 @WoCer2019, I suspect this is because CheckV did not run successfully. Can you check whether the results in 05_CheckV_outdir are present and correct?

It may be that the CHECKVDB location needs to be set for CheckV again:

# You'll need to update your environment or use the -d flag to specify the CHECKVDB location:
export CHECKVDB=/path/to/checkv-db

Can you activate the ViWrap-CheckV conda env, tell CheckV the current db location, and run the whole script again?
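
A quick way to check both suspects before re-running is sketched below. It relies only on what appears in this thread: the `CHECKVDB` environment variable and the output directory named in the traceback. The helper name `check_checkv_setup` is made up for illustration and is not part of ViWrap or CheckV.

```python
import os
from pathlib import Path


def check_checkv_setup(out_dir, environ=None):
    """Return a list of human-readable problems (empty if nothing obvious is wrong).

    Hypothetical helper for illustration; not part of ViWrap or CheckV.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # CheckV takes its database location from CHECKVDB unless -d is given.
    db = environ.get("CHECKVDB")
    if not db:
        problems.append("CHECKVDB is not set")
    elif not Path(db).is_dir():
        problems.append(f"CHECKVDB points to a missing directory: {db}")

    # open(path, "w") raises FileNotFoundError when the parent directory does
    # not exist, which matches the traceback reported in this issue.
    result_dir = (
        Path(out_dir) / "02_vRhyme_outdir" / "vRhyme_best_bins_fasta_CheckV_result"
    )
    if not result_dir.is_dir():
        problems.append(f"CheckV result directory was never created: {result_dir}")

    return problems
```

Calling `check_checkv_setup("./ViWrap_1-5_outdir")` inside the ViWrap-CheckV env should return an empty list if the database variable is set and the CheckV result directory exists.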

WoCer2019 commented 1 year ago

Thanks for your reply. I have updated the CheckV database location using the -d flag and run it again, but the process now hits a new error.

[2023-02-19 11:31:05] | Pre-check inputings. In processing...
[2023-02-19 11:31:06] | Looks like the input metagenome and reads, database, and custom MAGs dir (if option used) are now set up well, start up to run ViWrap pipeline
[2023-02-19 11:31:06] | Run VIBRANT-VirSorter2 method. Run VIBRANT to identify and annotate virus from input metagenome. In processing...
[2023-02-19 12:24:05] | Run VIBRANT-VirSorter2 method. Run VIBRANT to identify and annotate viruses from input metagenome. Finished
[2023-02-19 12:24:05] | Run VIBRANT-VirSorter2 method. Run VirSorter2 to identify viruses from input metagenome. Also plus CheckV to QC and trim, and KEGG, Pfam, and VOG HMMs to annotate viruses. In processing...
[2023-02-19 16:58:12] | Run VIBRANT-VirSorter2 method. Run VirSorter2 the 1st time to identify viruses from input metagenome. Finished
[2023-02-19 16:59:06] | Run VIBRANT-VirSorter2 method. Run CheckV the 1st time to QC and trim viruses identified from VirSorter2 1st run. Finished
[2023-02-19 17:32:13] | Run VIBRANT-VirSorter2 method. Run VirSorter2 the 2nd time for CheckV-trimmed sequences. Finished
[2023-02-19 17:32:52] | Run VIBRANT-VirSorter2 method. Run CheckV the 2nd time to get viral and host gene counts. Finished
[2023-02-19 17:33:58] | Run VIBRANT-VirSorter2 method. Run VIBRANT to check "keep2" and "manual_check" groups and get the final VirSorter2 virus sequences. Finished
[2023-02-19 17:34:03] | Map reads to metagenome. In processing...
[2023-02-20 00:14:33] | Map reads to metagenome. Finished
[2023-02-20 00:14:33] | Run vRhyme to bin viral scaffolds. In processing...
[2023-02-20 00:17:15] | Run vRhyme to bin viral scaffolds. Finished
[2023-02-20 00:17:15] | Run vContact2 to cluster viral genomes. In processing...
[2023-02-20 02:23:21] | Run vContact2 to cluster viral genomes. Finished
[2023-02-20 02:23:21] | Run CheckV to evaluate virus genome quality. In processing...
[2023-02-20 02:30:42] | Run CheckV to evaluate virus genome quality. Finished
[2023-02-20 02:30:42] | Run dRep to cluster virus species. In processing...
[2023-02-20 02:30:53] | Run dRep to cluster virus species. Finished
[2023-02-20 02:30:53] | Conduct taxonomic charaterization. In processing...
[2023-02-20 02:34:44] | Conduct taxonomic charaterization. Finished
[2023-02-20 02:34:44] | Conduct Host prediction by iPHoP. In processing...
[2023-02-20 03:34:49] | Conduct Host prediction by iPHoP. Finished
[2023-02-20 03:34:49] | Get virus genome abundance. Finished
Traceback (most recent call last):
  File "/home/LJ/software/ViWrap/ViWrap", line 172, in <module>
    output = cli()
  File "/home/LJ/software/ViWrap/ViWrap", line 166, in cli
    args["func"](args)
  File "/home/LJ/software/ViWrap/scripts/master_run.py", line 597, in main
    scripts.module.get_virus_genome_annotation_result(args)
  File "/home/LJ/software/ViWrap/scripts/module.py", line 1565, in get_virus_genome_annotation_result
    items = annotation_result[protein]
KeyError: '1-1_k141_29333_length_179915_cov_34.0047_fragment_1_110\t(107377..107571)\t-1\tPF09048.10\tCro

Do you have other suggestions? Thank you!

ChaoLab commented 1 year ago

Can you update "module.py" and "master_run.py" from the latest GitHub repo? I made several changes yesterday to deal with this type of error.
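
One detail in the traceback is worth noting: the missing key contains tab characters (`\t`), so it looks like a whole tab-separated line, rather than just a protein ID, was used for the dictionary lookup. The sketch below illustrates that failure mode and a defensive lookup. It is only an illustration under that assumption; `lookup_annotation` and the sample data are hypothetical, and this is not the actual code or fix in `module.py`.

```python
def lookup_annotation(annotation_result, line):
    """Look up a protein by the first tab-separated field of `line`.

    Hypothetical helper: returns None instead of raising KeyError when the
    protein has no entry in the dictionary.
    """
    protein_id = line.rstrip("\n").split("\t", 1)[0]
    return annotation_result.get(protein_id)


# Made-up example data, not taken from a real ViWrap run.
annotation_result = {"scaffold_1_10": ["PF00001.1", "example domain"]}
line = "scaffold_1_10\t(100..400)\t-1\tPF00001.1\texample domain\n"

# Using the whole line as the key fails, as in the traceback.
assert line not in annotation_result

# Using only the first column finds the entry.
print(lookup_annotation(annotation_result, line))
```

Returning `None` for an unknown protein lets the caller decide whether to skip the entry or report it, instead of stopping the whole run.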

WoCer2019 commented 1 year ago

I have updated "module.py" and "master_run.py" from the latest GitHub repo, but the same error persists.

[2023-02-20 18:25:39] | Map reads to metagenome. Finished
[2023-02-20 18:25:39] | Run vRhyme to bin viral scaffolds. In processing...
[2023-02-20 18:27:13] | Run vRhyme to bin viral scaffolds. Finished
[2023-02-20 18:27:13] | Run vContact2 to cluster viral genomes. In processing...
[2023-02-20 19:54:04] | Run vContact2 to cluster viral genomes. Finished
[2023-02-20 19:54:04] | Run CheckV to evaluate virus genome quality. In processing...
[2023-02-20 19:59:38] | Run CheckV to evaluate virus genome quality. Finished
[2023-02-20 19:59:38] | Run dRep to cluster virus species. In processing...
[2023-02-20 19:59:45] | Run dRep to cluster virus species. Finished
[2023-02-20 19:59:45] | Conduct taxonomic charaterization. In processing...
[2023-02-20 20:02:25] | Conduct taxonomic charaterization. Finished
[2023-02-20 20:02:25] | Conduct Host prediction by iPHoP. In processing...
[2023-02-20 20:56:28] | Conduct Host prediction by iPHoP. Finished
[2023-02-20 20:56:28] | Get virus genome abundance. Finished
Traceback (most recent call last):
  File "/home/LJ/software/ViWrap/ViWrap", line 172, in <module>
    output = cli()
  File "/home/LJ/software/ViWrap/ViWrap", line 166, in cli
    args["func"](args)
  File "/home/LJ/software/ViWrap/scripts/master_run.py", line 599, in main
    scripts.module.get_virus_genome_annotation_result(args)
  File "/home/LJ/software/ViWrap/scripts/module.py", line 1574, in get_virus_genome_annotation_result
    items = annotation_result[protein]
KeyError: '3-1_k141_33294_length_17161_cov_7.0000_19\t(12244..12657)\t1\tPF01381.22\tHelix-turn-helix

ChaoLab commented 1 year ago

I re-checked the scripts. Can you pull the new version of "module.py" and re-run it?

plant272 commented 1 year ago

Hello @ChaoLab, I have updated to the newest "module.py" and "master_run.py" from the latest GitHub repo and re-run it using the test datasets. A similar error occurs.

[2023-02-22 15:01:06] | Map reads to metagenome. Finished
[2023-02-22 15:01:06] | Run vRhyme to bin viral scaffolds. In processing...
[2023-02-22 15:02:21] | Run vRhyme to bin viral scaffolds. Finished
[2023-02-22 15:02:21] | Run vContact2 to cluster viral genomes. In processing...
[2023-02-22 15:52:52] | Run vContact2 to cluster viral genomes. Finished
[2023-02-22 15:52:52] | Run CheckV to evaluate virus genome quality. In processing...
[2023-02-22 15:54:33] | Run CheckV to evaluate virus genome quality. Finished
[2023-02-22 15:54:33] | Run dRep to cluster virus species. In processing...
[2023-02-22 15:54:39] | Run dRep to cluster virus species. Finished
[2023-02-22 15:54:39] | Conduct taxonomic charaterization. In processing...
[2023-02-22 15:55:34] | Conduct taxonomic charaterization. Finished
[2023-02-22 15:55:34] | Conduct Host prediction by iPHoP. In processing...
[2023-02-22 15:56:38] | Conduct Host prediction by iPHoP. Finished
[2023-02-22 15:56:38] | Get virus genome abundance. Finished
Traceback (most recent call last):
  File "/project/jzh1/ViWrap/ViWrap", line 172, in <module>
    output = cli()
  File "/project/jzh1/ViWrap/ViWrap", line 166, in cli
    args["func"](args)
  File "/project/jzh1/ViWrap/scripts/master_run.py", line 598, in main
    scripts.module.combine_iphop_results(args, combined_host_pred_to_genome_result, combined_host_pred_to_genus_result)
  File "/project/jzh1/ViWrap/scripts/module.py", line 1373, in combine_iphop_results
    with open(host_pred_to_genome_m90, 'r') as lines:
FileNotFoundError: [Errno 2] No such file or directory: './test_outdir/07_iPHoP_outdir/Host_prediction_to_genome_m90.csv'

I checked ./test_outdir/07_iPHoP_outdir/ and it contains only the following:

drwxr-xr-x. 4 jzh1 biostack   4096 Feb 22 15:56 Wdir
-rw-r--r--. 1 jzh1 biostack 513833 Feb 22 15:55 all_vRhyme_fasta.Nlinked_viral_gn_clean.fna

Is that because the iPHoP conda env and db have not been properly installed? Could you provide some specific commands to test and fix it? Thank you.

ChaoLab commented 1 year ago

This seems to be a different issue from the previous one. I guess something went wrong while running iPHoP. Can you test whether your iPHoP conda env and db have been properly installed by using the test datasets and following the instructions provided by iPHoP (https://bitbucket.org/srouxjgi/iphop/src/main/)?
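
Before digging into ViWrap itself, it can help to confirm which iPHoP outputs actually exist. The sketch below only checks for files by name. The file name in the usage note comes from the traceback above; `missing_outputs` is a hypothetical helper and not part of ViWrap or iPHoP.

```python
from pathlib import Path


def missing_outputs(out_dir, expected_names):
    """Return the expected file names that are absent or empty in out_dir.

    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    missing = []
    for name in expected_names:
        path = out_dir / name
        # A zero-byte file is reported too, since it may mean the step
        # stopped before writing any results.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

For example, `missing_outputs("./test_outdir/07_iPHoP_outdir", ["Host_prediction_to_genome_m90.csv"])` returns a non-empty list when iPHoP did not write that file, which points to iPHoP rather than the ViWrap result-combining step.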

WoCer2019 commented 1 year ago

Hello @ChaoLab, I have updated all the scripts and re-run it. The same error still exists. So sad.

plant272 commented 1 year ago

As suggested, I used my iPHoP conda env and db to test the datasets provided by iPHoP (https://bitbucket.org/srouxjgi/iphop/src/main/), and got the Errno 2 below. Do you have any suggestions? Thank you!

(/project/jzh1/.conda/ViWrap_conda_environments/ViWrap-iPHoP) [jzh1@localhost ~]$ iphop predict --fa_file test_input_phages.fna --db_dir /project/jzh1/.conda/ViWrap_db/iPHoP_db/iPHoP_db --out_dir iphop_test_results/test_input_phages_iphop 
### Welcome to iPHoP ###
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Run] Running blastn against genomes...
[1/3/Run] Get relevant blast matches...
[2/1/Run] Running blastn against CRISPR...
[2/2/Run] Get relevant crispr matches...
[3/1/Run] Running WIsH...
/bin/sh: line 1: 346315 Segmentation fault      /project/jzh1/.conda/ViWrap_conda_environments/ViWrap-iPHoP/lib/python3.8/site-packages/iphop/utils/WIsH -c predict -g iphop_test_results/test_input_phages_iphop/Wdir/split_input/ -m /project/jzh1/.conda/ViWrap_db/iPHoP_db/iPHoP_db/db/wish_models -n /project/jzh1/.conda/ViWrap_db/iPHoP_db/iPHoP_db/db_infos/Wish_negFits.csv -r iphop_test_results/test_input_phages_iphop/Wdir/wish_results/ -t 1 -b > iphop_test_results/test_input_phages_iphop/Wdir/wish.log 2>&1
[3/2/Run] Get relevant WIsH hits...
Traceback (most recent call last):
  File "/project/jzh1/.conda/ViWrap_conda_environments/ViWrap-iPHoP/bin/iphop", line 10, in <module>
    sys.exit(cli())
  File "/project/jzh1/.conda/ViWrap_conda_environments/ViWrap-iPHoP/lib/python3.8/site-packages/iphop/iphop.py", line 122, in cli
    args["func"](args)
  File "/project/jzh1/.conda/ViWrap_conda_environments/ViWrap-iPHoP/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 79, in main
    wish.run_and_parse_wish(args)
  File "/project/jzh1/.conda/ViWrap_conda_environments/ViWrap-iPHoP/lib/python3.8/site-packages/iphop/modules/wish.py", line 48, in run_and_parse_wish
    get_wish_results(args["fasta_file"],args["wishrawresult"],args["wishparsed"],args['messages'])
  File "/project/jzh1/.conda/ViWrap_conda_environments/ViWrap-iPHoP/lib/python3.8/site-packages/iphop/modules/wish.py", line 61, in get_wish_results
    with open(pred_file, newline='') as csvfile:
FileNotFoundError: [Errno 2] No such file or directory: 'iphop_test_results/test_input_phages_iphop/Wdir/wish_results/prediction.list'

ChaoLab commented 1 year ago

I have no idea. Something seems to be wrong with 'WIsH'. Did you re-install everything for iPHoP, including the conda env and db? iPHoP itself is a very complex package too, so it would be best to troubleshoot iPHoP first.
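
The `Segmentation fault` line in the log suggests the WIsH binary itself crashed, so `prediction.list` was probably never written and the later `FileNotFoundError` is only a symptom. When wrapping such a tool from Python, the exit status can be decoded roughly as sketched below. This is a generic illustration, not code from iPHoP or ViWrap, and `describe_exit` is a hypothetical name.

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess return code, recognising death by signal.

    subprocess reports -N when the child was killed by signal N; a POSIX
    shell (shell=True) conventionally reports 128 + N instead.
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a known signal number on this platform.
        return f"exited with status {returncode}"
    return f"killed by {name}"
```

Checking the status right after the external call makes a crash visible at the step where it happens, rather than at the next attempt to open a missing result file.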