Error in assembly step

Matthijnssenslab / ViPER

Bioinformatics pipeline used in the Laboratory of Viral Metagenomics (KU Leuven) to trim and assemble paired-end Illumina reads, and classify resulting contigs.

GNU General Public License v3.0


Error in assembly step #9

Open ArianaBFulvi opened 5 days ago

ArianaBFulvi commented 5 days ago

Hello, I keep getting this error when I try to run the pipeline:

Traceback (most recent call last):
  File "/home/miniconda3/envs/viper/bin/viper_cluster.py", line 21, in <module>
    from viper_clustering import viper_utilities as vu
ModuleNotFoundError: No module named 'viper_clustering'
mv: cannot stat ‘FINAL_Ch_H_V_S1_10000_500.fasta’: No such file or directory
ERROR! File not found (contigs): /results/3_tools/viper/CONTIGS/FINAL_Ch_H_V_S1_10000_500.contigs.fasta

In case you have troubles running QUAST, you can write to quast.support@cab.spbu.ru
or report an issue on our GitHub repository https://github.com/ablab/quast/issues
Please provide us with quast.log file from the output directory.
diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 48
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /dev/shm
#Target sequences to report alignments for: 25
Opening the database...  [0.133s]
Database: /tools/Diamond/viral_protein_db.dmnd (type: Diamond database, sequences: 683242, letters: 166116434)
Block size = 5000000000

Error opening file /results/3_tools/viper/CONTIGS/FINAL_Ch_H_V_S1_10000_500.contigs.fasta: No such file or directory

[2024-07-01 22:23:08] ERROR: Something went wrong with Diamond.

As you can see, the file needed for the Diamond+Krona step (sample_length.contigs.fasta) is not being generated. Any help would be appreciated. Thank you.

LanderDC commented 5 days ago

Hi, the pipeline fails at the clustering of the three assemblies because our custom clustering module is not installed; see this part of the error:

File "/home/miniconda3/envs/viper/bin/viper_cluster.py", line 21, in <module>
from viper_clustering import viper_utilities as vu
ModuleNotFoundError: No module named 'viper_clustering'

This should be fixed if you run the following commands:

cd path/where/you/cloned/ViPER/repo
conda activate viper
pip install .

pip install . should install the viper_clustering module into the Python of the viper conda environment. Let me know if this works!
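A quick way to check the result is to ask the interpreter to import the module directly. The helper below is a hypothetical sketch, not part of ViPER; it assumes the environment's interpreter is reachable as python3 once the environment is activated, and lets you point it at another interpreter through the PYTHON variable.

```shell
# Hypothetical helper (not part of ViPER): report whether an interpreter
# can import a given Python module. Uses $PYTHON if set, else "python3".
check_import() {
  local py="${PYTHON:-python3}"
  if "$py" -c "import $1" 2>/dev/null; then
    echo "ok: $1 is importable by $(command -v "$py")"
  else
    # Failure message goes to stderr; non-zero status signals the problem.
    echo "missing: $1 cannot be imported by $(command -v "$py")" >&2
    return 1
  fi
}
```

For example, after conda activate viper, running check_import viper_clustering should print an "ok" line; a "missing" line means the interpreter found on your PATH cannot see the installed package.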

PS: If you don't want to rerun the assembly step, you can (after installing the clustering module) use the "$sample"_"$minlength"-unclustered.contigs.fasta file in the CONTIGS folder and run:

viper_cluster.py -i "$sample"_"$minlength"-unclustered.contigs.fasta -o "$sample"_"$minlength" -t 48 --min-identity 95 --min-coverage 85 
viper-classify.sh -1 R1.fastq.gz -2 R2.fastq.gz -u unpaired.fastq.gz -d /path/to/diamond/db ...
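The quotes in "$sample"_"$minlength" matter: written as $sample_$minlength, the shell would look for a variable named sample_ (underscores are valid in variable names), which is normally unset. A small sketch with hypothetical values shows two equivalent ways of building the file name:

```shell
sample="S1"      # hypothetical sample name
minlength=500    # hypothetical minimum contig length

# Quoting (as in the command above) or braces both end the variable
# name before the "_"; unquoted $sample_ would refer to a variable "sample_".
prefix_quoted="$sample"_"$minlength"
prefix_braced="${sample}_${minlength}"

unclustered="${prefix_braced}-unclustered.contigs.fasta"
echo "$unclustered"   # S1_500-unclustered.contigs.fasta
```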

ArianaBFulvi commented 3 days ago

Hello! Thank you for answering. I followed the steps (reinstalling everything, rerunning the whole pipeline, and running only the failing step) and I keep getting the same error:

File "/home/miniconda3/envs/viper/bin/viper_cluster.py", line 21, in <module>
from viper_clustering import viper_utilities as vu
ModuleNotFoundError: No module named 'viper_clustering'

The viper_cluster module is actually present in the corresponding location. Any help would be highly appreciated.

LanderDC commented 3 days ago

Something is still going wrong with the pip install command; can you share the output of running the following commands?

conda activate viper
pip install /path/where/you/cloned/viper/repo

ArianaBFulvi commented 17 hours ago

Hello! I found a solution to the problem. During the repository cloning step, the viper_utilities file wasn't copied correctly, so I manually copied its content from the repository file into mine. Also, I usually run bioinformatics tools from another directory (where I keep my results) rather than from the one containing the repository, so I ran:

export PYTHONPATH="/path/where/I/cloned/viper/repo"

and it worked perfectly.
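The PYTHONPATH workaround works because Python adds the directories listed in that variable to its module search path (sys.path), whatever the current working directory is. Note that PYTHONPATH must name the directory that contains the package folder, not the package folder itself. The self-contained sketch below shows this with a throwaway package, demo_pkg, a hypothetical stand-in for viper_clustering; it assumes python3 is on the PATH.

```shell
# Build a throwaway package (hypothetical stand-in for viper_clustering).
repo_dir=$(mktemp -d)
mkdir -p "$repo_dir/demo_pkg"
: > "$repo_dir/demo_pkg/__init__.py"
echo 'VALUE = 42' > "$repo_dir/demo_pkg/utilities.py"

# Work from an unrelated directory, as when running from a results folder.
cd "$(mktemp -d)" || exit 1

# Without PYTHONPATH the package's parent directory is not on sys.path.
if python3 -c 'import demo_pkg' 2>/dev/null; then
  without_path="importable"
else
  without_path="not importable"
fi

# With PYTHONPATH set to the directory containing demo_pkg, the import works.
with_path=$(PYTHONPATH="$repo_dir" python3 -c 'from demo_pkg import utilities; print(utilities.VALUE)')

echo "$without_path"   # not importable
echo "$with_path"      # 42
```

Setting the variable with export, as above, makes it apply to every later command in that shell session; putting it in front of a single command limits it to that command.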

Now that I am analysing the results (I used the Diamond/Krona step to identify viruses in my sample), I see that in the KRONA results directory the HTML file contains each identified species and its abundance, and the .krona file has the samples and the taxIDs assigned to each.

My question is: Is there a way to obtain the number of reads assigned to each species identified? I need this value for my analysis. As always, thank you for your help.

LanderDC commented 16 hours ago

Great that you found a solution; it seems that your system used a different Python installation instead of the conda-installed one.
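conda activate sets CONDA_PREFIX to the path of the active environment, so one way to spot this situation is to compare the python found on the PATH with the one inside that environment. uses_env_python is a hypothetical helper sketched for this thread, not a diagnostic shipped with ViPER or conda, and it assumes a Linux/macOS layout where the interpreter lives in $CONDA_PREFIX/bin.

```shell
# Hypothetical helper: succeed only if "python" on the PATH is the
# interpreter inside the currently active conda environment.
uses_env_python() {
  local found expected
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "no conda environment is active" >&2
    return 2
  fi
  # "|| true" keeps the assignment safe if no python is found at all.
  found=$(command -v python || true)
  expected="$CONDA_PREFIX/bin/python"
  if [ "$found" = "$expected" ]; then
    echo "ok: $found"
  else
    echo "PATH python is '$found', expected '$expected'" >&2
    return 1
  fi
}
```

Run it right after conda activate viper; a non-zero exit status (1 for a mismatch, 2 when no environment is active) suggests that another Python, for example from an earlier PATH entry, is shadowing the environment's interpreter.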

To answer your second question:

You can combine the $sample.krona and $sample.magnitudes files with awk:

awk 'NR==FNR { a[$1]=$2; next } $1 in a { print $0 "\t" a[$1] }' "CONTIGS/$sample.magnitudes" "KRONA/$sample.krona" > "$sample.magnitudes.tab"

This will give a tab-separated file with the contig name in the first column, the taxid in the second, the average log e-value in the third, and the number of mapped reads in the last column:

NODE_A2_length_632_cov_15.832432_AMV1   6072    -21.8560540343745   271
NODE_B1_length_20247_cov_955.712295_AMV1    1312874 -450    5842283
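The join above is a standard two-file awk idiom: while reading the first file (NR==FNR), awk stores the read count of each contig in the array a; for the second file, it prints every Krona line whose contig is a key of a, followed by a tab and that count. The self-contained sketch below replays it on the two example contigs. The input files are reconstructed from the example output, so their exact layout (tab-separated, no header line) is an assumption.

```shell
# Reconstructed example inputs (assumed layout): .krona holds contig, taxid
# and average log e-value; .magnitudes holds contig and mapped-read count.
workdir=$(mktemp -d)
printf 'NODE_A2_length_632_cov_15.832432_AMV1\t6072\t-21.8560540343745\n' > "$workdir/S1.krona"
printf 'NODE_B1_length_20247_cov_955.712295_AMV1\t1312874\t-450\n' >> "$workdir/S1.krona"
printf 'NODE_A2_length_632_cov_15.832432_AMV1\t271\n' > "$workdir/S1.magnitudes"
printf 'NODE_B1_length_20247_cov_955.712295_AMV1\t5842283\n' >> "$workdir/S1.magnitudes"

# First file (NR==FNR): remember the read count per contig.
# Second file: print each Krona line for a known contig plus its read count.
awk 'NR==FNR { a[$1]=$2; next } $1 in a { print $0 "\t" a[$1] }' \
  "$workdir/S1.magnitudes" "$workdir/S1.krona" > "$workdir/S1.magnitudes.tab"

cat "$workdir/S1.magnitudes.tab"
```

Contigs that appear in the .krona file but not in the .magnitudes file are silently dropped by the "$1 in a" condition, which is worth keeping in mind when comparing totals.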

You can convert the taxids to scientific names with a tool like TaxonKit.
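Because several contigs can be assigned the same taxid, the per-contig counts may need to be summed to get the number of reads per taxon. A minimal sketch, assuming the four-column tab-separated layout described above (taxid in column 2, read count in column 4); the contig names and the third contig's values are made up for illustration.

```shell
# Hypothetical four-column table (contig, taxid, log e-value, mapped reads);
# contigs c1 and c3 are invented and share taxid 6072.
tab=$(mktemp)
out=$(mktemp)
printf 'c1\t6072\t-21.8\t271\nc2\t1312874\t-450\t5842283\nc3\t6072\t-10.2\t29\n' > "$tab"

# Sum column 4 (mapped reads) per taxid (column 2), largest totals first.
awk -F'\t' '{ reads[$2] += $4 } END { for (t in reads) print t "\t" reads[t] }' "$tab" \
  | sort -t "$(printf '\t')" -k2,2nr > "$out"

cat "$out"
```

The taxid column of the result could then be passed to TaxonKit, for example with taxonkit lineage -i 1 on that file, to add lineages; check taxonkit lineage --help for the options available in your version.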

ArianaBFulvi commented 15 hours ago

Thank you so much for the prompt response and the help provided!