ArianaBFulvi opened 5 days ago
Hi, the pipeline fails at the clustering of the three assemblies because our custom clustering module is not installed; see this part of the error:

File "/home/miniconda3/envs/viper/bin/viper_cluster.py", line 21, in <module>
    from viper_clustering import viper_utilities as vu
ModuleNotFoundError: No module named 'viper_clustering'
This should be fixed by running the following commands:

cd path/where/you/cloned/ViPER/repo
conda activate viper
pip install .

Running pip install . installs the viper_clustering module into the Python of the viper conda environment. Let me know if this works!
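To rule out the most common cause (the script being run by a Python that can't see the module), here is a small generic diagnostic you can run inside the activated viper environment. This is not part of ViPER itself, just a standard-library check:

```python
# Diagnostic: which interpreter is active, and can it import viper_clustering?
import importlib.util
import sys

print("interpreter:", sys.executable)

spec = importlib.util.find_spec("viper_clustering")
if spec is None:
    print("viper_clustering is NOT importable from this interpreter")
else:
    print("viper_clustering found at:", spec.origin)
```

If the printed interpreter path does not point into the viper conda environment, the pip install went into a different Python installation.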
PS: If you don't want to rerun the assembly step, you can (after installing the clustering module) use the "$sample"_"$minlength"-unclustered.contigs.fasta file in the CONTIGS folder and run:
viper_cluster.py -i "$sample"_"$minlength"-unclustered.contigs.fasta -o "$sample"_"$minlength" -t 48 --min-identity 95 --min-coverage 85
viper-classify.sh -1 R1.fastq.gz -2 R2.fastq.gz -u unpaired.fastq.gz -d /path/to/diamond/db ...
Hello! Thank you for answering. I followed the steps (reinstalling everything, rerunning the whole pipeline, running only the failing step) and I keep getting the same error.
File "/home/miniconda3/envs/viper/bin/viper_cluster.py", line 21, in <module>
    from viper_clustering import viper_utilities as vu
ModuleNotFoundError: No module named 'viper_clustering'
The viper_cluster module is actually present in the corresponding location. Any help would be highly appreciated.
Something is still going wrong with the pip install command. Can you share the output of running the following commands?
conda activate viper
pip install /path/where/you/cloned/viper/repo
Hello! I found a solution to the problem. During the repository cloning step, the viper_utilities file wasn't being copied correctly, so I copy-pasted its content manually from the repository file into mine. Also, I usually run bioinformatics tools from a different directory (where I keep my results), not the one that contains the code, so I did:
export PYTHONPATH="/path/where/I/cloned/viper/repo"
And it worked perfectly.
Now that I am analysing the results (I used the Diamond/Krona step to identify viruses in my sample), I see that in the KRONA results directory the HTML file contains each identified species and its abundance, and the .krona file has the samples and the taxIDs assigned to each.
My question is: is there a way to obtain the number of reads assigned to each identified species? I need this value for my analysis. As always, thank you for your help.
Great that you found a solution! It seems that your system used a different Python installation instead of the conda-installed one.
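For completeness: the export PYTHONPATH workaround works because Python searches the directories on PYTHONPATH when resolving imports. A minimal, self-contained sketch of that behaviour (using a throwaway package name, not the real ViPER code):

```python
import os
import subprocess
import sys
import tempfile

# Create a dummy package in a temporary directory, then import it from a
# subprocess whose PYTHONPATH points at that directory.
with tempfile.TemporaryDirectory() as d:
    pkg = os.path.join(d, "dummy_clustering")  # stand-in for viper_clustering
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as fh:
        fh.write("GREETING = 'ok'\n")

    env = dict(os.environ, PYTHONPATH=d)
    result = subprocess.run(
        [sys.executable, "-c",
         "import dummy_clustering; print(dummy_clustering.GREETING)"],
        env=env, capture_output=True, text=True,
    )
    print(result.stdout.strip())  # prints: ok
```

The same mechanism is why a stray system Python without the package installed raises ModuleNotFoundError even when the module exists on disk.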
To answer your second question:
You can combine the $sample.krona and $sample.magnitudes files with awk:

awk 'NR==FNR { a[$1]=$2; next} $1 in a {print $0,"\t"a[$1]}' CONTIGS/$sample.magnitudes KRONA/$sample.krona > $sample.magnitudes.tab
This will give a tab-separated file with the contig name as the first column, the taxid as the second, the average log e-value as the third, and the number of mapped reads as the last column:
NODE_A2_length_632_cov_15.832432_AMV1 6072 -21.8560540343745 271
NODE_B1_length_20247_cov_955.712295_AMV1 1312874 -450 5842283
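Since the question was about reads per species rather than per contig, you may also want to sum the last column per taxid. A small Python sketch using the two example rows above (the file name and column layout are assumptions based on the awk output):

```python
from collections import defaultdict
import io

# The two example rows in the $sample.magnitudes.tab layout:
# contig_name  taxid  avg_log_evalue  mapped_reads
rows = io.StringIO(
    "NODE_A2_length_632_cov_15.832432_AMV1\t6072\t-21.8560540343745\t271\n"
    "NODE_B1_length_20247_cov_955.712295_AMV1\t1312874\t-450\t5842283\n"
)

# Accumulate mapped-read counts per taxid across all contigs.
reads_per_taxid = defaultdict(int)
for line in rows:
    fields = line.rstrip("\n").split("\t")
    taxid, mapped_reads = fields[1], int(fields[3])
    reads_per_taxid[taxid] += mapped_reads

for taxid, total in reads_per_taxid.items():
    print(taxid, total)
```

If a species has several contigs, their read counts are added together; to run it on the real file, replace the io.StringIO block with open("$sample.magnitudes.tab").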
You can convert the taxid to a scientific name with a tool like taxonkit.
Thank you so much for the prompt response and the help provided!
Hello, I keep getting this error when I try to run the pipeline:
As you can see, the file needed for the Diamond+Krona step (sample_length.contigs.fasta) is not being generated. Any help would be appreciated. Thank you.