GunzIvan28 / rMAP

Bacterial analysis toolbox for full ESKAPE pathogen characterization and profiling the resistome, mobilome, virulome & phylogenomics using WGS
GNU General Public License v3.0

Multiple results file are empty #2

Closed safinaARK closed 2 years ago

safinaARK commented 3 years ago

Dear rMAP Team,

I ran rMAP on Shigella flexneri raw sequencing data using the following command:

rMAP -i /mnt/e/Working/Shigella/Data/shigella_flexneri/africa/ -o rMAP_output_africa_flex -t 2 -m -p -s -g -q -a shovill -vc

I have the following folders generated:

- Assembly summary statistics: ran perfectly
- SNP-Variant Calling: no results generated
- Phylogenetic inference: as no SNPs were generated, no phylogeny was done
- Antimicrobial Resistance Profiling: ran perfectly
- Plasmid Profiling: ran perfectly
- Virulence Factor Determination: ran perfectly
- Multi-Locus Sequence Typing (MLST): ran perfectly
- Pangenome Analysis: ran perfectly
- Insertion sequence characterization (IS): the summary file is not generated correctly. It only gives the names of the insertion sequences found, but does not say which sample has which insertion sequences or what the percent identity is.

While the insertion sequence step was processing, the screen showed the following error (the same for all the samples):

cat: 'rMAP_output_africa_flex/insertion_sequences/ERR573382/ERR573382.clean/ISKpn23/*.txt': No such file or directory

The following error also appeared for all the samples:

    Processing sample: ERR126963
    Traceback (most recent call last):
      File "/home/sar/miniconda3/envs/rMAP-1.0/bin/ismap", line 7, in <module>
        from ismap import main
      File "/home/sar/miniconda3/envs/rMAP-1.0/bin/ismap.py", line 12, in <module>
        from mapping_to_query import map_to_is_query
      File "/home/sar/miniconda3/envs/rMAP-1.0/bin/mapping_to_query.py", line 6, in <module>
        from Bio.Alphabet import generic_dna
      File "/home/sar/anaconda3/envs/rMAP-1.0/lib/python3.7/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
        "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
    ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

The other error found was:

WARNING! This alignment consists of closely-related and very-long sequences. WARNING! FastTree (or other standard maximum-likelihood tools) may not be appropriate for aligments of very closely-related sequences like this one, as FastTree does not account for recombination or gene conversion

And the html report was not generated because the rmarkdown package was not found:

Error in library(rmarkdown) : there is no package called 'rmarkdown'

How can I resolve these errors? Please help.

Thank you in advance

SAR

GunzIvan28 commented 3 years ago

Thanks @safinaARK for raising this very detailed error report. There are quite a number of possible causes for the program blowing up or running only partially, so I am going to go through them one at a time and fix them in the order they occur in the rMAP workflow:

  1. First, let me start with your rMAP command: `rMAP -i /mnt/e/Working/Shigella/Data/shigella_flexneri/africa/ -o rMAP_output_africa_flex -t 2 -m -p -s -g -q -a shovill -vc`. I noticed you left out the `-r` or `--reference` option, which controls the output of many downstream steps. A reference in the form of a .fasta, .fa or .gbk file is required for:

SNP-Variant Calling - raw sequences have to be compared against a reference for variants/SNPs to be called.

Phylogenetic inference - the individual sample VCFs are merged into one combined VCF containing SNPs, which are transposed and converted to multi-FASTA sequences that are then aligned and used to infer the phylogeny. In other words, without the outputs of the SNP-Variant Calling step, no phylogenetics can be produced.

  2. Antimicrobial Resistance, Plasmid Profiling, Virulence Factor Determination, Multi-Locus Sequence Typing (MLST), Pangenome Analysis - All of these ran because they use the contigs and scaffolds obtained from the de novo assembly with shovill. They do not depend on the reference genome.

  3. Insertion sequence characterization (IS) - For this one, kindly make sure you went through the full installation instructions for rMAP. From what I can deduce, your installation might have skipped importing the insertion sequence database into the software directory. Kindly refer to the snippet below and confirm this was done:

    
    conda activate rMAP-1.0
    bash setup.sh
    cd && bash clean.sh
    rm -rf clean.sh
    rMAP -h
The above should also fix the error `cat: 'rMAP_output_africa_flex/insertion_sequences/ERR573382/ERR573382.clean/ISKpn23/*.txt': No such file or directory`, which you saw for all the samples.

4. For the error:

```
Processing sample: ERR126963
Traceback (most recent call last):
  File "/home/sar/miniconda3/envs/rMAP-1.0/bin/ismap", line 7, in <module>
    from ismap import main
  File "/home/sar/miniconda3/envs/rMAP-1.0/bin/ismap.py", line 12, in <module>
    from mapping_to_query import map_to_is_query
  File "/home/sar/miniconda3/envs/rMAP-1.0/bin/mapping_to_query.py", line 6, in <module>
    from Bio.Alphabet import generic_dna
  File "/home/sar/anaconda3/envs/rMAP-1.0/lib/python3.7/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
```

Make sure you ran this command the very first time, before launching rMAP: `rMAP -t 8 --config` or `rMAP -t 8 -c`. This installs the additional packages that could not be compiled within the conda environment initially because of incompatibility and version issues. Once that is done, the rmarkdown error `Error in library(rmarkdown) : there is no package called 'rmarkdown'` should get fixed as well, because the rmarkdown package is used to generate the final html report and is only installed by the `--config` option. Note, however, that `--config` is only run once and you must be connected to the internet. When it executes successfully, you will get a green message from the pipeline telling you everything is fully set up.
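
If it helps, the points above can be checked before a re-run with a small preflight sketch like the one below. This is only an illustration: the function names are hypothetical helpers, not part of rMAP, and it assumes it is run inside the activated rMAP-1.0 environment so that `python` and `Rscript` resolve to the ones the pipeline uses. It relies on two known facts: `Bio.Alphabet` was removed in Biopython 1.78, and `requireNamespace` in R reports whether a package is installed.

```shell
#!/usr/bin/env bash
# Preflight sketch for the points above. All function names are hypothetical
# helpers written for this reply; they are not part of rMAP.

# Succeeds if the given rMAP arguments include -r or --reference.
has_reference_flag() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            -r|--reference) return 0 ;;
        esac
    done
    return 1
}

# Succeeds if Bio.Alphabet can be imported by the active python.
# (Bio.Alphabet was removed in Biopython 1.78, so newer versions fail here.)
biopython_has_alphabet() {
    python -c 'import Bio.Alphabet' >/dev/null 2>&1
}

# Succeeds if the rmarkdown package is available to Rscript.
rmarkdown_installed() {
    Rscript -e 'if (!requireNamespace("rmarkdown", quietly = TRUE)) quit(status = 1)' >/dev/null 2>&1
}

# Prints one warning per failed check; pass it the arguments you plan to give rMAP.
preflight() {
    if ! has_reference_flag "$@"; then
        echo "WARN: no -r/--reference given: SNP calling and phylogeny will be empty"
    fi
    if ! biopython_has_alphabet; then
        echo "WARN: Bio.Alphabet cannot be imported: the insertion sequence step will fail"
    fi
    if ! rmarkdown_installed; then
        echo "WARN: rmarkdown not found: the html report will not be generated"
    fi
}
```

For example, after sourcing the file, `preflight -i reads/ -o rMAP_out -t 2 -r reference.gbk -vc` (with placeholder paths) prints nothing when all three checks pass. It never launches rMAP itself, so it is safe to run at any time.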

Please feel free to let us know if you experience any other challenges. Enjoy the pipeline, cheers.

Ivan