SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

Error in core genome alignment #61

Closed furqan915 closed 3 years ago

furqan915 commented 3 years ago

Hello, I am working with 557 genomes on Ubuntu Linux, trying to perform pan-genome analysis and core genome alignment. However, PIRATE does not detect the mafft package. The log file shows the following message, and the core genome alignment is killed.


Summary of pangenome clusters:

# 5162 gene families in 557 genomes.
# 697 contain greater than one allele at the thresholds analysed.
# 1323 contain fission/fusion events.
# 545 contain duplication/loss.

%isolates   #clusters   >1 allele   fission/fusion  multicopy
0-10%   2241    193 171 41
10-25%  311 160 84  47
25-50%  218 143 103 73
50-75%  149 96  63  47
75-90%  70  26  42  22
90-95%  16  7   11  7
95-100% 2157    72  849 308

-------------------------------

Aligning all feature sequences:
 - number of groups : 5012
 - extracting sequences from gffs

 - ERROR: aligning pangenome sequences failed - is mafft in PATH?
 - completed in: 84s

-------------------------------

PIRATE completed in 5491s

I installed mafft via conda and it is present in ~/miniconda/bin/. This path is also included in my .bashrc, and I have confirmed that mafft runs when called from the command line.

Note that the installed executable is named mafft (lowercase), not MAFFT (uppercase). Possibly PIRATE looks for MAFFT rather than mafft.

Please guide me in this regard.

SionBayliss commented 3 years ago

PIRATE detects mafft (lowercase). If you installed it via conda, you will need to ensure the environment is activated; otherwise it will not be in your PATH. If you installed it manually, ensure it can be run from your command line. If you are using conda, I would recommend installing PIRATE using conda, as this will install all dependencies. This can be done with conda install pirate (with the relevant bioconda channels). See the README for more details.
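A minimal sketch of that PATH check, in case it helps. The helper name check_tool is hypothetical, and the environment name pirate is only an assumption; substitute whatever name you gave your environment.

```shell
# Report whether an executable is visible on the current PATH.
# check_tool is a hypothetical helper name, not part of PIRATE.
check_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        echo "$1 found at $tool_path"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}

# Run this in the same shell session that will launch PIRATE, after
# activating the environment (e.g. `conda activate pirate`, assumed name).
check_tool mafft || echo "activate the conda environment, then retry"
```

If the tool is reported as found here but the error persists, the cause is probably something other than PATH.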

All the best, Sion

furqan915 commented 3 years ago

Hi, thanks for the quick response. I created a separate conda environment for PIRATE using this command:

conda create -n pirate -c bioconda -c defaults -c conda-forge pirate

I checked mafft by calling it from the command line, and it works fine.

But it still does not work within PIRATE. The same problem persists and the very same error shows up every time.

SionBayliss commented 3 years ago

Could you run the alignment script directly? It may be throwing a separate error that isn't associated with the mafft dependency. From inside the PIRATE directory run:

perl PIRATE_SCRIPT_PATH/align_feature_sequences.pl --dosage 1.25 -i ./PIRATE.gene_families.ordered.tsv -g ./modified_gffs/ ./feature_sequences/ -p YOUR_NUMBER_OF_THREADS

Could you post the STDOUT from that script please?
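One way to capture the output for posting is sketched below. It saves both STDOUT and STDERR to a file, since the underlying error may be written to either stream. The helper name run_and_log and the file name align.log are hypothetical, and the sh -c line is only a stand-in for the perl command above.

```shell
# Run a command, saving both STDOUT and STDERR to a log file while still
# showing them on screen. run_and_log is a hypothetical helper name.
run_and_log() {
    log_file=$1
    shift
    "$@" 2>&1 | tee "$log_file"
}

# Stand-in command for illustration; replace everything after align.log
# with the perl command for align_feature_sequences.pl.
run_and_log align.log sh -c 'echo "to stdout"; echo "to stderr" >&2'
```

The contents of align.log can then be pasted into the issue.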

All the best, Sion

furqan915 commented 3 years ago

Hello, thank you for your guidance. I ran the command you mentioned and you were right: there was no problem with mafft detection. The issue was with the title of my GFF file. Thanks again.

SionBayliss commented 3 years ago

Great! Glad I could help.

S