SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

blastp fails #58

Closed: jolindadekorne closed this issue 3 years ago

jolindadekorne commented 3 years ago

Hi Sion,

I'm trying to run PIRATE on 766 bacterial genomes, for which I created GFF files with Prokka. PIRATE itself seems to be working fine: --check passes, and I have previously used the tool successfully on 380 genomes. However, this time PIRATE quits at this step:

- 14114 representative loci passed to blast.

 - running all-vs-all BLASTP on pan_sequences

I'm getting the following error: - ERROR: pangenome_construction.pl failed - error logged at /path/to/output/directory/fail_test.txt

fail_test.txt contains only: blastp failed.

blastp is installed correctly and is in $PATH, so I'm not sure what the problem is.

Do you have any idea what causes this error?

Thanks in advance, Jolinda
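One way to confirm which blastp actually gets resolved, and that it starts at all, is a small check like the sketch below. It is a minimal sketch, not part of PIRATE: the helper name check_tool is hypothetical, and it assumes BLAST+ programs accept a -version flag. It only reflects the environment it runs in, so it is most useful when run from the same job script or shell that launches PIRATE.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def check_tool(name: str, version_flag: str = "-version",
               search_path: Optional[str] = None) -> Optional[Tuple[str, int, str]]:
    """Resolve `name` the way a shell would and try to run it once.

    Returns (resolved_path, exit_code, first_output_line), or None if nothing
    executable called `name` is found or it cannot be started at all.
    search_path=None means "use the current $PATH", as shutil.which does.
    """
    resolved = shutil.which(name, path=search_path)
    if resolved is None:
        return None  # nothing executable with this name on the search path
    try:
        result = subprocess.run([resolved, version_flag],
                                capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None  # found, but could not be executed (or it hung)
    output = (result.stdout or result.stderr).strip()
    first_line = output.splitlines()[0] if output else ""
    return resolved, result.returncode, first_line


if __name__ == "__main__":
    info = check_tool("blastp")  # assumption: BLAST+ accepts -version
    if info is None:
        print("blastp: not found or not executable on this PATH")
    else:
        path, code, line = info
        print(f"blastp -> {path} (exit {code}): {line}")
```

A nonzero exit code here, even with a path found, would point at a broken binary (for example missing shared libraries) rather than a PATH problem.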

SionBayliss commented 3 years ago

Hi Jolinda,

That is a little odd. How did you install PIRATE? I would recommend conda. Do all of your files pass PIRATE's fairly rudimentary QC? Could you send me the STDOUT text that precedes the error message?

All the best, Sion

jolindadekorne commented 3 years ago

Thanks for your quick response!

I installed it with brew because I had some trouble installing packages with conda.

It looks like all files passed QC. Here is the STDOUT text:

Job has started at Thu 01 Oct 2020 10:56:24 AM CEST
contains all 766 files, continue
Starting PIRATE at Thu 01 Oct 2020 10:56:29 AM CEST

-------------------------------

PIRATE input options:

 - Input Directory = /scratch/pirate
 - Output directory = /home/jdkorne/pirate_out
 - PIRATE will run using 15 cores
 - 766 files in input directory.
 - PIRATE will be run on 50,60,70,80,90,95,98 amino acid % identity thresholds.
 - PIRATE will be run on features annotated as CDS

-------------------------------

Standardising and checking input files:

 - 766 gff files passed QC and will be analysed by PIRATE - completed in: 12s

-------------------------------

 - creating co-ordinate files - completed in: 7s
 - creating genome loci list: - completed in: 3s

-------------------------------

Extracting pangenome sequences:

 - completed in: 70s

-------------------------------

Constructing pangenome sequences:

Options:

 - Creating pangenome on amino acid % identity using BLAST.
 - Input directory:     /home/jdkorne/pirate_out
 - Output directory:    /home/jdkorne/pirate_out/pangenome_iterations
 - Number of input files: 1
 - Threshold(s): 50 60 70 80 90 95 98
 - MCL inflation value: 1.5
 - Homology test cutoff: 1E-6
 - Loci file contains 1592885 loci from 766 genomes.
 - Extracting core loci during cdhit clustering
 - Opening pan_sequences
 - /home/jdkorne/pirate_out/pan_sequences.fasta contains 1580240 sequences.
 - Passing 1580241 loci to cd-hit at 100%
 - command: "cd-hit -i /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.temp.fasta -o /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.100 -aS 0.9 -c 1 -T 15 -g 1 -n 5 -M 2387 -d 256 >> /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.cdhit_log.txt"
 - Passing 1569517 loci to cd-hit at 99.5%
 - command: "cd-hit -i /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.temp.fasta -o /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.99.5 -aS 0.9 -c 0.995 -T 15 -g 1 -n 5 -M 2387 -d 256 >> /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.cdhit_log.txt"
 - Passing 1450787 loci to cd-hit at 99%
 - command: "cd-hit -i /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.temp.fasta -o /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.99 -aS 0.9 -c 0.99 -T 15 -g 1 -n 5 -M 2387 -d 256 >> /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.cdhit_log.txt"
 - Passing 1224051 loci to cd-hit at 98.5%
 - command: "cd-hit -i /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.temp.fasta -o /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.98.5 -aS 0.9 -c 0.985 -T 15 -g 1 -n 5 -M 2387 -d 256 >> /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.cdhit_log.txt"
 - Passing 1107619 loci to cd-hit at 98%
 - command: "cd-hit -i /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.temp.fasta -o /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.98 -aS 0.9 -c 0.98 -T 15 -g 1 -n 5 -M 2387 -d 256 >> /home/jdkorne/pirate_out/pangenome_iterations/pan_sequences.cdhit_log.txt"

 - 541562 core loci (34.2708485604411%)
 - 1038679 non-core loci (65.7291514395589%)

 - 14114 representative loci passed to blast.

 - running all-vs-all BLASTP on pan_sequences
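The log shows cd-hit being run five times at decreasing identity (100, 99.5, 99, 98.5 and 98%), with each pass receiving fewer loci (1580241 down to 1107619) before 14114 representative loci are passed to BLAST. The sketch below only reconstructs the logged cd-hit argument lists from the threshold values, to make the pattern explicit; it is not PIRATE's own code (PIRATE is written in Perl), the function name cdhit_command is hypothetical, and the ">> ...cdhit_log.txt" redirection is left out. The only conversion is that cd-hit's -c takes identity as a fraction, so 99.5% becomes 0.995.

```python
import shlex
from typing import List

# Values copied from the cd-hit commands in the log above (this user's run).
WORKDIR = "/home/jdkorne/pirate_out/pangenome_iterations"
THRESHOLDS = ["100", "99.5", "99", "98.5", "98"]  # % identity, in logged order


def cdhit_command(threshold: str, workdir: str = WORKDIR, threads: int = 15,
                  memory_mb: int = 2387) -> List[str]:
    """Build one cd-hit argv list matching the logged command for `threshold`."""
    identity = float(threshold) / 100  # -c is a fraction: 99.5 -> 0.995
    return [
        "cd-hit",
        "-i", f"{workdir}/pan_sequences.temp.fasta",
        "-o", f"{workdir}/pan_sequences.{threshold}",
        "-aS", "0.9",
        "-c", f"{identity:g}",  # :g drops trailing zeros, so 1.0 prints as "1"
        "-T", str(threads),
        "-g", "1",
        "-n", "5",
        "-M", str(memory_mb),
        "-d", "256",
    ]


if __name__ == "__main__":
    for t in THRESHOLDS:
        print(shlex.join(cdhit_command(t)))
```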

And the STDERR text: - ERROR: pangenome_construction.pl failed - error logged at /path/to/output/directory/fail_test.txt

The problem might be caused by the brew installation. I'll try installing with conda again and see if that fixes it.

Thanks again, Jolinda
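As a consistency check on the log above: the core and non-core counts (541562 + 1038679) sum to 1580241, the same number of loci passed to cd-hit at 100%, and the printed percentages are each count divided by that total. A minimal sketch of the arithmetic (the function name core_split is hypothetical):

```python
from typing import Tuple


def core_split(core: int, non_core: int) -> Tuple[int, float, float]:
    """Return (total loci, core %, non-core %) for the given counts."""
    total = core + non_core
    return total, 100 * core / total, 100 * non_core / total


if __name__ == "__main__":
    total, core_pct, non_core_pct = core_split(541562, 1038679)
    print(total, core_pct, non_core_pct)
```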

SionBayliss commented 3 years ago

It looks like it isn't finding BLAST, so it might be a brew issue. Let me know if conda works, and I can try to fix the brew recipe.

S

jolindadekorne commented 3 years ago

Hi!

I reinstalled PIRATE with conda; however, I now get this error: - ERROR: extract_feature_sequences.pl failed

Preceding text:

Job has started at Wed 07 Oct 2020 12:41:42 PM CEST
contains all 766 files, continue
Starting PIRATE at Wed 07 Oct 2020 12:41:45 PM CEST

-------------------------------

PIRATE input options:

 - Input Directory = /scratch/pirate
 - Output directory = /home/jdkorne/project_oxford/pirate_dutch_subset_out
 - PIRATE will run using 15 cores
 - 766 files in input directory.
 - PIRATE will be run on 50,60,70,80,90,95,98 amino acid % identity thresholds.
 - PIRATE will be run on features annotated as CDS

-------------------------------

Standardising and checking input files:

 - 766 gff files passed QC and will be analysed by PIRATE - completed in: 12s

-------------------------------

 - creating co-ordinate files - completed in: 9s
 - creating genome loci list: - completed in: 2s

-------------------------------

Extracting pangenome sequences:

This time I also get the same error when running PIRATE --check. Do you think this is still related to the installation?

Thanks, Jolinda

SionBayliss commented 3 years ago

Hi Jolinda,

That is annoying. Since --check also fails, it looks like an issue with the installation. I have reinstalled and updated PIRATE using conda, but I can't replicate the error on my system. It is failing on the line that runs the script 'extract_feature_sequences.pl', which depends on BioPerl. Could you run this script on a single file and see if it completes? That should tell us whether it is a dependency issue.

Also, are you using Linux or Mac?

Thank you for the detailed report.

All the best, Sion
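A quick way to test the BioPerl dependency without running the whole pipeline is to ask the perl on PATH to load a BioPerl module: "perl -MModule -e 1" loads the module and exits, returning a nonzero status if the module cannot be found or fails to load. The sketch below wraps that in Python. Bio::SeqIO is used only as a representative BioPerl module; which modules extract_feature_sequences.pl actually needs is an assumption here, and the helper name perl_module_loads is hypothetical.

```python
import shutil
import subprocess


def perl_module_loads(module: str, perl: str = "perl") -> bool:
    """True if `perl -M<module> -e 1` exits with status 0."""
    try:
        result = subprocess.run([perl, f"-M{module}", "-e", "1"],
                                capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.SubprocessError):
        return False  # perl missing, not executable, or hung
    return result.returncode == 0


if __name__ == "__main__":
    print("perl on PATH:", shutil.which("perl"))
    # Assumption: Bio::SeqIO stands in for whatever BioPerl modules PIRATE uses.
    print("Bio::SeqIO loads:", perl_module_loads("Bio::SeqIO"))
```

Printing which perl is used matters as much as the result: a brew-installed perl and a conda-installed perl each see their own set of installed modules.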

jolindadekorne commented 3 years ago

Hi Sion,

Thanks again for your quick response!

I'm working on Linux. It was indeed a dependency issue, caused by the brew installation of BioPerl. I realized I had to brew uninstall all the dependencies so that PIRATE would look for them in the conda environment. PIRATE works perfectly now :)

Thanks for the support! Jolinda
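The fix above (removing the brew copies so the conda ones are used) comes down to lookup order: when several directories on PATH contain an executable with the same name, the first one wins and the rest are shadowed. The sketch below lists every copy in lookup order, similar in spirit to "which -a". It is a minimal sketch with a hypothetical helper name (all_on_path); it deliberately skips empty PATH entries rather than treating them as the current directory. It covers executables only: Perl modules such as BioPerl are located through Perl's @INC (which PERL5LIB can extend), not through PATH.

```python
import os
from typing import List, Optional


def all_on_path(name: str, search_path: Optional[str] = None) -> List[str]:
    """Every executable file called `name` on the search path, in lookup order.

    The first entry is what a bare `name` resolves to; later entries are shadowed.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits: List[str] = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries instead of treating them as "."
        candidate = os.path.join(directory, name)
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in hits):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for tool in ("blastp", "cd-hit", "perl"):
        print(tool, all_on_path(tool))
```

If a brew directory appears before the conda environment's bin directory and both contain a tool, the brew copy is the one PIRATE will run.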

SionBayliss commented 3 years ago

Phew, glad to hear it!

I hope PIRATE is useful, Sion