Closed: mrojas80 closed this issue 1 year ago
It looks like one of your environment variables ($HOME) hasn't been set. This isn't an issue with PIRATE but with your wider environment. A quick search for "$HOME not set." brings up some potential solutions and shows where this has affected other software.
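One way to confirm this is to check whether $HOME is visible from inside the same environment that runs PIRATE (for example, at the top of the batch job script). The snippet below is only a minimal sketch; the variable name `home_status` is made up for illustration and is not part of PIRATE.

```shell
# Report whether HOME is visible to the current shell (e.g. inside a batch job).
# home_status is a hypothetical helper variable used only for this check.
if [ -n "${HOME:-}" ]; then
    home_status="set"
    echo "HOME is set to: ${HOME}"
else
    home_status="unset"
    echo "HOME is not set or is empty" >&2
fi
```

If this prints the "not set" message inside the job but not in an interactive login shell, the job environment is the likely culprit rather than PIRATE itself.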
How did you install PIRATE and what environment are you using?
All the best, S
You were correct, it was a dependency issue! After adding a line to my .bashrc that sets $HOME, I was able to get all of the outputs when I ran PIRATE on a small number of files (5). I ran it in a conda virtual environment with all of the dependencies, and it now appears to work as intended. Thanks for your assistance!
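For anyone hitting the same warning: the exact line added to .bashrc is not shown in this thread, so the following is only a sketch of one possible approach. It assumes a Linux system where `getent` and `id` are available, and it only sets $HOME when the variable is missing or empty.

```shell
# Assumption: this goes in ~/.bashrc or near the top of the job script,
# before PIRATE is called. It leaves an existing HOME untouched.
if [ -z "${HOME:-}" ]; then
    # Look up the current user's home directory in the passwd database
    # (field 6 of the colon-separated entry).
    HOME="$(getent passwd "$(id -un)" | cut -d: -f6)"
    export HOME
fi
echo "HOME is: ${HOME}"
```

Exporting the variable matters here, because tools started by PIRATE (such as GNU parallel) only see variables that are exported to child processes.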
No problem, glad I could help.
S
Hi, I am running PIRATE and am getting an issue where the core genome isn't being created and a few of the output files are missing. The run seems to have gone normally so I am not sure what happened. I checked the log file and it reads as follows:
PIRATE input options:
Standardising and checking input files:
Extracting pangenome sequences:
Constructing pangenome sequences:
Options:
Creating pangenome on amino acid % identity using DIAMOND.
Input directory: /home/mrojas80/Pantoea_102022/GFFs/PIRATE
Output directory: /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations
Number of input files: 1
Threshold(s): 50 60 70 80 90 95 98
MCL inflation value: 1.5
Homology test cutoff: 0.001
Loci file contains 1992482 loci from 436 genomes.
Extracting core loci during cdhit clustering
Opening pan_sequences
/home/mrojas80/Pantoea_102022/GFFs/PIRATE/pan_sequences.fasta contains 1948250 sequences.
Passing 1948251 loci to cd-hit at 100%
command: "cd-hit -i /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.temp.fasta -o /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.100 -aS 0.9 -c 1 -T 2 -g 1 -n 5 -M 3480 -d 256 >> /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.cdhit_log.txt"
Passing 1948251 loci to cd-hit at 99.5%
command: "cd-hit -i /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.temp.fasta -o /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.99.5 -aS 0.9 -c 0.995 -T 2 -g 1 -n 5 -M 3480 -d 256 >> /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.cdhit_log.txt"
Passing 1948251 loci to cd-hit at 99%
command: "cd-hit -i /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.temp.fasta -o /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.99 -aS 0.9 -c 0.99 -T 2 -g 1 -n 5 -M 3480 -d 256 >> /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.cdhit_log.txt"
Passing 1948251 loci to cd-hit at 98.5%
command: "cd-hit -i /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.temp.fasta -o /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.98.5 -aS 0.9 -c 0.985 -T 2 -g 1 -n 5 -M 3480 -d 256 >> /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.cdhit_log.txt"
Passing 1948251 loci to cd-hit at 98%
command: "cd-hit -i /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.temp.fasta -o /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.98 -aS 0.9 -c 0.98 -T 2 -g 1 -n 5 -M 3480 -d 256 >> /home/mrojas80/Pantoea_102022/GFFs/PIRATE/pangenome_iterations/pan_sequences.cdhit_log.txt"
completed in 1263 secs
0 core loci (0%)
1948251 non-core loci (100%)
306264 representative loci passed to blast.
running all-vs-all DIAMOND on pan_sequences
completed in 11803 secs
running mcl on pan_sequences at 50
33947 clusters at 50 % - completed in 307 secs
running mcl on pan_sequences at 60
41426 clusters at 60 % - completed in 17103 secs
running mcl on pan_sequences at 70
51048 clusters at 70 % - completed in 13369 secs
running mcl on pan_sequences at 80
68152 clusters at 80 % - completed in 9695 secs
running mcl on pan_sequences at 90
111427 clusters at 90 % - completed in 5814 secs
running mcl on pan_sequences at 95
170462 clusters at 95 % - completed in 5014 secs
running mcl on pan_sequences at 98
283919 clusters at 98 % - completed in 5853 secs
reinflating clusters for pan_sequences
Finished
I found this error in the slurm log (I can add that as well if you need it):
I then checked the fail_test.txt file:
parallel: Warning: $HOME not set. Using /tmp.
parallel: Warning: $HOME not set. Using /tmp.
parallel: Warning: $HOME not set. Using /tmp.
parallel: Warning: $HOME not set. Using /tmp.
parallel: Warning: $HOME not set. Using /tmp.
parallel: Warning: $HOME not set. Using /tmp.
The above is all that is written in the file. I don't think this would impact the pangenome construction, so I am not sure what went wrong. I do not get the desired core genome fasta file and other output files.
Any assistance would be greatly appreciated.