flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

Task 0 failed #18

Closed Sebastien-Raguideau closed 4 years ago

Sebastien-Raguideau commented 5 years ago

Hello,

I'm running into an issue trying to go through the Test.

At the end of the prokka step, some .gbk and .gff files are missing, and a bug ensues when Pantagruel tries to rebuild them from the GFF. That is because the .gff does not contain any CDS. pantagruel_test.log

Something goes wrong when calling prokka. Here are the logs from prokka: testPTGdatabase_customassembly_annot_prokka.FOQJ01.log. Prokka runs in about 3 seconds and does not find any features, for instance no CDS. Here is the output folder for the annotation: FOQJ01.tar.gz
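For anyone wanting to check quickly whether a Prokka GFF is empty of features, here is a minimal Python sketch that counts the feature types in the third column of a GFF3 file. It assumes the standard nine-column GFF3 layout with the contig sequences appended after a `##FASTA` marker; the function name `count_gff_features` and the example lines (including the tool version strings) are made up for illustration and are not taken from Pantagruel or from the attached logs.

```python
from collections import Counter


def count_gff_features(gff_lines):
    """Count feature types (column 3) in GFF3 lines, stopping at the ##FASTA section."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this marker is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        fields = line.split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Hypothetical GFF3 content, for illustration only.
example = [
    "##gff-version 3",
    "##sequence-region contig_1 1 1000",
    "contig_1\tProdigal:2.6\tCDS\t10\t300\t.\t+\t0\tID=BRAD659_00001",
    "contig_1\tAragorn:1.2\ttRNA\t400\t475\t.\t-\t.\tID=BRAD659_00002",
    "##FASTA",
    ">contig_1",
    "ACGT",
]
counts = count_gff_features(example)
print(dict(counts))  # {'CDS': 1, 'tRNA': 1}
```

On a real file, one would pass an open file handle instead of the list; a result without any `CDS` key would correspond to the symptom described above.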

However, when I launch prokka outside of pantagruel, using the same command line as in the log and the same conda env, prokka works fine....

Do you have any idea what may be happening?

flass commented 5 years ago

I must say I find this very weird... Not only does prodigal not find any CDS, but aragorn does not find any tRNA and barrnap does not find any rRNA either. You say you replicated the command from the log outside of Pantagruel and it worked? Which command exactly? The one from the prokka log?

/home/sebr/.linuxbrew/bin/prokka --outdir /mnt/gpfs/seb/Applications/testPTGdatabase/00.input_data/annotation/FOQJ01 --prefix Bradyrhizobium_sp_cf659 --force --addgenes --locustag BRAD659 --compliant --centre somewhere --genus Bradyrhizobium --species sp --strain 'cf659' --kingdom Bacteria --gcode 11 /mnt/gpfs/seb/Applications/pantagruel_pipeline/pantagruel/data/custom_genomes/contigs/FOQJ01.fasta

The input and executable file paths are all fully defined, so it should make no difference whether this command is run within or outside Pantagruel... The only possible difference I can think of is the protein database used for the functional annotation search of predicted proteins, as it is dynamically edited before the execution of prokka in the script run_prokka.sh (and restored afterwards). But this should not affect the calls to prodigal, barrnap and aragorn that come before it... I'll try and investigate.

flass commented 5 years ago

I made a small fix to the behaviour of run_prokka.sh in commit 117889a regarding the dynamic editing of the BLAST db mentioned above, but again I don't expect this to have any impact. A recent try from scratch on my side worked fine... Can you please try again with the latest code, in a clean folder, i.e. having run:

# use sudo to remove those annoying assembly structure files that are owned by root ; they should not be downloaded by current versions
sudo rm -rf 00.input_data/
cd /your/path/to/software/pantagruel/ ; git pull ; cd -

flass commented 4 years ago

Hi @Sebastien-Raguideau, I think I may finally have found the bug that was causing you trouble... I'm not sure it really matches your description, but the fixes introduced in 79dad43 and subsequent commits may be the ones. The problem was that the script add_region_feature2prokkaGFF.py sometimes failed when trying to match the contigs from the original input contig file with the genomic regions in the Prokka-generated GFF file (regions that match the contigs in the Prokka-generated genomic fasta .fna file). The reason is that Prokka discards all contigs shorter than 200 bp, and possibly also reorders them; the matching ignored that and naively paired them in the order they came. They are now properly matched. Beyond that, the issue was that the add_region_feature2prokkaGFF.py script stopped in the middle of the GFF transcription and the error was not caught by the parent pipeline script, which carried on, leaving the unfinished GFF unnoticed...
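To illustrate the kind of matching described above, here is a minimal Python sketch that pairs contigs by their sequence rather than by their position in the file, so that dropped short contigs and reordering do not matter. This is not the actual code from 79dad43: the function names `read_fasta` and `match_contigs`, the exact-sequence comparison, and the toy contig names are all assumptions made for the example; only the 200 bp threshold comes from the description above.

```python
def read_fasta(lines):
    """Parse FASTA lines into an ordered list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def match_contigs(input_records, prokka_records, min_len=200):
    """Map each Prokka contig name to the input contig with the same sequence."""
    by_seq = {}
    for name, seq in input_records:
        if len(seq) >= min_len:  # shorter contigs are expected to be absent
            by_seq.setdefault(seq, []).append(name)
    mapping = {}
    for pname, pseq in prokka_records:
        candidates = by_seq.get(pseq)
        if not candidates:
            # fail loudly rather than silently mispairing contigs
            raise ValueError(f"no input contig matches Prokka contig {pname}")
        mapping[pname] = candidates.pop(0)
    return mapping


# Toy data: contig_b is under 200 bp and dropped; the other two are reordered.
input_fasta = [">contig_a", "A" * 250, ">contig_b", "C" * 50, ">contig_c", "G" * 300]
prokka_fasta = [">P_1", "G" * 300, ">P_2", "A" * 250]
mapping = match_contigs(read_fasta(input_fasta), read_fasta(prokka_fasta))
print(mapping)  # {'P_1': 'contig_c', 'P_2': 'contig_a'}
```

Pairing the two lists in file order would have matched P_1 with contig_a here, which is the kind of naive pairing that went wrong.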

All that to say that this is all fixed, with sanity checks introduced. You can give it a try and see if your problem persists. I hope not!
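For readers wondering what catching the error in the parent script amounts to, here is a minimal Python sketch of the general idea: run a child step, check its exit status, and stop instead of proceeding. It is not how Pantagruel's own pipeline scripts do it; the helper name `run_step` is hypothetical and the child commands are trivial stand-ins.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one pipeline step; raise if it exits with a non-zero status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"step failed with exit code {result.returncode}: {' '.join(cmd)}\n"
            f"{result.stderr}"
        )
    return result.stdout


# A step that succeeds returns its standard output.
print(run_step([sys.executable, "-c", "print('GFF written')"]).strip())

# A step that dies halfway through is reported instead of being ignored.
try:
    run_step([sys.executable, "-c", "import sys; sys.exit(3)"])
except RuntimeError as err:
    print("caught:", str(err).splitlines()[0])
```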

Best wishes, Florent

flass commented 4 years ago

I suggest you now use the Dockerfile to build a Docker image, which should fix this kind of runtime problem; see #11.