flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

ls: cannot access spades_contigs_db/spades_contigs_db/00.input_data/annotation/J16_strain.contigs/*.gbk: No such file or directory #21

Closed: sarah872 closed this issue 4 years ago

sarah872 commented 4 years ago

I want to run pantagruel on 10 genomes. This is the directory structure of my input dir:

spades_contigs
├── annotation
│   ├── I15_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_I15.ecn
│   │   ├── mygenus_myspecies_I15.err
│   │   ├── mygenus_myspecies_I15.faa
│   │   ├── mygenus_myspecies_I15.ffn
│   │   ├── mygenus_myspecies_I15.fixedproducts
│   │   ├── mygenus_myspecies_I15.fna
│   │   ├── mygenus_myspecies_I15.fsa
│   │   ├── mygenus_myspecies_I15.gbk
│   │   ├── mygenus_myspecies_I15.gff
│   │   ├── mygenus_myspecies_I15.log
│   │   ├── mygenus_myspecies_I15.sqn
│   │   ├── mygenus_myspecies_I15.tbl
│   │   ├── mygenus_myspecies_I15.tsv
│   │   ├── mygenus_myspecies_I15.txt
│   │   └── mygenus_myspecies_I15.val
│   ├── J16_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_J16.ecn
│   │   ├── mygenus_myspecies_J16.err
│   │   ├── mygenus_myspecies_J16.faa
│   │   ├── mygenus_myspecies_J16.ffn
│   │   ├── mygenus_myspecies_J16.fixedproducts
│   │   ├── mygenus_myspecies_J16.fna
│   │   ├── mygenus_myspecies_J16.fsa
│   │   ├── mygenus_myspecies_J16.gbk
│   │   ├── mygenus_myspecies_J16.gff
│   │   ├── mygenus_myspecies_J16.log
│   │   ├── mygenus_myspecies_J16.sqn
│   │   ├── mygenus_myspecies_J16.tbl
│   │   ├── mygenus_myspecies_J16.tsv
│   │   ├── mygenus_myspecies_J16.txt
│   │   └── mygenus_myspecies_J16.val
│   ├── K16_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_K16.ecn
│   │   ├── mygenus_myspecies_K16.err
│   │   ├── mygenus_myspecies_K16.faa
│   │   ├── mygenus_myspecies_K16.ffn
│   │   ├── mygenus_myspecies_K16.fixedproducts
│   │   ├── mygenus_myspecies_K16.fna
│   │   ├── mygenus_myspecies_K16.fsa
│   │   ├── mygenus_myspecies_K16.gbk
│   │   ├── mygenus_myspecies_K16.gff
│   │   ├── mygenus_myspecies_K16.log
│   │   ├── mygenus_myspecies_K16.sqn
│   │   ├── mygenus_myspecies_K16.tbl
│   │   ├── mygenus_myspecies_K16.tsv
│   │   ├── mygenus_myspecies_K16.txt
│   │   └── mygenus_myspecies_K16.val
│   ├── L18_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_L18.ecn
│   │   ├── mygenus_myspecies_L18.err
│   │   ├── mygenus_myspecies_L18.faa
│   │   ├── mygenus_myspecies_L18.ffn
│   │   ├── mygenus_myspecies_L18.fixedproducts
│   │   ├── mygenus_myspecies_L18.fna
│   │   ├── mygenus_myspecies_L18.fsa
│   │   ├── mygenus_myspecies_L18.gbk
│   │   ├── mygenus_myspecies_L18.gff
│   │   ├── mygenus_myspecies_L18.log
│   │   ├── mygenus_myspecies_L18.sqn
│   │   ├── mygenus_myspecies_L18.tbl
│   │   ├── mygenus_myspecies_L18.tsv
│   │   ├── mygenus_myspecies_L18.txt
│   │   └── mygenus_myspecies_L18.val
│   ├── M13_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_M13.ecn
│   │   ├── mygenus_myspecies_M13.err
│   │   ├── mygenus_myspecies_M13.faa
│   │   ├── mygenus_myspecies_M13.ffn
│   │   ├── mygenus_myspecies_M13.fixedproducts
│   │   ├── mygenus_myspecies_M13.fna
│   │   ├── mygenus_myspecies_M13.fsa
│   │   ├── mygenus_myspecies_M13.gbk
│   │   ├── mygenus_myspecies_M13.gff
│   │   ├── mygenus_myspecies_M13.log
│   │   ├── mygenus_myspecies_M13.sqn
│   │   ├── mygenus_myspecies_M13.tbl
│   │   ├── mygenus_myspecies_M13.tsv
│   │   ├── mygenus_myspecies_M13.txt
│   │   └── mygenus_myspecies_M13.val
│   ├── M22_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_M22.err
│   │   ├── mygenus_myspecies_M22.faa
│   │   ├── mygenus_myspecies_M22.ffn
│   │   ├── mygenus_myspecies_M22.fixedproducts
│   │   ├── mygenus_myspecies_M22.fna
│   │   ├── mygenus_myspecies_M22.fsa
│   │   ├── mygenus_myspecies_M22.gbk
│   │   ├── mygenus_myspecies_M22.gff
│   │   ├── mygenus_myspecies_M22.log
│   │   ├── mygenus_myspecies_M22.sqn
│   │   ├── mygenus_myspecies_M22.tbl
│   │   ├── mygenus_myspecies_M22.tsv
│   │   ├── mygenus_myspecies_M22.txt
│   │   └── mygenus_myspecies_M22.val
│   ├── N06_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_N06.ecn
│   │   ├── mygenus_myspecies_N06.err
│   │   ├── mygenus_myspecies_N06.faa
│   │   ├── mygenus_myspecies_N06.ffn
│   │   ├── mygenus_myspecies_N06.fixedproducts
│   │   ├── mygenus_myspecies_N06.fna
│   │   ├── mygenus_myspecies_N06.fsa
│   │   ├── mygenus_myspecies_N06.gbk
│   │   ├── mygenus_myspecies_N06.gff
│   │   ├── mygenus_myspecies_N06.log
│   │   ├── mygenus_myspecies_N06.sqn
│   │   ├── mygenus_myspecies_N06.tbl
│   │   ├── mygenus_myspecies_N06.tsv
│   │   ├── mygenus_myspecies_N06.txt
│   │   └── mygenus_myspecies_N06.val
│   ├── N20_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_N20.ecn
│   │   ├── mygenus_myspecies_N20.err
│   │   ├── mygenus_myspecies_N20.faa
│   │   ├── mygenus_myspecies_N20.ffn
│   │   ├── mygenus_myspecies_N20.fixedproducts
│   │   ├── mygenus_myspecies_N20.fna
│   │   ├── mygenus_myspecies_N20.fsa
│   │   ├── mygenus_myspecies_N20.gbk
│   │   ├── mygenus_myspecies_N20.gff
│   │   ├── mygenus_myspecies_N20.log
│   │   ├── mygenus_myspecies_N20.sqn
│   │   ├── mygenus_myspecies_N20.tbl
│   │   ├── mygenus_myspecies_N20.tsv
│   │   ├── mygenus_myspecies_N20.txt
│   │   └── mygenus_myspecies_N20.val
│   ├── P15_strain
│   │   ├── errorsummary.val
│   │   ├── mygenus_myspecies_P15.err
│   │   ├── mygenus_myspecies_P15.faa
│   │   ├── mygenus_myspecies_P15.ffn
│   │   ├── mygenus_myspecies_P15.fixedproducts
│   │   ├── mygenus_myspecies_P15.fna
│   │   ├── mygenus_myspecies_P15.fsa
│   │   ├── mygenus_myspecies_P15.gbk
│   │   ├── mygenus_myspecies_P15.gff
│   │   ├── mygenus_myspecies_P15.log
│   │   ├── mygenus_myspecies_P15.sqn
│   │   ├── mygenus_myspecies_P15.tbl
│   │   ├── mygenus_myspecies_P15.tsv
│   │   ├── mygenus_myspecies_P15.txt
│   │   └── mygenus_myspecies_P15.val
│   └── P20_strain
│       ├── errorsummary.val
│       ├── mygenus_myspecies_P20.ecn
│       ├── mygenus_myspecies_P20.err
│       ├── mygenus_myspecies_P20.faa
│       ├── mygenus_myspecies_P20.ffn
│       ├── mygenus_myspecies_P20.fixedproducts
│       ├── mygenus_myspecies_P20.fna
│       ├── mygenus_myspecies_P20.fsa
│       ├── mygenus_myspecies_P20.gbk
│       ├── mygenus_myspecies_P20.gff
│       ├── mygenus_myspecies_P20.log
│       ├── mygenus_myspecies_P20.sqn
│       ├── mygenus_myspecies_P20.tbl
│       ├── mygenus_myspecies_P20.tsv
│       ├── mygenus_myspecies_P20.txt
│       └── mygenus_myspecies_P20.val
├── contigs
│   ├── I15_strain.contigs.fasta
│   ├── J16_strain.contigs.fasta
│   ├── K16_strain.contigs.fasta
│   ├── L18_strain.contigs.fasta
│   ├── M13_strain.contigs.fasta
│   ├── M22_strain.contigs.fasta
│   ├── N06_strain.contigs.fasta
│   ├── N20_strain.contigs.fasta
│   ├── P15_strain.contigs.fasta
│   └── P20_strain.contigs.fasta
└── strain_infos_spades_contigs_db.txt

I am running pantagruel like this:

pantagruel -d spades_contigs_db -r spades_contigs_db/ -a spades_contigs init
pantagruel -i spades_contigs_db/spades_contigs_db/environ_pantagruel_spades_contigs_db.sh all

The pipeline starts, but after running Prokka there is an error:

[2019-09-19 09:39:50] extract assembly data from folder 'spades_contigs'
found 10 contig files (raw genome assemblies) in spades_contigs/contigs/
[2019-09-19 09:39:50] I15_strain.contigs
will annotate contigs in 'spades_contigs/contigs/I15_strain.contigs.fasta'
[2019-09-19 09:39:50]
### assembly: I15_strain.contigs; contig files from: spades_contigs/contigs/I15_strain.contigs.fasta
running Prokka...
done.
[2019-09-19 09:41:26]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
ls: cannot access spades_contigs_db/spades_contigs_db/00.input_data/annotation/I15_strain.contigs/*.gbk: No such file or directory
done.

flass commented 4 years ago

Hi Sarah,

I think the issue comes from the fact that your contig files are named differently from the corresponding annotation folders (for example, contig file I15_strain.contigs.fasta versus annotation folder I15_strain).

If you check the input data section of the README, you'll see an example.
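As a rough way to check the naming before launching the pipeline, the sketch below reports every contig file that lacks an annotation folder of the same name. It assumes the layout shown in this thread (a contigs/ folder of .fasta files next to an annotation/ folder) and that the expected folder name is the contig file name minus the .fasta extension, as the pipeline log suggests; check_annotation_names is a hypothetical helper, not part of pantagruel.

```shell
# Sketch only: check_annotation_names is a hypothetical helper name.
# Assumed layout: <input>/contigs/<name>.fasta and <input>/annotation/<name>/
check_annotation_names() {
    local input_dir fasta name missing
    input_dir="$1"
    missing=0
    for fasta in "$input_dir"/contigs/*.fasta; do
        [ -e "$fasta" ] || continue            # the glob matched nothing
        name="$(basename "$fasta" .fasta)"     # e.g. I15_strain.contigs
        if [ ! -d "$input_dir/annotation/$name" ]; then
            echo "no annotation folder for $name"
            missing=$((missing + 1))
        fi
    done
    echo "$missing mismatch(es)"
    [ "$missing" -eq 0 ]                       # non-zero status if any mismatch
}
```

Calling check_annotation_names spades_contigs on the input folder above would list all ten strains, since the folders are named I15_strain and so on rather than I15_strain.contigs.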

Sorry, I know the formatting is a bit tedious, but you almost got it right! I thought about introducing non-exact string matching to ease this sort of case, but decided against it: bacterial strain names are so arbitrary and often so similar to each other that there could be a chance match with the wrong assembly.

I hope this helps.

best wishes,

Florent

sarah872 commented 4 years ago

Thanks a lot! Unfortunately, there are still some errors.

found 10 contig files (raw genome assemblies) in spades_contigs/contigs/
[2019-09-19 11:44:57] I15_strain.contigs
found annotation folder 'spades_contigs/annotation/I15_strain.contigs' ; skip annotation of contigs in 'spades_contigs/contigs/I15_strain.contigs.fasta'
ln: failed to create symbolic link ‘spades_contigs_db/spades_contigs_db/00.input_data/annotation/’: File exists
fix annotation to integrate region information into GFF files
ls: cannot access spades_contigs_db/spades_contigs_db/00.input_data/annotation/I15_strain.contigs/*.gff: No such file or directory
Error: missing mandatory GFF file in spades_contigs_db/spades_contigs_db/00.input_data/annotation/I15_strain.contigs/ folder
ERROR: Pantagrel pipeline task 0: failed.

I deleted the database folder spades_contigs_db and re-ran the database generation, but without any effect.

flass commented 4 years ago

So this was a bug! It should be fixed with 2e98c27. You can update the git repository and rerun the task; if you choose not to erase the whole database folder, please do not forget to run this first:

pantagruel -i spades_contigs_db/spades_contigs_db/environ_pantagruel_spades_contigs_db.sh --refresh init

sarah872 commented 4 years ago

Still an error:

ls: cannot access db_name/00.input_data/annotation/I15_strain.contigs/*.gff: No such file or directory

It seems that the symbolic link is created incorrectly:

lrwxrwxrwx. 1 user login 23 Sep 19 15:42 db_name/00.input_data/annotation/I15_strain.contigs -> /I15_strain.contigs

The link points to /I15_strain.contigs, but there is no I15_strain.contigs in the root directory.
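One way to spot links like this is to list every symbolic link in a folder whose target does not exist. The following is only a sketch: list_broken_links is a hypothetical helper, not a pantagruel command, and the folder to pass is whichever one holds the links (here, the database's 00.input_data/annotation folder).

```shell
# Sketch only: list_broken_links is a hypothetical helper name.
# Prints "<link> -> <target>" for each dangling symlink directly inside a folder.
list_broken_links() {
    local dir entry
    dir="$1"
    for entry in "$dir"/*; do
        # -L: the entry itself is a symlink; ! -e: its target cannot be reached
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            echo "$entry -> $(readlink "$entry")"
        fi
    done
}
```

For example, list_broken_links db_name/00.input_data/annotation would print the I15_strain.contigs link together with its missing target.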

flass commented 4 years ago

Oops... sorry, the fix was dirty: there was a typo in the line. Corrected here: 23eab0b.