flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

task 3: create sqldb: KeyError: 'locus_tag_prefix' #28

Closed: megaptera-helvetiae closed this issue 4 years ago

megaptera-helvetiae commented 4 years ago

Hitting my next wall.

Tasks 0, 1, and 2 were error-free. Task 3 completed with errors.
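For context on the error in the title: in Python, `KeyError: 'locus_tag_prefix'` means a dictionary was indexed with a key it does not contain. The sketch below is only an illustration of that failure mode and of one way to spot incomplete records. The record layout, the `assembly_id` field, and the `records_missing_key` helper are hypothetical and are not taken from Pantagruel's code.

```python
def records_missing_key(records, key="locus_tag_prefix"):
    """Return the records (dicts) that lack `key` or hold an empty value for it."""
    return [rec for rec in records if not rec.get(key)]


# Hypothetical per-genome metadata records, for illustration only.
records = [
    {"assembly_id": "asm_A.1", "locus_tag_prefix": "ASMA"},
    # No 'locus_tag_prefix' here: rec["locus_tag_prefix"] would raise KeyError.
    {"assembly_id": "asm_B.1"},
]

missing = records_missing_key(records)
print([rec["assembly_id"] for rec in missing])
```

Using `dict.get` instead of square-bracket indexing returns `None` for an absent key, so the check reports which records are incomplete rather than stopping at the first one.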

Output attached below.

[wilkins@gorilla Panta]$ /scratch/clamchatka/Panta/pantagruel/pantagruel -i /scratch/clamchatka/Panta/test9/environ_pantagruel_test9.sh fetch
This is Pantagruel pipeline version 4867c048788ba7ec92dfd5ae9148d0349411151c using source code from repository '/scratch/clamchatka/Panta/pantagruel'
# will run tasks: 0
[2019-11-18 18:38:48] Pantagruel pipeline task 0: fetch public genome data from NCBI sequence databases and annotate private genomes.
Create new task folder '/scratch/clamchatka/Panta/test9/00.input_data'
[2019-11-18 18:38:48] extract assembly data from folder '/scratch/clamchatka/Panta/user_genomes'
found 14 contig files (raw genome assemblies) in /scratch/clamchatka/Panta/user_genomes/contigs/
[2019-11-18 18:38:48] Ctena_galapagana_StHelenaBay_001_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_galapagana_StHelenaBay_001_SYM.fasta'
[2019-11-18 18:38:48]
### assembly: Ctena_galapagana_StHelenaBay_001_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_galapagana_StHelenaBay_001_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_galapagana_StHelenaBay_001_SYM'
[2019-11-18 18:40:25]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_galapagana_StHelenaBay_001_SYM/Ctena_mexicana_Hele001.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:40 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001/
-rw-r--r--. 1 wilkins login 1496073 Nov 18 18:40 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001_genomic.gff.gz
-rw-r--r--. 1 wilkins login 2944589 Nov 18 18:40 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 778366 Nov 18 18:40 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001_protein.faa.gz
-rw-r--r--. 1 wilkins login 1270508 Nov 18 18:40 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1197358 Nov 18 18:40 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001_cds_from_genomic.fna.gz
[2019-11-18 18:40:36] Ctena_imbricatula_STRI_051_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_051_SYM.fasta'
[2019-11-18 18:40:36]
### assembly: Ctena_imbricatula_STRI_051_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_051_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_imbricatula_STRI_051_SYM'
[2019-11-18 18:42:18]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_imbricatula_STRI_051_SYM/Ctena_mexicana_STRI051.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:42 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051/
-rw-r--r--. 1 wilkins login 1612058 Nov 18 18:42 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3154223 Nov 18 18:42 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 845856 Nov 18 18:42 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051_protein.faa.gz
-rw-r--r--. 1 wilkins login 1373688 Nov 18 18:42 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1284365 Nov 18 18:42 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051_cds_from_genomic.fna.gz
[2019-11-18 18:42:29] Ctena_imbricatula_STRI_052_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_052_SYM.fasta'
[2019-11-18 18:42:29]
### assembly: Ctena_imbricatula_STRI_052_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_052_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_imbricatula_STRI_052_SYM'
[2019-11-18 18:44:18]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_imbricatula_STRI_052_SYM/Ctena_mexicana_STRI052.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:44 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052/
-rw-r--r--. 1 wilkins login 1678658 Nov 18 18:44 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3080722 Nov 18 18:44 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 883676 Nov 18 18:44 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052_protein.faa.gz
-rw-r--r--. 1 wilkins login 1432779 Nov 18 18:44 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1343813 Nov 18 18:44 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052_cds_from_genomic.fna.gz
[2019-11-18 18:44:29] Ctena_imbricatula_STRI_065_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_065_SYM.fasta'
[2019-11-18 18:44:29]
### assembly: Ctena_imbricatula_STRI_065_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_065_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_imbricatula_STRI_065_SYM'
[2019-11-18 18:46:17]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_imbricatula_STRI_065_SYM/Ctena_mexicana_STRI065.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:46 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065/
-rw-r--r--. 1 wilkins login 1689761 Nov 18 18:46 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3141586 Nov 18 18:46 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 888107 Nov 18 18:46 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065_protein.faa.gz
-rw-r--r--. 1 wilkins login 1444080 Nov 18 18:46 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1349636 Nov 18 18:46 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065_cds_from_genomic.fna.gz
[2019-11-18 18:46:28] Ctena_imbricatula_STRI_068_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_068_SYM.fasta'
[2019-11-18 18:46:28]
### assembly: Ctena_imbricatula_STRI_068_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_068_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_imbricatula_STRI_068_SYM'
[2019-11-18 18:48:10]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_imbricatula_STRI_068_SYM/Ctena_mexicana_STRI068.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:48 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068/
-rw-r--r--. 1 wilkins login 1609564 Nov 18 18:48 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3177524 Nov 18 18:48 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 834711 Nov 18 18:48 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068_protein.faa.gz
-rw-r--r--. 1 wilkins login 1370267 Nov 18 18:48 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1267210 Nov 18 18:48 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068_cds_from_genomic.fna.gz
[2019-11-18 18:48:21] Ctena_imbricatula_STRI_070_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_070_SYM.fasta'
[2019-11-18 18:48:21]
### assembly: Ctena_imbricatula_STRI_070_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_070_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_imbricatula_STRI_070_SYM'
[2019-11-18 18:50:03]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_imbricatula_STRI_070_SYM/Ctena_mexicana_STRI070.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:50 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070/
-rw-r--r--. 1 wilkins login 1585007 Nov 18 18:50 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3108467 Nov 18 18:50 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 829928 Nov 18 18:50 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070_protein.faa.gz
-rw-r--r--. 1 wilkins login 1349007 Nov 18 18:50 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1258998 Nov 18 18:50 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070_cds_from_genomic.fna.gz
[2019-11-18 18:50:14] Ctena_imbricatula_STRI_073_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_073_SYM.fasta'
[2019-11-18 18:50:14]
### assembly: Ctena_imbricatula_STRI_073_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_073_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_imbricatula_STRI_073_SYM'
[2019-11-18 18:51:57]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_imbricatula_STRI_073_SYM/Ctena_mexicana_STRI073.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:51 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073/
-rw-r--r--. 1 wilkins login 1623141 Nov 18 18:51 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3140078 Nov 18 18:52 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 848832 Nov 18 18:52 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073_protein.faa.gz
-rw-r--r--. 1 wilkins login 1381490 Nov 18 18:52 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1288478 Nov 18 18:52 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073_cds_from_genomic.fna.gz
[2019-11-18 18:52:08] Ctena_imbricatula_STRI_074_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_074_SYM.fasta'
[2019-11-18 18:52:08]
### assembly: Ctena_imbricatula_STRI_074_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_074_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_imbricatula_STRI_074_SYM'
[2019-11-18 18:53:50]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_imbricatula_STRI_074_SYM/Ctena_mexicana_STRI074.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:53 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074/
-rw-r--r--. 1 wilkins login 1618544 Nov 18 18:53 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3106887 Nov 18 18:53 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 851061 Nov 18 18:53 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074_protein.faa.gz
-rw-r--r--. 1 wilkins login 1380701 Nov 18 18:53 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1292936 Nov 18 18:54 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074_cds_from_genomic.fna.gz
[2019-11-18 18:54:01] Ctena_imbricatula_STRI_094_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_094_SYM.fasta'
[2019-11-18 18:54:01]
### assembly: Ctena_imbricatula_STRI_094_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_imbricatula_STRI_094_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_imbricatula_STRI_094_SYM'
[2019-11-18 18:55:48]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_imbricatula_STRI_094_SYM/Ctena_mexicana_STRI094.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:55 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094/
-rw-r--r--. 1 wilkins login 1674615 Nov 18 18:55 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3091280 Nov 18 18:55 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 881990 Nov 18 18:55 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094_protein.faa.gz
-rw-r--r--. 1 wilkins login 1430334 Nov 18 18:55 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1341101 Nov 18 18:55 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094_cds_from_genomic.fna.gz
[2019-11-18 18:55:59] Ctena_mexicana_StHelenaBay_011_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_011_SYM.fasta'
[2019-11-18 18:55:59]
### assembly: Ctena_mexicana_StHelenaBay_011_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_011_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_mexicana_StHelenaBay_011_SYM'
[2019-11-18 18:57:37]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_mexicana_StHelenaBay_011_SYM/Ctena_mexicana_Hele011.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:57 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011/
-rw-r--r--. 1 wilkins login 1570866 Nov 18 18:57 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3011371 Nov 18 18:57 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 821538 Nov 18 18:57 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011_protein.faa.gz
-rw-r--r--. 1 wilkins login 1338223 Nov 18 18:57 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1263245 Nov 18 18:57 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011_cds_from_genomic.fna.gz
[2019-11-18 18:57:47] Ctena_mexicana_StHelenaBay_012_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_012_SYM.fasta'
[2019-11-18 18:57:47]
### assembly: Ctena_mexicana_StHelenaBay_012_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_012_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_mexicana_StHelenaBay_012_SYM'
[2019-11-18 18:59:20]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_mexicana_StHelenaBay_012_SYM/Ctena_mexicana_Hele012.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 18:59 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012/
-rw-r--r--. 1 wilkins login 1506431 Nov 18 18:59 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012_genomic.gff.gz
-rw-r--r--. 1 wilkins login 2962222 Nov 18 18:59 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 785249 Nov 18 18:59 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012_protein.faa.gz
-rw-r--r--. 1 wilkins login 1280292 Nov 18 18:59 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1207164 Nov 18 18:59 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012_cds_from_genomic.fna.gz
[2019-11-18 18:59:30] Ctena_mexicana_StHelenaBay_013_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_013_SYM.fasta'
[2019-11-18 18:59:30]
### assembly: Ctena_mexicana_StHelenaBay_013_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_013_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_mexicana_StHelenaBay_013_SYM'
[2019-11-18 19:01:29]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_mexicana_StHelenaBay_013_SYM/Ctena_mexicana_Hele013.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 19:01 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013/
-rw-r--r--. 1 wilkins login 1813700 Nov 18 19:01 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3518852 Nov 18 19:01 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 936815 Nov 18 19:01 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013_protein.faa.gz
-rw-r--r--. 1 wilkins login 1549406 Nov 18 19:01 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1442678 Nov 18 19:01 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013_cds_from_genomic.fna.gz
[2019-11-18 19:01:41] Ctena_mexicana_StHelenaBay_014_SYM2
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_014_SYM2.fasta'
[2019-11-18 19:01:41]
### assembly: Ctena_mexicana_StHelenaBay_014_SYM2; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_014_SYM2.fasta
running Prokka...
succesfully annotated genome 'Ctena_mexicana_StHelenaBay_014_SYM2'
[2019-11-18 19:03:30]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_mexicana_StHelenaBay_014_SYM2/Ctena_mexicana_Hele014b.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 19:03 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b/
-rw-r--r--. 1 wilkins login 1778106 Nov 18 19:03 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b_genomic.gff.gz
-rw-r--r--. 1 wilkins login 3479327 Nov 18 19:03 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 868818 Nov 18 19:03 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b_protein.faa.gz
-rw-r--r--. 1 wilkins login 1500730 Nov 18 19:03 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1342478 Nov 18 19:03 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b_cds_from_genomic.fna.gz
[2019-11-18 19:03:41] Ctena_mexicana_StHelenaBay_014_SYM
will annotate contigs in '/scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_014_SYM.fasta'
[2019-11-18 19:03:41]
### assembly: Ctena_mexicana_StHelenaBay_014_SYM; contig files from: /scratch/clamchatka/Panta/user_genomes/contigs/Ctena_mexicana_StHelenaBay_014_SYM.fasta
running Prokka...
succesfully annotated genome 'Ctena_mexicana_StHelenaBay_014_SYM'
[2019-11-18 19:05:11]
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files
succesfully modified the GenBank flat file /scratch/clamchatka/Panta/test9/00.input_data/annotation/Ctena_mexicana_StHelenaBay_014_SYM/Ctena_mexicana_Hele014.gbf
will create GenBank-like assembly folder for user-provided genomes
drwxr-sr-x. 2 wilkins login 0 Nov 18 19:05 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014/
-rw-r--r--. 1 wilkins login 1358020 Nov 18 19:05 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014_genomic.gff.gz
-rw-r--r--. 1 wilkins login 2683614 Nov 18 19:05 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014_genomic.gbff.gz
-rw-r--r--. 1 wilkins login 677008 Nov 18 19:05 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014_protein.faa.gz
-rw-r--r--. 1 wilkins login 1152189 Nov 18 19:05 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014_genomic.fna.gz
-rw-r--r--. 1 wilkins login 1041181 Nov 18 19:05 /scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014_cds_from_genomic.fna.gz
parsing genome annotation from genBank flat files...
Ctena_galapagana_StHelenaBay_001_SYM.1 Ctena_imbricatula_STRI_051_SYM.1 Ctena_imbricatula_STRI_052_SYM.1 Ctena_imbricatula_STRI_065_SYM.1 Ctena_imbricatula_STRI_068_SYM.1 Ctena_imbricatula_STRI_070_SYM.1 Ctena_imbricatula_STRI_073_SYM.1 Ctena_imbricatula_STRI_074_SYM.1 Ctena_imbricatula_STRI_094_SYM.1 Ctena_mexicana_StHelenaBay_011_SYM.1 Ctena_mexicana_StHelenaBay_012_SYM.1 Ctena_mexicana_StHelenaBay_013_SYM.1 Ctena_mexicana_StHelenaBay_014_SYM.1 Ctena_mexicana_StHelenaBay_014_SYM2.1  ...done
Ctena_galapagana_StHelenaBay_001_SYM.1
Ctena_galapagana_StHelenaBay_001_SYM.1; Ctena mexicana; "Hele001"; ; ; 
Ctena_imbricatula_STRI_051_SYM.1
Ctena_imbricatula_STRI_051_SYM.1; Ctena mexicana; "STRI051"; ; ; 
Ctena_imbricatula_STRI_052_SYM.1
Ctena_imbricatula_STRI_052_SYM.1; Ctena mexicana; "STRI052"; ; ; 
Ctena_imbricatula_STRI_065_SYM.1
Ctena_imbricatula_STRI_065_SYM.1; Ctena mexicana; "STRI065"; ; ; 
Ctena_imbricatula_STRI_068_SYM.1
Ctena_imbricatula_STRI_068_SYM.1; Ctena mexicana; "STRI068"; ; ; 
Ctena_imbricatula_STRI_070_SYM.1
Ctena_imbricatula_STRI_070_SYM.1; Ctena mexicana; "STRI070"; ; ; 
Ctena_imbricatula_STRI_073_SYM.1
Ctena_imbricatula_STRI_073_SYM.1; Ctena mexicana; "STRI073"; ; ; 
Ctena_imbricatula_STRI_074_SYM.1
Ctena_imbricatula_STRI_074_SYM.1; Ctena mexicana; "STRI074"; ; ; 
Ctena_imbricatula_STRI_094_SYM.1
Ctena_imbricatula_STRI_094_SYM.1; Ctena mexicana; "STRI094"; ; ; 
Ctena_mexicana_StHelenaBay_011_SYM.1
Ctena_mexicana_StHelenaBay_011_SYM.1; Ctena mexicana; "Hele011"; ; ; 
Ctena_mexicana_StHelenaBay_012_SYM.1
Ctena_mexicana_StHelenaBay_012_SYM.1; Ctena mexicana; "Hele012"; ; ; 
Ctena_mexicana_StHelenaBay_013_SYM.1
Ctena_mexicana_StHelenaBay_013_SYM.1; Ctena mexicana; "Hele013"; ; ; 
Ctena_mexicana_StHelenaBay_014_SYM.1
Ctena_mexicana_StHelenaBay_014_SYM.1; Ctena mexicana; "Hele014"; ; ; 
Ctena_mexicana_StHelenaBay_014_SYM2.1
Ctena_mexicana_StHelenaBay_014_SYM2.1; Ctena mexicana; "Hele014b"; ; ; 
[2019-11-18 19:05:22]
Pantagruel pipeline task 0: complete.
[wilkins@gorilla Panta]$ /scratch/clamchatka/Panta/pantagruel/pantagruel -i /scratch/clamchatka/Panta/test9/environ_pantagruel_test9.sh homologous
This is Pantagruel pipeline version 4867c048788ba7ec92dfd5ae9148d0349411151c using source code from repository '/scratch/clamchatka/Panta/pantagruel'
# will run tasks: 1
[2019-11-18 19:20:01] Pantagruel pipeline task 1: classify protein sequences into homologous families.
Create new task folder '/scratch/clamchatka/Panta/test9/01.seqdb'
[2019-11-18 19:20:03] -- 14 proteomes in dataset
[2019-11-18 19:20:03] -- 56061 proteins in dataset
[2019-11-18 19:20:04] -- 56061 non-redundant protein ids in dataset
                      -- Perform first protein clustering step (100% prot identity clustering with clusthash algorithm)
                      -- First protein clustering step complete: 
Writing results 0h 0m 0s 23ms
Time for merging to all_proteomes.clusthashdb_minseqid100_clust: 0h 0m 0s 20ms
nfin = '/scratch/clamchatka/Panta/test9/01.seqdb/all_proteomes.clusthashdb_minseqid100_clusters' ; famprefix = 'NRPROT' ; dirout = '/scratch/clamchatka/Panta/test9/01.seqdb/all_proteomes.clusthashdb_minseqid100_families' ; padlen = 6 ; writeseq = False ; discardsingle = False
listed 34956 redundant sequences in dataset
generated hash index
parsing redundant sequence fasta
filtered 21105 non-redundant sequences
parse NCBI Taxonomy merged taxon ids from '/scratch/clamchatka/Panta/NCBI/Taxonomy_2019-11-12/merged.dmp'
parse NCBI Taxonomy taxon names from '/scratch/clamchatka/Panta/NCBI/Taxonomy_2019-11-12/names.dmp'
parse redundant protein names from '/scratch/clamchatka/Panta/test9/01.seqdb/all_proteomes.identicals.list'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_galapagana_StHelenaBay_001_SYM.1_Ctena_mexicana_Hele001'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_imbricatula_STRI_051_SYM.1_Ctena_mexicana_STRI051'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_imbricatula_STRI_052_SYM.1_Ctena_mexicana_STRI052'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_imbricatula_STRI_065_SYM.1_Ctena_mexicana_STRI065'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_imbricatula_STRI_068_SYM.1_Ctena_mexicana_STRI068'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_imbricatula_STRI_070_SYM.1_Ctena_mexicana_STRI070'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_imbricatula_STRI_073_SYM.1_Ctena_mexicana_STRI073'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_imbricatula_STRI_074_SYM.1_Ctena_mexicana_STRI074'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_imbricatula_STRI_094_SYM.1_Ctena_mexicana_STRI094'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_mexicana_StHelenaBay_011_SYM.1_Ctena_mexicana_Hele011'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_mexicana_StHelenaBay_012_SYM.1_Ctena_mexicana_Hele012'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_mexicana_StHelenaBay_013_SYM.1_Ctena_mexicana_Hele013'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_mexicana_StHelenaBay_014_SYM.1_Ctena_mexicana_Hele014'
parse assembly '/scratch/clamchatka/Panta/test9/00.input_data/assemblies/Ctena_mexicana_StHelenaBay_014_SYM2.1_Ctena_mexicana_Hele014b'
                      -- Perform second protein clustering step (to find homologs with cluster algorithm)
                      -- Second protein clustering step complete: 
nfin = '/scratch/clamchatka/Panta/test9/01.seqdb/protein_families/all_proteomes.nr.mmseqs_clusterdb_default_clusters' ; famprefix = 'PANTAGP' ; dirout = '/scratch/clamchatka/Panta/test9/01.seqdb/protein_families/all_proteomes.nr.mmseqs_clusterdb_default_clusters_fasta' ; padlen = 6 ; writeseq = True ; discardsingle = False
                      -- Successfully split mmseqs cluster '/scratch/clamchatka/Panta/test9/01.seqdb/protein_families/all_proteomes.nr.mmseqs_clusterdb_default_clusters'
[2019-11-18 19:20:57] -- 21105 non-redundant proteins
[2019-11-18 19:20:57] -- classified into 3999 clusters
                      -- including artificial cluster PANTAGP000000 gathering 2753 ORFan nr proteins
                      -- (NB: some are not true ORFans as can be be present as identical sequences in several genomes)
Pantagruel pipeline task 1: complete.
[wilkins@gorilla Panta]$ /scratch/clamchatka/Panta/pantagruel/pantagruel -i /scratch/clamchatka/Panta/test9/environ_pantagruel_test9.sh align
This is Pantagruel pipeline version 4867c048788ba7ec92dfd5ae9148d0349411151c using source code from repository '/scratch/clamchatka/Panta/pantagruel'
# will run tasks: 2
[2019-11-18 19:35:28] Pantagruel pipeline task 2: align homologous protein sequences and translate alignemnts into coding sequences.
Create new task folder '/scratch/clamchatka/Panta/test9/02.gene_alignments'
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence the citation notice: run 'parallel --bibtex'.

parse redundant protein names from '/scratch/clamchatka/Panta/test9/01.seqdb/all_proteomes.identicals.list'
# parse replicon/genome assembly data
# map genome assembly / CDS sequence dump files
# parse protein / CDS data
  last registered family id: 3998
# parse singleton nr protein sequences
  4612 final non-ORFan CDS families, including 614 homogeneous (singleton protein derived) families
  2139 ORFan / 56061 total CDSs
# parsing nr protein alignments
1000 / 3998
2000 / 3998
3000 / 3998
# write gene family lists and genome gene content matrices
matrix of family counts( genome x non-ORFan families):
  '/scratch/clamchatka/Panta/test9/02.gene_alignments/full_families_genome_counts-noORFans.mat'
matrix of family counts( genome x ORFan families):
  '/scratch/clamchatka/Panta/test9/02.gene_alignments/PANTAGC000000_genome_counts-ORFans.mat'
# extract CDSs from genomic dump files
14      source files parsed ; 01families in buffern buffer
# reverse translate alignments

[2019-11-18 19:38:10]-- complete generation of full CDS alignments without critical errors
[2019-11-18 19:39:51]-- Complete generating the gene family count matrices
Pantagruel pipeline task 2: complete.
[wilkins@gorilla Panta]$ /scratch/clamchatka/Panta/pantagruel/pantagruel -i /scratch/clamchatka/Panta/test9/environ_pantagruel_test9.sh sqldb
This is Pantagruel pipeline version 4867c048788ba7ec92dfd5ae9148d0349411151c using source code from repository '/scratch/clamchatka/Panta/pantagruel'
# will run tasks: 3
[2019-11-18 19:40:38] Pantagruel pipeline task 3: initiate SQL database and load genomic object relationships.
Create new task folder '/scratch/clamchatka/Panta/test9/03.database'
currently set variables:
database=/scratch/clamchatka/Panta/test9/03.database dbname=test9 metadata=/scratch/clamchatka/Panta/test9/00.input_data/genome_infos/assembly_metadata assemblyinfo=/scratch/clamchatka/Panta/test9/00.input_data/genome_infos/assembly_info protali=/scratch/clamchatka/Panta/test9/02.gene_alignments protfamseqtab=/scratch/clamchatka/Panta/test9/01.seqdb/protein_families/all_proteomes.nr.mmseqs_clusterdb_default_clusters_fasta.tab protorfanclust=PANTAGP000000 cdsorfanclust=PANTAGC000000 usergenomeinfo=/scratch/clamchatka/Panta/user_genomes/strain_infos_test9.txt usergenomefinalassdir=/scratch/clamchatka/Panta/test9/00.input_data/genbank-format_assemblies gp2ass=/scratch/clamchatka/Panta/test9/00.input_data/genomesource_assemblyid_assemblyname.txt
--2019-11-18 19:40:39--  http://www.uniprot.org/docs/speclist
Resolving www.uniprot.org (www.uniprot.org)... 193.62.192.81, 128.175.245.185
Connecting to www.uniprot.org (www.uniprot.org)|193.62.192.81|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.uniprot.org/docs/speclist [following]
--2019-11-18 19:40:39--  https://www.uniprot.org/docs/speclist
Connecting to www.uniprot.org (www.uniprot.org)|193.62.192.81|:443... connected.
HTTP request sent, awaiting response... 200 
Length: unspecified [text/html]
Saving to: ‘speclist’

    [                  <=>                                                                                               ] 2,284,855    512KB/s   in 5.6s   

2019-11-18 19:40:45 (400 KB/s) - ‘speclist’ saved [2284855]

assemblies (assembly_id, assembly_name, organism, species, subspecies, serovar, strain, taxid, primary_pubmed_id, country, isolation_source, host, clinical_source, collection_year, collection_month, collection_day, sequencing_technology, sequencing_coverage, note)
replicons (assembly_id, genomic_accession, replicon_name, replicon_type, replicon_size)
protein_products (product, nr_protein_id)
protein_fams (nr_protein_id, protein_family_id)
codingsequences (genomic_accession, locus_tag, cds_begin, cds_end, cds_strand, genbank_cds_id, nr_protein_id)
cdsfam (genbank_cds_id, gene_family_id)
Generating 'uniprotcode_taxid' table:
Traceback (most recent call last):
  File "/scratch/clamchatka/Panta/pantagruel/scripts/pantagruel_sqlitedb_genome_populate.py", line 345, in <module>
    main(dbname, protorfanclust, cdsorfanclust, nfspeclist, nfgsrc2assidname, nfusergenomeinfo, usergenomefinalassdir)
  File "/scratch/clamchatka/Panta/pantagruel/scripts/pantagruel_sqlitedb_genome_populate.py", line 234, in main
    code, taxid = (duginfo['locus_tag_prefix'], duginfo['taxid'])
KeyError: 'locus_tag_prefix'
/scratch/clamchatka/Panta/test9/03.database
Error: no such column: code
Error: no such column: code
Error: no such column: cds_code
Traceback (most recent call last):
  File "/scratch/clamchatka/Panta/pantagruel/scripts/genbank2code_fastaseqnames.py", line 43, in <module>
    pool.map(genbank2code, iter((nfinfa, transnames, dirout, queue) for nfinfa in lnfinfa))
  File "/apps/python2/2.7.15/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/apps/python2/2.7.15/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
KeyError: 'CteimSTRI051_02542'
Traceback (most recent call last):
  File "/scratch/clamchatka/Panta/pantagruel/scripts/genbank2code_fastaseqnames.py", line 43, in <module>
    pool.map(genbank2code, iter((nfinfa, transnames, dirout, queue) for nfinfa in lnfinfa))
  File "/apps/python2/2.7.15/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/apps/python2/2.7.15/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
KeyError: 'CtegalHel001_03432'
Pantagruel pipeline task 3: complete.
[wilkins@gorilla Panta]$ 
flass commented 4 years ago

Hi Laetitia,

back again! next game 🎾 :

OK, so there is an issue with detecting the 'locus_tag_prefix' column in your strain information file, which from the above seems to be this one: usergenomeinfo=/scratch/clamchatka/Panta/user_genomes/strain_infos_test9.txt

From your previous error reports, I think the syntax for this file is correct... at least it seems to be! My guess is that you have manually written/edited this file in a non-Unix environment, and that the end-of-line marker is not the single \n character but something else (typically \r\n under Windows, see https://en.wikipedia.org/wiki/Newline). The 'locus_tag_prefix' column being at the end of the line, that would explain why it's not recognised (because Python would read it as 'locus_tag_prefix\r'). I could code something that strips this naughty character, but it is BAD in general to have the wrong EOL, as many Unix-based programs just won't accept it and will throw an error, so I won't encourage it. It's easy to fix using the sed command.

And actually, I won't explain it here as this makes me think that I already explained that in another issue, cf. #14 ! So please refer to that one (and in the future have a look at previous issues before posting your own, it might have been resolved already 😉 )
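
To illustrate the end-of-line problem described above, here is a minimal sketch of how a Windows line ending can hide the last column from a dictionary lookup. It is not the pipeline's actual parsing code: the `read_rows` helper, the sample content and the `assembly_id` column are assumptions made for the example, and the file is assumed to be tab-delimited.

```python
# Hypothetical strain info file content saved with Windows (CRLF) line endings.
# The column layout is an assumption; only 'taxid' and 'locus_tag_prefix'
# are known from the traceback.
raw = "assembly_id\ttaxid\tlocus_tag_prefix\r\nstrainA\t12345\tSTRA\r\n"

def read_rows(text):
    """Naive parser: split lines on LF only, then split fields on tabs."""
    lines = [line for line in text.split("\n") if line]
    header = lines[0].split("\t")
    return header, [dict(zip(header, line.split("\t"))) for line in lines[1:]]

header, rows = read_rows(raw)
print(repr(header[-1]))               # the last column name keeps a trailing '\r'
print("locus_tag_prefix" in rows[0])  # False, so rows[0]['locus_tag_prefix'] raises KeyError

# Normalising the line endings first makes the lookup work again.
fixed_header, fixed_rows = read_rows(raw.replace("\r\n", "\n"))
print(fixed_rows[0]["locus_tag_prefix"])
```

Only the last column is affected, because the stray \r sits at the very end of each line.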

Best,

Florent

flass commented 4 years ago

That said, maybe that is not the case: since b20d316 there is a sanity check in the init script that should catch this, so it should not happen! Let me know if the problem lies elsewhere.

megaptera-helvetiae commented 4 years ago

Hi Florent,

so this took me a few days... After all, I had written my config file and strain infos file in nano on the server...

So, there was a simple typo in one of the column names.

locus_tax_prefix instead of locus_tag_prefix.
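
For anyone landing here with the same KeyError, a rough sketch of a header check that would have pointed at this typo. This is not part of Pantagruel: `check_header` and the `REQUIRED` tuple are hypothetical, the file is assumed to be tab-delimited, and only the two column names seen in the traceback are checked.

```python
import difflib

# Assumed list of required columns; only these two names are known from
# the traceback (duginfo['locus_tag_prefix'], duginfo['taxid']).
REQUIRED = ("locus_tag_prefix", "taxid")

def check_header(header_line, required=REQUIRED, sep="\t"):
    """Return a list of human-readable problems found in a header line."""
    columns = [c.strip() for c in header_line.rstrip("\r\n").split(sep)]
    problems = []
    for name in required:
        if name in columns:
            continue
        # Look for a column that is almost, but not exactly, the expected name.
        close = difflib.get_close_matches(name, columns, n=1, cutoff=0.8)
        hint = " (found similar column %r, possible typo)" % close[0] if close else ""
        problems.append("missing column %r%s" % (name, hint))
    return problems

# Hypothetical header containing the typo from this issue.
for problem in check_header("assembly_id\ttaxid\tlocus_tax_prefix\n"):
    print(problem)
```

Running something like this on the first line of the strain infos file before launching task 3 flags a misspelled column name directly, instead of surfacing it later as a KeyError.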

You can gladly close this issue.

Thank you.