flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

TASK 5 Failed: failed search of populations in reference tree #44

Closed mattbawn closed 3 years ago

mattbawn commented 3 years ago

Hi Florent,

So after restarting successfully:

 This is Pantagruel pipeline version ee0de31c0f56ef12bc42a2e9b7f009899659dbbe using source code from repository '/pantagruel'

will try and resume computation of task where it was last stopped
# will run tasks: 5 6 7 8
[2020-09-21 20:46:07] Pantagruel pipeline task 5: select core-genome markers and compute reference tree.
Create new task folder '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome'
will try and resume computation where it was last stopped; may skip/resume computing: core genome concatenated alignment, core ML tree search, core tree bootstrapping, core tree rooting with treebalance
run non-interactively '/pantagruel/scripts/choose_min_genome_occurrence_pseudocore_genes.sh' to record the gene family set.
Default: will use a strict core genome gene set, i.e. genes present in a single copy in all the 134 studied genomes.
'pseudocoremingenomes' variable is set to 134; this integer value is interpreted as a number of genomes
[1] extract values of pseudocoremingenomes from file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/tmp/mingenom'
try values of pseudocoremingenomes = 134, 134
Loading matrix of gene families counts in genomes...
Loading correspondence table of assembly accessions to genome codes...
number of unicopy gene families present in at least n genomes (out of 134):
 134  133  132  131  130  129  128  127  126  125  124  123  122  121  120  119 
 546 1011 1210 1307 1355 1382 1396 1406 1414 1416 1423 1433 1435 1444 1458 1472 
 118  117  116  115  114  113  112  111  110  108  107  105  104  103  102  101 
1481 1488 1493 1500 1504 1506 1507 1509 1510 1511 1512 1514 1515 1518 1521 1522 
 100 
1524 
[1] test value 134 for P, the minimum number of genomes to be represented in pseudo-core unicopy gene families
results in a set of 546 pseudo-core unicopy gene families
plotting heatmap... Written list of 546 pseudo-core unicopy gene families (with min. genome nb. = 134) and graphical representation of their distribution at:
/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/pseudo-coregenome_sets/strict-core-unicopy_families.pdf
/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/pseudo-coregenome_sets/strict-core-unicopy_families.tab
results in a set of 546 pseudo-core unicopy gene families
plotting heatmap... Written list of 546 pseudo-core unicopy gene families (with min. genome nb. = 134) and graphical representation of their distribution at:
/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/pseudo-coregenome_sets/strict-core-unicopy_families.pdf
/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/pseudo-coregenome_sets/strict-core-unicopy_families.tab
Selected 134 as value of P
Saved data in file: '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/pseudo-coregenome_sets/pseudo-core-all.RData'.
Final choice of 546 pseudo-core unicopy gene families (present in at least 134 genomes).
defined the core-genome gene family set with success
created concatenated (pseudo)core-genome alignment in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/strict-core-unicopy_concat_cds_134-genomes_database.aln'
# check alignment and search for identical sequences
removed identical sequence in core alignment; reduced alignment stored in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/strict-core-unicopy_concat_cds_134-genomes_database.aln.reduced'
ML tree topology search
ML tree topology search complete; best tree stored in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/core-genome-based_reference_tree_database.topology'
ML tree parameter & branch length search under GAMMA-based model
ML tree parameter & branch length search complete; best tree stored in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/core-genome-based_reference_tree_database.branlen'
search bootstrap trees
map branch supports on ML tree
bootstrapping complete; bootstrap trees stored in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/core-genome-based_reference_tree_database.supports'
reference tree rooting complete; rooted tree stored in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/core-genome-based_reference_tree_database.rooted'
could not find branch supports; try looking in comments field
found branch support in comments
2 nodes without branch support documented
succesfully re-introduced identical sequences into reference tree; full reference tree stored in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/core-genome-based_reference_tree_database.full'
C1 Salmonella_enterica_str_ERR024387__C1
Salmonella_enterica_str_ERR024387__C1
C10 Salmonella_enterica_str_ERR024398__C10
Salmonella_enterica_str_ERR024398__C10
C100 Salmonella_enterica_str_ERR029234__C100
Salmonella_enterica_str_ERR029234__C100
C101 Salmonella_enterica_str_ERR029235__C101
Salmonella_enterica_str_ERR029235__C101
C102 Salmonella_enterica_str_ERR038757__C102
Salmonella_enterica_str_ERR038757__C102
C103 Salmonella_enterica_str_ERR038758__C103
Salmonella_enterica_str_ERR038758__C103
C104 Salmonella_enterica_str_ERR038759__C104
Salmonella_enterica_str_ERR038759__C104
C105 Salmonella_enterica_str_ERR038760__C105
Salmonella_enterica_str_ERR038760__C105
C106 Salmonella_enterica_str_ERR038761__C106
Salmonella_enterica_str_ERR038761__C106
C107 Salmonella_enterica_str_ERR038762__C107
Salmonella_enterica_str_ERR038762__C107
C108 Salmonella_enterica_str_ERR038763__C108
Salmonella_enterica_str_ERR038763__C108
C109 Salmonella_enterica_str_ERR038771__C109
Salmonella_enterica_str_ERR038771__C109
C11 Salmonella_enterica_str_ERR024400__C11
Salmonella_enterica_str_ERR024400__C11
C110 Salmonella_enterica_str_ERR038773__C110
Salmonella_enterica_str_ERR038773__C110
C111 Salmonella_enterica_str_ERR038774__C111
Salmonella_enterica_str_ERR038774__C111
C112 Salmonella_enterica_str_ERR038775__C112
Salmonella_enterica_str_ERR038775__C112
C113 Salmonella_enterica_str_ERR038789__C113
Salmonella_enterica_str_ERR038789__C113
C114 Salmonella_enterica_str_ERR038792__C114
Salmonella_enterica_str_ERR038792__C114
C115 Salmonella_enterica_str_ERR039364__C115
Salmonella_enterica_str_ERR039364__C115
C116 Salmonella_enterica_str_ERR039366__C116
Salmonella_enterica_str_ERR039366__C116
C117 Salmonella_enterica_str_ERR039367__C117
Salmonella_enterica_str_ERR039367__C117
C118 Salmonella_enterica_str_ERR039368__C118
Salmonella_enterica_str_ERR039368__C118
C119 Salmonella_enterica_str_ERR039369__C119
Salmonella_enterica_str_ERR039369__C119
C12 Salmonella_enterica_str_ERR024401__C12
Salmonella_enterica_str_ERR024401__C12
C120 Salmonella_enterica_str_ERR039371__C120
Salmonella_enterica_str_ERR039371__C120
C121 Salmonella_enterica_str_Representative1__C121
Salmonella_enterica_str_Representative1__C121
C122 Salmonella_enterica_str_Representative2__C122
Salmonella_enterica_str_Representative2__C122
C123 Salmonella_enterica_str_Representative3__C123
Salmonella_enterica_str_Representative3__C123
C124 Salmonella_enterica_str_Representative4__C124
Salmonella_enterica_str_Representative4__C124
C125 Salmonella_enterica_str_Representative5__C125
Salmonella_enterica_str_Representative5__C125
C126 Salmonella_enterica_str_Representative6__C126
Salmonella_enterica_str_Representative6__C126
C127 Salmonella_enterica_str_Representative7__C127
Salmonella_enterica_str_Representative7__C127
C128 Salmonella_enterica_str_Representative8__C128
Salmonella_enterica_str_Representative8__C128
C129 Salmonella_enterica_str_Representative9__C129
Salmonella_enterica_str_Representative9__C129
C13 Salmonella_enterica_str_ERR024402__C13
Salmonella_enterica_str_ERR024402__C13
C130 Salmonella_enterica_str_Representative10__C130
Salmonella_enterica_str_Representative10__C130
C131 Salmonella_enterica_str_Representative11__C131
Salmonella_enterica_str_Representative11__C131
C132 Salmonella_enterica_str_Representative12__C132
Salmonella_enterica_str_Representative12__C132
C133 Salmonella_enterica_str_Representative13__C133
Salmonella_enterica_str_Representative13__C133
C134 Salmonella_enterica_str_Representative14__C134
Salmonella_enterica_str_Representative14__C134
C14 Salmonella_enterica_str_ERR024404__C14
Salmonella_enterica_str_ERR024404__C14
C15 Salmonella_enterica_str_ERR024405__C15
Salmonella_enterica_str_ERR024405__C15
C16 Salmonella_enterica_str_ERR024406__C16
Salmonella_enterica_str_ERR024406__C16
C17 Salmonella_enterica_str_ERR024407__C17
Salmonella_enterica_str_ERR024407__C17
C18 Salmonella_enterica_str_ERR024408__C18
Salmonella_enterica_str_ERR024408__C18
C19 Salmonella_enterica_str_ERR024409__C19
Salmonella_enterica_str_ERR024409__C19
C2 Salmonella_enterica_str_ERR024388__C2
Salmonella_enterica_str_ERR024388__C2
C20 Salmonella_enterica_str_ERR024410__C20
Salmonella_enterica_str_ERR024410__C20
C21 Salmonella_enterica_str_ERR024411__C21
Salmonella_enterica_str_ERR024411__C21
C22 Salmonella_enterica_str_ERR024413__C22
Salmonella_enterica_str_ERR024413__C22
C23 Salmonella_enterica_str_ERR024417__C23
Salmonella_enterica_str_ERR024417__C23
C24 Salmonella_enterica_str_ERR024418__C24
Salmonella_enterica_str_ERR024418__C24
C25 Salmonella_enterica_str_ERR024629__C25
Salmonella_enterica_str_ERR024629__C25
C26 Salmonella_enterica_str_ERR024634__C26
Salmonella_enterica_str_ERR024634__C26
C27 Salmonella_enterica_str_ERR024954__C27
Salmonella_enterica_str_ERR024954__C27
C28 Salmonella_enterica_str_ERR028271__C28
Salmonella_enterica_str_ERR028271__C28
C29 Salmonella_enterica_str_ERR028272__C29
Salmonella_enterica_str_ERR028272__C29
C3 Salmonella_enterica_str_ERR024389__C3
Salmonella_enterica_str_ERR024389__C3
C30 Salmonella_enterica_str_ERR028277__C30
Salmonella_enterica_str_ERR028277__C30
C31 Salmonella_enterica_str_ERR028278__C31
Salmonella_enterica_str_ERR028278__C31
C32 Salmonella_enterica_str_ERR028279__C32
Salmonella_enterica_str_ERR028279__C32
C33 Salmonella_enterica_str_ERR028280__C33
Salmonella_enterica_str_ERR028280__C33
C34 Salmonella_enterica_str_ERR028282__C34
Salmonella_enterica_str_ERR028282__C34
C35 Salmonella_enterica_str_ERR028283__C35
Salmonella_enterica_str_ERR028283__C35
C36 Salmonella_enterica_str_ERR028284__C36
Salmonella_enterica_str_ERR028284__C36
C37 Salmonella_enterica_str_ERR028285__C37
Salmonella_enterica_str_ERR028285__C37
C38 Salmonella_enterica_str_ERR028286__C38
Salmonella_enterica_str_ERR028286__C38
C39 Salmonella_enterica_str_ERR028287__C39
Salmonella_enterica_str_ERR028287__C39
C4 Salmonella_enterica_str_ERR024391__C4
Salmonella_enterica_str_ERR024391__C4
C40 Salmonella_enterica_str_ERR028288__C40
Salmonella_enterica_str_ERR028288__C40
C41 Salmonella_enterica_str_ERR028289__C41
Salmonella_enterica_str_ERR028289__C41
C42 Salmonella_enterica_str_ERR028290__C42
Salmonella_enterica_str_ERR028290__C42
C43 Salmonella_enterica_str_ERR028291__C43
Salmonella_enterica_str_ERR028291__C43
C44 Salmonella_enterica_str_ERR028292__C44
Salmonella_enterica_str_ERR028292__C44
C45 Salmonella_enterica_str_ERR028294__C45
Salmonella_enterica_str_ERR028294__C45
C46 Salmonella_enterica_str_ERR028295__C46
Salmonella_enterica_str_ERR028295__C46
C47 Salmonella_enterica_str_ERR028296__C47
Salmonella_enterica_str_ERR028296__C47
C48 Salmonella_enterica_str_ERR028297__C48
Salmonella_enterica_str_ERR028297__C48
C49 Salmonella_enterica_str_ERR028298__C49
Salmonella_enterica_str_ERR028298__C49
C5 Salmonella_enterica_str_ERR024392__C5
Salmonella_enterica_str_ERR024392__C5
C50 Salmonella_enterica_str_ERR028299__C50
Salmonella_enterica_str_ERR028299__C50
C51 Salmonella_enterica_str_ERR028300__C51
Salmonella_enterica_str_ERR028300__C51
C52 Salmonella_enterica_str_ERR028301__C52
Salmonella_enterica_str_ERR028301__C52
C53 Salmonella_enterica_str_ERR028302__C53
Salmonella_enterica_str_ERR028302__C53
C54 Salmonella_enterica_str_ERR028303__C54
Salmonella_enterica_str_ERR028303__C54
C55 Salmonella_enterica_str_ERR028304__C55
Salmonella_enterica_str_ERR028304__C55
C56 Salmonella_enterica_str_ERR028306__C56
Salmonella_enterica_str_ERR028306__C56
C57 Salmonella_enterica_str_ERR028307__C57
Salmonella_enterica_str_ERR028307__C57
C58 Salmonella_enterica_str_ERR028308__C58
Salmonella_enterica_str_ERR028308__C58
C59 Salmonella_enterica_str_ERR028309__C59
Salmonella_enterica_str_ERR028309__C59
C6 Salmonella_enterica_str_ERR024394__C6
Salmonella_enterica_str_ERR024394__C6
C60 Salmonella_enterica_str_ERR028310__C60
Salmonella_enterica_str_ERR028310__C60
C61 Salmonella_enterica_str_ERR028311__C61
Salmonella_enterica_str_ERR028311__C61
C62 Salmonella_enterica_str_ERR028312__C62
Salmonella_enterica_str_ERR028312__C62
C63 Salmonella_enterica_str_ERR028313__C63
Salmonella_enterica_str_ERR028313__C63
C64 Salmonella_enterica_str_ERR028314__C64
Salmonella_enterica_str_ERR028314__C64
C65 Salmonella_enterica_str_ERR028315__C65
Salmonella_enterica_str_ERR028315__C65
C66 Salmonella_enterica_str_ERR028316__C66
Salmonella_enterica_str_ERR028316__C66
C67 Salmonella_enterica_str_ERR028631__C67
Salmonella_enterica_str_ERR028631__C67
C68 Salmonella_enterica_str_ERR028632__C68
Salmonella_enterica_str_ERR028632__C68
C69 Salmonella_enterica_str_ERR028633__C69
Salmonella_enterica_str_ERR028633__C69
C7 Salmonella_enterica_str_ERR024395__C7
Salmonella_enterica_str_ERR024395__C7
C70 Salmonella_enterica_str_ERR028634__C70
Salmonella_enterica_str_ERR028634__C70
C71 Salmonella_enterica_str_ERR028635__C71
Salmonella_enterica_str_ERR028635__C71
C72 Salmonella_enterica_str_ERR028636__C72
Salmonella_enterica_str_ERR028636__C72
C73 Salmonella_enterica_str_ERR028637__C73
Salmonella_enterica_str_ERR028637__C73
C74 Salmonella_enterica_str_ERR028638__C74
Salmonella_enterica_str_ERR028638__C74
C75 Salmonella_enterica_str_ERR028639__C75
Salmonella_enterica_str_ERR028639__C75
C76 Salmonella_enterica_str_ERR028640__C76
Salmonella_enterica_str_ERR028640__C76
C77 Salmonella_enterica_str_ERR028641__C77
Salmonella_enterica_str_ERR028641__C77
C78 Salmonella_enterica_str_ERR028643__C78
Salmonella_enterica_str_ERR028643__C78
C79 Salmonella_enterica_str_ERR028656__C79
Salmonella_enterica_str_ERR028656__C79
C8 Salmonella_enterica_str_ERR024396__C8
Salmonella_enterica_str_ERR024396__C8
C80 Salmonella_enterica_str_ERR028658__C80
Salmonella_enterica_str_ERR028658__C80
C81 Salmonella_enterica_str_ERR029213__C81
Salmonella_enterica_str_ERR029213__C81
C82 Salmonella_enterica_str_ERR029214__C82
Salmonella_enterica_str_ERR029214__C82
C83 Salmonella_enterica_str_ERR029215__C83
Salmonella_enterica_str_ERR029215__C83
C84 Salmonella_enterica_str_ERR029216__C84
Salmonella_enterica_str_ERR029216__C84
C85 Salmonella_enterica_str_ERR029217__C85
Salmonella_enterica_str_ERR029217__C85
C86 Salmonella_enterica_str_ERR029218__C86
Salmonella_enterica_str_ERR029218__C86
C87 Salmonella_enterica_str_ERR029219__C87
Salmonella_enterica_str_ERR029219__C87
C88 Salmonella_enterica_str_ERR029220__C88
Salmonella_enterica_str_ERR029220__C88
C89 Salmonella_enterica_str_ERR029221__C89
Salmonella_enterica_str_ERR029221__C89
C9 Salmonella_enterica_str_ERR024397__C9
Salmonella_enterica_str_ERR024397__C9
C90 Salmonella_enterica_str_ERR029222__C90
Salmonella_enterica_str_ERR029222__C90
C91 Salmonella_enterica_str_ERR029223__C91
Salmonella_enterica_str_ERR029223__C91
C92 Salmonella_enterica_str_ERR029225__C92
Salmonella_enterica_str_ERR029225__C92
C93 Salmonella_enterica_str_ERR029226__C93
Salmonella_enterica_str_ERR029226__C93
C94 Salmonella_enterica_str_ERR029228__C94
Salmonella_enterica_str_ERR029228__C94
C95 Salmonella_enterica_str_ERR029229__C95
Salmonella_enterica_str_ERR029229__C95
C96 Salmonella_enterica_str_ERR029230__C96
Salmonella_enterica_str_ERR029230__C96
C97 Salmonella_enterica_str_ERR029231__C97
Salmonella_enterica_str_ERR029231__C97
C98 Salmonella_enterica_str_ERR029232__C98
Salmonella_enterica_str_ERR029232__C98
C99 Salmonella_enterica_str_ERR029233__C99
Salmonella_enterica_str_ERR029233__C99
reference tree name translation in complete; tree with organism names stored in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/core-genome-based_reference_tree_database.full.names'
Reading the tree 1
Dating under temporal constraints mode ...
The results correspond to the estimation of relative dates when T[mrca]=0.000000 and T[tips]=1.000000
rate 1.124e-02 , tMRCA 0.000000 , objective function 4.462988e+03

Elapsed time: 0.054981 seconds
ultrametrization of reference tree complete; ultrametric tree stored in file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/05.core_genome/core-genome-based_reference_tree_database.full.lsd'

I receive the error:

WARNING: Will rely on strict core genome definition to compute reference tree. This is often not advisable as the strict core genome can be very small.
To choose a sensible 'pseudocore genomes' gene set, please run interactively '/pantagruel/scripts/choose_min_genome_occurrence_pseudocore_genes.sh'.
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
Warning message:
NAs introduced by coercion 
pseudocoremingenomes=134
Traceback (most recent call last):
  File "/pantagruel/scripts/replace_species_by_pop_in_gene_trees.py", line 848, in <module>
    nbthreads = int(dopt.get('--threads', dopt.get('-T', -1)))
ValueError: invalid literal for int() with base 10: ''
ERROR: failed search of populations in reference tree
ERROR: Pantagruel pipeline task 5: failed.

Thanks

Matt

flass commented 3 years ago

Hi Matt,

Sorry about that. I found the problem: it was the way the default number of parallel threads was defined for that script. It's fixed in [usingGeneRax] d84197a and [master] d1af716; the Docker Hub builds should be ready soon.
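
For anyone hitting the same traceback on an older build, here is a minimal sketch of the failure mode and one defensive way to parse the option. It is not the code from the commits mentioned in this comment. It assumes `dopt` is a plain dict of option strings, and that the thread option reached the script with an empty value (the `''` shown in the `ValueError`). The helper name `parse_threads` is hypothetical.

```python
def parse_threads(dopt, default=-1):
    """Return the thread count from an options dict.

    Falls back to `default` when the option is absent or was passed
    with an empty value (hypothetical helper, not part of Pantagruel).
    """
    raw = dopt.get('--threads', dopt.get('-T', default))
    # An empty or whitespace-only string cannot be converted by int(),
    # so treat it the same as a missing option.
    if isinstance(raw, str) and raw.strip() == '':
        return default
    return int(raw)


# Reproduce the error from the traceback: the option is present but empty.
broken_opts = {'--threads': ''}
try:
    int(broken_opts.get('--threads', broken_opts.get('-T', -1)))
except ValueError as err:
    print('reproduced:', err)

# The defensive version falls back to the default instead of raising.
print(parse_threads(broken_opts))
print(parse_threads({'-T': '4'}))
```

The point of the sketch is that `dict.get` only uses its fallback when the key is missing, so an option that is present but empty still reaches `int()` and raises.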

I hope that will make it work fine for you.

Cheers, Florent