aweimann / traitar

GNU General Public License v3.0

failed to reproduce the example result using the docker image #67

Closed: housw closed this issue 5 years ago

housw commented 5 years ago

Hi,

I'm using the Docker image aweimann/traitar:latest (02a16167973c) to test traitar with the following command:

root@fd9a9145f6ac:/home/traitar/traitar/traitar/data/sample_data/traitar_out2/phenotype_prediction# traitar phenotype . ./samples.txt from_nucleotides ./traitar_out2

and got the following output:

running Pfam annotation with hmmer. This step can take a while. A rough estimate for sequential Pfam annotation of genome samples of ~3 Mbs is 10 min per genome.
running phenotype prediction
running feature track generation
running heatmap generation
/usr/lib/pymodules/python2.7/matplotlib/axes.py:2760: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0
  + 'bottom=%s, top=%s') % (bottom, top))
/usr/lib/pymodules/python2.7/matplotlib/axes.py:2536: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=0.0, right=0
  + 'left=%s, right=%s') % (left, right))
/usr/lib/pymodules/python2.7/matplotlib/axes.py:2760: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0
  + 'bottom=%s, top=%s') % (bottom, top))
/usr/lib/pymodules/python2.7/matplotlib/axes.py:2536: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=0.0, right=0
  + 'left=%s, right=%s') % (left, right))
/usr/lib/pymodules/python2.7/matplotlib/axes.py:2760: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0
  + 'bottom=%s, top=%s') % (bottom, top))
/usr/lib/pymodules/python2.7/matplotlib/axes.py:2536: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=0.0, right=0
  + 'left=%s, right=%s') % (left, right))
root@fd9a9145f6ac:/home/traitar/traitar/traitar/data/sample_data#

Then I checked the phenotype prediction files and found that all the prediction values were zero:

root@fd9a9145f6ac:/home/traitar/traitar/traitar/data/sample_data/traitar_out2/phenotype_prediction# cat *.txt
    Salicin Catalase    Gelatin hydrolysis  Coccus  Lysine decarboxylase    Motile  Coccus - pairs or chains predominate    Maltose Growth on ordinary blood agar   Colistin-Polymyxin susceptible  Melibiose   Spore formation Yellow pigment  DNase   Nitrate to nitrite  Gram positive   Anaerobe    Bile-susceptible    Glucose oxidizer    Gram negative   Ornithine decarboxylase L-Arabinose Casein hydrolysis   Gas from glucose    Lactose Tartrate utilization    Raffinose   Cellobiose  L-Rhamnose  Bacillus or coccobacillus   Mucate utilization  Indole  D-Xylose    Starch hydrolysis   Growth on MacConkey agar    Citrate Urea hydrolysis Glycerol    Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase  D-Mannitol  Trehalose   Nitrite to gas  Arginine dihydrolase    Acetate utilization Malonate    myo-Inositol    Methyl red  ONPG (beta galactosidase)   D-Mannose   Growth in 6.5% NaCl Growth at 42°C  Glucose fermenter   Aerobe  Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase    Beta hemolysis  Growth in KCN   Hydrogen sulfide    Facultative Esculin hydrolysis  Sucrose D-Sorbitol  Coagulase production
Listeria_grayi_DSM_20601    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Listeria_ivanovii_WSLC3009  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    Salicin Catalase    Gelatin hydrolysis  Coccus  Lysine decarboxylase    Motile  Coccus - pairs or chains predominate    Maltose Growth on ordinary blood agar   Colistin-Polymyxin susceptible  Melibiose   Spore formation Yellow pigment  DNase   Nitrate to nitrite  Gram positive   Anaerobe    Bile-susceptible    Glucose oxidizer    Gram negative   Ornithine decarboxylase L-Arabinose Casein hydrolysis   Gas from glucose    Lactose Tartrate utilization    Raffinose   Cellobiose  L-Rhamnose  Bacillus or coccobacillus   Mucate utilization  Indole  D-Xylose    Starch hydrolysis   Growth on MacConkey agar    Citrate Urea hydrolysis Glycerol    Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase  D-Mannitol  Trehalose   Nitrite to gas  Arginine dihydrolase    Acetate utilization Malonate    myo-Inositol    Methyl red  ONPG (beta galactosidase)   D-Mannose   Growth in 6.5% NaCl Growth at 42°C  Glucose fermenter   Aerobe  Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase    Beta hemolysis  Growth in KCN   Hydrogen sulfide    Facultative Esculin hydrolysis  Sucrose D-Sorbitol  Coagulase production
Listeria_grayi_DSM_20601    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Listeria_ivanovii_WSLC3009  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    Salicin Catalase    Gelatin hydrolysis  Coccus  Lysine decarboxylase    Motile  Coccus - pairs or chains predominate    Maltose Growth on ordinary blood agar   Colistin-Polymyxin susceptible  Melibiose   Spore formation Yellow pigment  DNase   Nitrate to nitrite  Gram positive   Anaerobe    Bile-susceptible    Glucose oxidizer    Gram negative   Ornithine decarboxylase L-Arabinose Casein hydrolysis   Gas from glucose    Lactose Tartrate utilization    Raffinose   Cellobiose  L-Rhamnose  Bacillus or coccobacillus   Mucate utilization  Indole  D-Xylose    Starch hydrolysis   Growth on MacConkey agar    Citrate Urea hydrolysis Glycerol    Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase  D-Mannitol  Trehalose   Nitrite to gas  Arginine dihydrolase    Acetate utilization Malonate    myo-Inositol    Methyl red  ONPG (beta galactosidase)   D-Mannose   Growth in 6.5% NaCl Growth at 42°C  Glucose fermenter   Aerobe  Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase    Beta hemolysis  Growth in KCN   Hydrogen sulfide    Facultative Esculin hydrolysis  Sucrose D-Sorbitol  Coagulase production
Listeria_grayi_DSM_20601    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Listeria_ivanovii_WSLC3009  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
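A quick way to confirm this symptom without eyeballing every column is a small check over the prediction tables. This is a minimal sketch, not part of traitar: the function name `all_predictions_zero` is hypothetical, and it assumes the files are tab-separated with one header row and the sample name in the first column, as the `cat` output above suggests.

```shell
# Hypothetical helper (not part of traitar): exit 0 if every prediction value
# in the given tables is zero, exit 1 if at least one value is non-zero.
# Assumes tab-separated files: one header row, sample name in column 1.
all_predictions_zero() {
    awk -F'\t' '
        FNR == 1 { next }    # skip the header row of each file
        {
            for (i = 2; i <= NF; i++)
                if ($i != "" && $i + 0 != 0) nonzero = 1
        }
        END { exit (nonzero ? 1 : 0) }
    ' "$@"
}
```

For example: `all_predictions_zero traitar_out2/phenotype_prediction/*.txt && echo "every prediction is 0.0"`.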

Next I checked the domtblout files; no hits were found:

root@fd9a9145f6ac:/home/traitar/traitar/traitar/data/sample_data/traitar_out2/pfam_annotation# more Listeria_grayi_DSM_20601_domtblout.dat
#                                                                            --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name           accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
#
# Program:         hmmsearch
# Version:         3.1b1 (May 2013)
# Pipeline mode:   SEARCH
# Query file:      /home/traitar/Pfam-A.hmm
# Target file:     ./traitar_out2/gene_prediction/Listeria_grayi_DSM_20601.faa
# Option settings: hmmsearch --domtblout ./traitar_out2/pfam_annotation/Listeria_grayi_DSM_20601_domtblout.dat --cut_ga --cpu 1 /home/traitar/Pfam-A.hmm ./traitar_out2/gene_prediction/Listeria_grayi_DSM_20601.faa
# Current dir:     /home/traitar/traitar/traitar/data/sample_data
# Date:            Mon Jan  7 02:05:45 2019
# [ok]
root@fd9a9145f6ac:/home/traitar/traitar/traitar/data/sample_data/traitar_out2/pfam_annotation# more *_domtblout.dat
::::::::::::::
Listeria_grayi_DSM_20601_domtblout.dat
::::::::::::::
#                                                                            --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name           accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
#
# Program:         hmmsearch
# Version:         3.1b1 (May 2013)
# Pipeline mode:   SEARCH
# Query file:      /home/traitar/Pfam-A.hmm
# Target file:     ./traitar_out2/gene_prediction/Listeria_grayi_DSM_20601.faa
# Option settings: hmmsearch --domtblout ./traitar_out2/pfam_annotation/Listeria_grayi_DSM_20601_domtblout.dat --cut_ga --cpu 1 /home/traitar/Pfam-A.hmm ./traitar_out2/gene_prediction/Listeria_grayi_DSM_20601.faa
# Current dir:     /home/traitar/traitar/traitar/data/sample_data
# Date:            Mon Jan  7 02:05:45 2019
# [ok]
::::::::::::::
Listeria_ivanovii_WSLC3009_domtblout.dat
::::::::::::::
#                                                                            --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name           accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
#
# Program:         hmmsearch
# Version:         3.1b1 (May 2013)
# Pipeline mode:   SEARCH
# Query file:      /home/traitar/Pfam-A.hmm
# Target file:     ./traitar_out2/gene_prediction/Listeria_ivanovii_WSLC3009.faa
# Option settings: hmmsearch --domtblout ./traitar_out2/pfam_annotation/Listeria_ivanovii_WSLC3009_domtblout.dat --cut_ga --cpu 1 /home/traitar/Pfam-A.hmm ./traitar_out2/gene_prediction/Listeria_ivanovii_WSLC3009.faa
# Current dir:     /home/traitar/traitar/traitar/data/sample_data
# Date:            Mon Jan  7 02:04:53 2019
# [ok]
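Files that contain nothing but `#` comment lines are how hmmsearch reports a search with no hits. A minimal sketch for counting the hit lines per file, assuming the standard hmmsearch `--domtblout` layout in which every header and footer line starts with `#` (the helper name is hypothetical):

```shell
# Hypothetical helper: print the number of hit (non-comment) lines in an
# hmmsearch --domtblout file. Header and footer lines start with "#", so a
# file with no domain hits yields 0. Blank lines are not counted.
# Usage: count_domain_hits traitar_out2/pfam_annotation/Listeria_grayi_DSM_20601_domtblout.dat
count_domain_hits() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the 0.
    grep -c '^[^#]' "$1" || true
}
```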

Finally, I found that the gene predictions are bizarre: the protein sequences consist almost entirely of "X":

root@fd9a9145f6ac:/home/traitar/traitar/traitar/data/sample_data/traitar_out2/gene_prediction# more *.faa
::::::::::::::
Listeria_grayi_DSM_20601.faa
::::::::::::::
>gi|229555710|HMPREF0556_0875|HMPREF0556_0875|_1 # 3 # 74 # 1 # ID=2_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.931
XXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555714|HMPREF0556_0879|HMPREF0556_0879|_1 # 171 # 368 # 1 # ID=6_1;partial=01;start_type=ATG;rbs_motif=CCC;rbs_spacer=9bp;gc_cont=0.874
MXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXX
>gi|229555718|HMPREF0556_0883|HMPREF0556_0883|_1 # 1 # 327 # -1 # ID=10_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.917
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555719|HMPREF0556_0884|HMPREF0556_0884|_1 # 2 # 370 # -1 # ID=11_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.938
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXX
>gi|229555720|HMPREF0556_0885|HMPREF0556_0885|_1 # 1 # 312 # 1 # ID=12_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.926
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555721|HMPREF0556_0886|HMPREF0556_0886|_1 # 1 # 552 # -1 # ID=13_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.918
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXX
>gi|229555722|HMPREF0556_0887|HMPREF0556_0887|_1 # 3 # 149 # 1 # ID=14_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.857
XXXXWXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555725|HMPREF0556_0890|HMPREF0556_0890|_1 # 3 # 140 # 1 # ID=17_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.877
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555726|HMPREF0556_0891|HMPREF0556_0891|_1 # 58 # 234 # 1 # ID=18_1;partial=01;start_type=TTG;rbs_motif=CGC;rbs_spacer=7bp;gc_cont=0.853
MXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXVXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555728|HMPREF0556_0893|HMPREF0556_0893|_1 # 2 # 259 # 1 # ID=20_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.845
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555730|HMPREF0556_0895|HMPREF0556_0895|_1 # 238 # 438 # 1 # ID=22_1;partial=01;start_type=ATG;rbs_motif=CGC;rbs_spacer=6bp;gc_cont=0.831
MXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXX
>gi|229555733|HMPREF0556_0898|HMPREF0556_0898|_1 # 2 # 451 # 1 # ID=25_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.822
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXKXXXXXXXXXXXXXXXXXXXXXXXXXX
AXXXXXRXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXLXXXXXTXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555734|HMPREF0556_0899|HMPREF0556_0899|_1 # 2 # 322 # 1 # ID=26_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.857
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555735|HMPREF0556_0900|HMPREF0556_0900|_1 # 1 # 444 # 1 # ID=27_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.867
XXXXXXXXXXXXXXXXXXXXKXXXXXXXXXXXXXXXXXEXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555737|HMPREF0556_0902|HMPREF0556_0902|_1 # 3 # 296 # 1 # ID=29_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.857
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi|229555738|HMPREF0556_0903|HMPREF0556_0903|_1 # 49 # 711 # 1 # ID=30_1;partial=01;start_type=TTG;rbs_motif=CGC;rbs_spacer=8bp;gc_cont=0.866
MXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXVXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXIXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
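To put a number on how degenerate the predicted proteins are, one could measure the share of `X` residues in each `.faa` file. A minimal sketch; `x_fraction` is a hypothetical helper and assumes plain FASTA with `>` header lines:

```shell
# Hypothetical helper: print the fraction of residues that are "X" (unknown
# amino acid) in a protein FASTA file, with 3 decimals. Header lines (">...")
# are skipped; prints NA if the file has no sequence characters.
x_fraction() {
    awk '
        /^>/ { next }
        {
            total += length($0)
            x += gsub(/[Xx]/, "")
        }
        END {
            if (total == 0) { print "NA"; exit }
            printf "%.3f\n", x / total
        }
    ' "$1"
}
```

For example: `for f in traitar_out2/gene_prediction/*.faa; do printf '%s\t%s\n' "$f" "$(x_fraction "$f")"; done`.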

Hope this helps.

Best, Shengwei

aweimann commented 5 years ago

Hi Shengwei,

Thanks for your interest. You need to use the from_genes option rather than the from_nucleotides option. Let me know if this works for you.

Bests, Aaron
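The `from_nucleotides` run performed gene prediction (the `.faa` headers above are in Prodigal's format), so the X-filled translations are presumably what happens when the input is already gene sequences in amino-acid form: codons containing letters other than A, C, G and T cannot be translated and come out as X. Assuming that reading of the two options, a rough heuristic for picking the mode from the first sequence line of an input file could look like this (the function is hypothetical and not part of traitar):

```shell
# Hypothetical heuristic (not part of traitar): inspect the first non-empty
# sequence line of a FASTA file and print "from_nucleotides" if it contains
# only A, C, G, T or N (either case), otherwise "from_genes".
# Prints "unknown" if the file has no sequence lines. Rough check only.
guess_traitar_mode() {
    awk '
        /^>/ { next }
        NF == 0 { next }
        {
            if ($0 ~ /^[ACGTNacgtn]+$/) print "from_nucleotides"
            else print "from_genes"
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    ' "$1"
}
```

With the sample data, the fix suggested above amounts to rerunning the original command with `from_genes` in place of `from_nucleotides`: `traitar phenotype . ./samples.txt from_genes ./traitar_out2`.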

housw commented 5 years ago

Hi Aaron,

Thanks a lot; now it works beautifully.

Cheers, Shengwei