daveuu / baga

Bacterial and Archaeal Genome Analyser
GNU General Public License v3.0

Comparative Analyses problem #4

Open pauruihu opened 8 years ago

pauruihu commented 8 years ago

Hi! It's me again :) I found that baga can't finish the last part of the Comparative Analyses with my data. It shows this:

-- Comparative Analyses --

Provide --genome_name or --genome_length for a scale bar unit of actual substitutions
Plotting to NZ_CP006918.1__Klebsiella_SNPs_rooted.labelled_tree_transfers.svg
rerooting to midpoint
Traceback (most recent call last):
  File "/home/paula/programas/baga/baga_cli.py", line 3060, in <module>
    genome_length = genome_length)
  File "/home/paula/programas/baga/ComparativeAnalysis.py", line 2421, in doPlot
    thisVGT = node_2_VGT[nodename]
KeyError: 'NODE 24'

It probably happens because I haven't done something properly, but I don't know how to solve it. Thank you (again). Kind regards!

daveuu commented 8 years ago

Hi. This looks like a bug, and it may be related to rerooting the tree. (The current command in the documentation incorrectly omits "--out_group", which roots the tree using a specified out group: in that case the reference genome, which may or may not make biological sense depending on the samples. I will update the docs.)

You could try:

baga/baga_cli.py ComparativeAnalysis \
--plot_phylogeny \
--path_to_tree NZ_CP006918.1__Klebsiella_SNPs_rooted.phy_phyml_tree \
--genome_name NZ_CP006918.1 \
--out_group NZ_CP00691

This would avoid the "mid-point" rooting that occurs without "--out_group". "NZ_CP006918.1" has to be truncated to 10 characters, "NZ_CP00691", because of the underlying phylip format. If that works, you could include the --plot_transfers option too.
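To make the truncation concrete, here is a minimal Python sketch of how a 10-character out group label can be derived from a full sequence name. The helper name `phylip_label` and the constant `PHYLIP_NAME_WIDTH` are made up for illustration and are not part of baga. It assumes the label is simply the first 10 characters of the name.

```python
# Hypothetical helper, not part of baga: shows the 10-character
# truncation implied by strict PHYLIP-style sequence names.
PHYLIP_NAME_WIDTH = 10


def phylip_label(name, width=PHYLIP_NAME_WIDTH):
    """Return the first `width` characters of `name`."""
    return name[:width]


# The label to pass to --out_group for this reference genome
print(phylip_label("NZ_CP006918.1"))  # NZ_CP00691
```

Names that are already 10 characters or shorter come back unchanged. Two samples whose names share the same first 10 characters would collide after truncation, so that is worth checking before building the tree.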

If you could send me the tree that causes this problem (file: NZ_CP006918.1__Klebsiella_SNPs_rooted.phy_phyml_tree), that would help me fix it: david.williams at liverpool.ac.uk (replace at with @).

In the meantime, it sounds like you did a successful analysis! You could use any tree viewer, such as FigTree, to view NZ_CP006918.1__Klebsiella_SNPs_rooted.phy_phyml_tree. You could also put all of the commands into a single shell script and try to reproduce the whole analysis. Then you can publish the shell script with your findings so other people can reproduce the analysis.

pauruihu commented 8 years ago

I've just sent you an e-mail :)

pauruihu commented 8 years ago

I've just had another error in this part of the process with a bigger dataset:

Traceback (most recent call last):
  File "/home/paula/programas/baga/baga_cli.py", line 2978, in <module>
    MSA_builder.getCoverageRanges(paths_to_BAMs)
  File "/home/paula/programas/baga/ComparativeAnalysis.py", line 719, in getCoverageRanges
    print('{} bp in {} gaps missing from {}'.format(sum([(e-s) for s,e in these_missing_regions]), len(these_missing_regions), sample))
ValueError: need more than 1 value to unpack

:S
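For readers hitting the same message: the failing line unpacks each element of `these_missing_regions` as a pair (`for s,e in ...`), so "need more than 1 value to unpack" means at least one element held only a single value. Where that element comes from inside baga is not established in this thread. The sketch below only illustrates the failure mode and one defensive way to report which region is malformed. The function name `summarise_gaps` is hypothetical and is not baga code.

```python
# Hypothetical illustration, not baga code: sum gap lengths while
# checking that every region really is a (start, end) pair.
def summarise_gaps(regions):
    """Return (total_bp, n_gaps) for a list of (start, end) pairs."""
    total = 0
    for i, region in enumerate(regions):
        try:
            start, end = region
        except (TypeError, ValueError):
            # Report the offending element instead of a bare unpack error
            raise ValueError(
                "region {} is not a (start, end) pair: {!r}".format(i, region)
            )
        total += end - start
    return total, len(regions)


print(summarise_gaps([(10, 25), (100, 130)]))  # (45, 2)
```

Calling `summarise_gaps([(10, 25), (40,)])` raises a `ValueError` that names region 1, which makes the malformed entry easier to trace back to the sample that produced it.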