JTFouquier / ghost-tree

creating hybrid-gene phylogenetic trees for diversity analyses
BSD 3-Clause "New" or "Revised" License

Underscore compatibility #67

Closed: fanli-gcb closed 8 years ago

fanli-gcb commented 8 years ago

This is mainly in the context of getting ghost-tree to work well with QIIME and the underlying cogent parser. Currently, the Newick format trees contain underscores, as do the UNITE database FASTA and taxonomy files, e.g. SH024512.07FU_UDB015580_refs

The output from get_otus_from_ghost_tree.py replaces underscores with spaces (see #56), e.g. SH024512.07FU UDB015580 refs

https://github.com/biocore/qiime/blob/master/qiime/parse.py#L76 uses DnDParser from https://github.com/pycogent/pycogent/blob/master/cogent/parse/tree.py where these lines convert underscores to spaces:

if '_' in t:
  t = t.replace('_', ' ')

One possible fix would be to add a preserve_underscores option to DnDParser. But it seems that at the very least this would require changes to both the cogent and qiime code, so I'm not really sure where to put this issue...
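To make the idea concrete, here is a minimal standalone sketch of what such an option might look like. The function name `clean_label` and the `preserve_underscores` flag are hypothetical and only illustrate the proposal; this is not the actual DnDParser code or its API.

```python
def clean_label(token, preserve_underscores=False):
    """Turn a raw Newick label token into a node name.

    Hypothetical helper for illustration only; it is not part of
    PyCogent, QIIME, or scikit-bio.
    """
    if preserve_underscores:
        # Proposed behaviour: keep the label exactly as written.
        return token
    # Behaviour described in this issue: underscores become spaces.
    return token.replace('_', ' ')


label = "SH024512.07FU_UDB015580_refs"
print(clean_label(label))
print(clean_label(label, preserve_underscores=True))
```

With the flag off, the label no longer matches the identifiers in the UNITE FASTA and taxonomy files, which is the mismatch described above; with it on, the IDs round-trip unchanged.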

JTFouquier commented 8 years ago

Thank you so much for taking the time to report this. There is an option in skbio to preserve underscores that I need to test and push. I've been busy and I am a little behind. :)

fanli-gcb commented 8 years ago

Ugh, reading https://github.com/biocore/scikit-bio/issues/1225 and https://github.com/biocore/scikit-bio/issues/934 makes me regret bringing this up.

The current release of QIIME (1.9.1) uses the cogent tree parser, not the one from skbio. Is this going to change with QIIME 2? In other words, would it be worth pushing a similar option for users of 1.9.1?

jairideout commented 8 years ago

QIIME 2 will likely use scikit-bio's newick parser or ETE (it's uncertain right now). QIIME 2 will not depend on PyCogent.