fanli-gcb closed this issue 8 years ago
Thank you so much for taking the time to report this. There is an option in skbio to preserve underscores that I need to test and push. I've been busy and I am a little behind. :)
Ugh, reading https://github.com/biocore/scikit-bio/issues/1225 and https://github.com/biocore/scikit-bio/issues/934 makes me regret bringing this up.
The current release of QIIME (1.9.1) uses the cogent tree parser, not the one from skbio. Is this going to change with QIIME2? In other words, would it be worth pushing a similar option to affect users of 1.9.1?
QIIME 2 will likely use scikit-bio's newick parser or ETE (it's uncertain right now). QIIME 2 will not depend on PyCogent.
Mainly in the context of getting this to work well with QIIME and the underlying cogent parser. Currently, the Newick format trees contain underscores, as do the UNITE database FASTA and taxonomy files, e.g. SH024512.07FU_UDB015580_refs
The output from get_otus_from_ghost_tree.py replaces underscores with spaces (see #56), e.g. SH024512.07FU UDB015580 refs
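To make the consequence concrete, here is a minimal sketch in plain Python (not QIIME or ghost-tree code; the variable names are made up for illustration) showing that a tip name with spaces no longer matches the UNITE identifier. The workaround of mapping spaces back to underscores assumes no identifier legitimately contains a space.

```python
# Hypothetical illustration only: these variables are not part of QIIME or ghost-tree.

# Identifier as it appears in the UNITE FASTA and taxonomy files
unite_ids = {"SH024512.07FU_UDB015580_refs"}

# Tip name after underscores have been replaced with spaces
tip_names = ["SH024512.07FU UDB015580 refs"]

# The tip name no longer matches any UNITE identifier
unmatched = [name for name in tip_names if name not in unite_ids]

# Naive workaround: map spaces back to underscores.
# Assumption: no real identifier contains a space.
restored = [name.replace(" ", "_") for name in tip_names]
still_unmatched = [name for name in restored if name not in unite_ids]
```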
https://github.com/biocore/qiime/blob/master/qiime/parse.py#L76 uses DnDParser from https://github.com/pycogent/pycogent/blob/master/cogent/parse/tree.py where these lines convert underscores to spaces:
One possible fix would be to add a preserve_underscores option to DnDParser. But it seems that at the very least this would require changes to both the cogent and qiime code, so I'm not really sure where to put this issue...
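For reference, a rough sketch of what such an option might look like at the label level. This is not DnDParser's actual code or signature; decode_label is a hypothetical helper, and it only assumes the usual Newick convention that underscores in unquoted labels stand for spaces while single-quoted labels are taken verbatim.

```python
def decode_label(raw, preserve_underscores=False):
    """Decode a single Newick label.

    Hypothetical helper for illustration; not part of cogent, qiime, or skbio.
    """
    if len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'":
        # Quoted label: contents are kept verbatim; '' is an escaped single quote
        return raw[1:-1].replace("''", "'")
    if preserve_underscores:
        # Proposed behaviour: leave unquoted labels untouched
        return raw
    # Newick convention: underscores in unquoted labels are read as spaces
    return raw.replace("_", " ")


label = "SH024512.07FU_UDB015580_refs"
default_result = decode_label(label)
preserved_result = decode_label(label, preserve_underscores=True)
quoted_result = decode_label("'" + label + "'")
```

Under that convention, writing the tip names single-quoted in the tree file may be another way to keep the underscores without touching either parser, though I have not checked how the QIIME 1.9.1 code path handles quoted labels.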