D-PLACE / dplace-data

The data repository for the D-PLACE Project (Database of Places, Language, Culture and Environment)
https://d-place.org
Creative Commons Attribution 4.0 International

Standardise the time scales for phylogenies #226

Closed xrotwang closed 1 year ago

xrotwang commented 5 years ago

It seems that posterior.trees for Grollemund et al. 2015 mixes branch length and change rate in a composite data type, à la 1234.3@0.56. Read this way, the branch lengths appear to be in years, giving a root age of ~6,800 years for Atlantic-Congo.

SimonGreenhill commented 5 years ago

Ok, we need to remove @... from these trees. What's the best way to do it? I can write some code/regex to do this, or should we use phyltr as discussed in https://github.com/lmaurits/phyltr/issues/16#issuecomment-453955556 ?

(I have a minor personal preference for something like sed, rather than adding another tool to the pipeline)
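
For reference, the regex route could look roughly like the sketch below. It assumes every annotated branch is written as `:<length>@<rate>` (as in the `1234.3@0.56` example above) and that `@` does not appear anywhere else in the tree strings. The name `strip_rates` and the sample tree are made up for illustration and are not taken from the dataset.

```python
import re

# Assumption: annotated branches look like ":1234.3@0.56", i.e. a colon,
# a numeric branch length, "@", then a numeric rate.
RATE_SUFFIX = re.compile(r"(:[0-9.eE+-]+)@[0-9.eE+-]+")


def strip_rates(newick: str) -> str:
    """Drop the "@<rate>" part and keep only the branch length."""
    return RATE_SUFFIX.sub(r"\1", newick)


# Hypothetical tree string, not real data.
example = "((A:1234.3@0.56,B:1234.3@0.61):500.0@0.4,C:1734.3@0.5);"
print(strip_rates(example))
```

Trees without any `@` annotations pass through unchanged, so this should be safe to run over a whole file line by line.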

xrotwang commented 5 years ago

phyltr would be the right tool for the job, though. And it uses ete3 for the low-level nexus reading and writing, like pydplace - so it only adds the functionality we want without any overhead.

SimonGreenhill commented 5 years ago

Ok, but phyltr is currently non-functional on py3 ("ModuleNotFoundError: No module named 'main'"), and ete3 won't load trees in this format anyway, so it's unclear what phyltr gives us.

SimonGreenhill commented 5 years ago

I just realised that the summary trees and the posterior trees have different scales for this dataset. This is annoying.

Should we standardise the scales?
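
If standardising means multiplying every branch length by a constant, a dependency-free sketch might look like the following. It assumes plain Newick strings with no `@` annotations or comments left in them. The function name `scale_branch_lengths`, the sample tree and the factor are illustrative only, since which file needs which factor is not settled here.

```python
import re

# Assumption: branch lengths are plain numbers following a colon,
# optionally in scientific notation (e.g. ":0.5" or ":1e-3").
BRANCH_LENGTH = re.compile(r":([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)")


def scale_branch_lengths(newick: str, factor: float) -> str:
    """Multiply every branch length in a Newick string by `factor`."""
    def rescale(match: re.Match) -> str:
        return ":" + repr(float(match.group(1)) * factor)
    return BRANCH_LENGTH.sub(rescale, newick)


# Hypothetical tree string and factor, not real data.
tree = "((A:0.5,B:0.5):0.25,C:0.75);"
print(scale_branch_lengths(tree, 100))
```

Because the result goes through floating-point multiplication, some scaled lengths may pick up small rounding noise; rounding to a fixed number of decimals would be an easy addition.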

SimonGreenhill commented 5 years ago

phyltr cat < my.trees | phyltr scale -s 100 > out.trees

xrotwang commented 1 year ago

This kind of standardization is now done in Phlorest.