CompEvol / beast2

Bayesian Evolutionary Analysis by Sampling Trees
www.beast2.org
GNU Lesser General Public License v2.1
237 stars 83 forks source link

TraitSet slightly deforms taxon times #772

Open tgvaughan opened 6 years ago

tgvaughan commented 6 years ago

The method TraitSet::convertValueToDouble is responsible for taking a string representation of a sampling time and converting this to a real number. For calendar dates, these are parsed using a date string parser according to a particular user-specified format. The resulting date object is then converted into a number representing fractional years.

Supposing that date is the date object returned by the parser, this conversion is accomplished by the following expression:

double year = date.getYear() + (date.getDayOfYear()-1.0) / (date.isLeapYear() ? 366.0 : 365.0);

The result of the convertValueToDouble method is then just year or something directly proportional to this.

While this will always produce sensible-looking results, there is a nonlinear deformation of the time distribution going on here because of the difference in length between leap years and non-leap years. Of course this is unavoidable if we're to use a "year" as a standard length of time, but the problem persists even when "days" or "weeks" are specified as the time unit.

Of course this is not an issue for macroevolutionary studies, but this can cause strange artifacts in epidemiological studies where sampling times are known precisely and can wind up being shifted slightly on the tree as a result of this.

To fix this we could just replace the non-fixed denominator in the expression for year with an effective year length of 365.25.

rbouckaert commented 6 years ago

TaxonSet:: = TraitSet:: in the first line? I suppose the problem is that the convertValueToDouble method converts to year and from year to months or days, while it should convert to days, and from there to years or months to fix this properly. There's a bit of gnarly detail to be worked out to get this right though.