You can supply a data frame with upper and lower bounds for sequences that have uncertain dates. For example, if you know it was sampled during 2014, the bounds could be (2014,2015). An example is at the end of the vignette here: https://cran.r-project.org/web/packages/treedater/vignettes/h3n2.html
@emvolz I want to know whether (2014,2015) is the same as (2014.0, 2015.0). Thanks.
If the first and second samples were collected in 2014 and 2015, respectively, can it be coded as follows? sts.df <- data.frame( lower = sts[1:2] - 0, upper = sts[1:2] + 1 )
I want to know whether (2014,2015) is the same as (2014.0, 2015.0). Thanks.
Yes
If the first and second samples were collected in 2014 and 2015, respectively, can it be coded as follows? sts.df <- data.frame( lower = sts[1:2] - 0, upper = sts[1:2] + 1 )
Yes, that will work, but make sure that the data frame has row names that correspond to the sequence names.
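For anyone following along, here is a minimal base-R sketch of the row-name point. The sequence names seqA and seqB and their years are made up for illustration; substitute your own tip labels.

```r
# Hypothetical sequence names and sampling years (illustrative only)
sts <- c(seqA = 2014, seqB = 2015)

# Each sample could have been collected at any time during its calendar year
sts.df <- data.frame(
  lower = sts[1:2],      # start of the sampling year
  upper = sts[1:2] + 1   # start of the following year
)

# data.frame() picks up row names from the named vector;
# setting them explicitly guards against an unnamed sts
rownames(sts.df) <- names(sts)[1:2]
print(sts.df)
```

If sts has no names, the row names default to 1, 2, ... and will not match the tip labels, so it is worth checking rownames(sts.df) against tre$tip.label before going further.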
@emvolz
I am trying to replicate this because my sample dates are years only. However I keep getting this:
tre$tip.label
[320] "ERR854973_2005"
[321] "ERR854885_2006"
[322] "ERR855018_2011"
[323] "ERR855033_2011"
[324] "ERR855023_2011"
sts <- sampleYearsFromLabels(tre$tip.label, delimiter="_")
> head(sts)
   ERR123_2016    ERR124_2016    ERR456_2016
            NA             NA             NA
   ERR199_2016 597.fasta_2019       669_2016
            NA             NA             NA
The strsplit function in R would work for extracting these years. Try this:
years <- sapply( strsplit( tre$tip.label, split = '_' ), '[', 2 )
years <- as.numeric( years )
names(years) <- tre$tip.label
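A small variation on the snippet above, in case some labels contain more than one underscore: take the last field instead of the second, then check that nothing came out as NA. The labels vector here is a stand-in for tre$tip.label, using names that appear in this thread.

```r
# Stand-in for tre$tip.label (labels taken from this thread)
labels <- c("ERR854973_2005", "ERR854885_2006", "597.fasta_2019")

# Split on "_" and keep the last field of each label
fields <- strsplit(labels, split = "_", fixed = TRUE)
years <- sapply(fields, function(x) x[length(x)])

# Convert to numeric and name the vector by tip label
years <- as.numeric(years)
names(years) <- labels

# Stop early if any label did not yield a usable year
stopifnot(!anyNA(years))
print(years)
```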
If you're not comfortable doing this in R, you can make the table elsewhere and load it with read.csv.
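A rough sketch of the read.csv route, assuming a two-column table with hypothetical headers label and year. The file is written to a temporary path here only so the example runs on its own; in practice you would point read.csv at your own file.

```r
# Hypothetical table made elsewhere; written to a temp file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("label,year",
             "ERR854973_2005,2005",
             "ERR854885_2006,2006"), csv_path)

# Load the table and turn it into a named numeric vector of sample years
tab <- read.csv(csv_path, stringsAsFactors = FALSE)
years <- setNames(as.numeric(tab$year), tab$label)
print(years)
```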
What should I do if some sequences only have a sampling year for sts? Thanks.
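One way to set this up, following the bounds approach described earlier in the thread, is to build the bounds data frame only for the year-only samples and leave the precisely dated ones alone. This is a sketch with made-up names and dates; using the mid-year as the starting value for the uncertain samples is my own choice, not something the package requires as far as I know.

```r
# Hypothetical sample times: two known precisely (decimal years), two known only to the year
sts <- c(seqA = 2014.37, seqB = 2015.81, seqC = 2014, seqD = 2016)
year_only <- c("seqC", "seqD")  # assumption: you know which labels lack a precise date

# Bounds only for the uncertain samples; row names must match the sequence names
sts.df <- data.frame(
  lower = sts[year_only],
  upper = sts[year_only] + 1,
  row.names = year_only
)

# Use the mid-year as a starting value for the uncertain samples (a choice made for this sketch)
sts[year_only] <- sts[year_only] + 0.5
print(sts.df)
```

The data frame is then passed to dater together with sts. As far as I can tell from the vignette linked above this goes through the estimateSampleTimes argument, but please check the vignette for the exact call.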