emvolz / treedater

Scalable relaxed clock phylogenetic dating

only sampling year #4

Closed ghost closed 6 years ago

ghost commented 6 years ago

What should I do if some sequences only have a sampling year for sts? Thanks.

emvolz commented 6 years ago

You can supply a data frame with upper and lower bounds for sequences that have uncertain dates. For example, if you know it was sampled during 2014, the bounds could be (2014,2015). An example is at the end of the vignette here: https://cran.r-project.org/web/packages/treedater/vignettes/h3n2.html

ghost commented 6 years ago

@emvolz Is (2014,2015) the same as (2014.0-2015.0)? Thanks.

ghost commented 6 years ago

If the first and second samples were taken in 2014 and 2015, respectively, can it be coded as follows? Thanks. sts.df <- data.frame( lower = sts[1:2] - 0, upper = sts[1:2] + 1 )

emvolz commented 6 years ago

Is (2014,2015) the same as (2014.0-2015.0)? Thanks.

Yes

If the first and second samples were taken in 2014 and 2015, respectively, can it be coded as follows? Thanks. sts.df <- data.frame( lower = sts[1:2] - 0, upper = sts[1:2] + 1 )

Yes, that will work, but make sure the data frame has row names that correspond to the sequence names.
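
A minimal sketch of that setup, using base R only. The tip labels and years below are made up for illustration, and `sts` is assumed to be a named numeric vector of sample times, as elsewhere in this thread.

```r
# Hypothetical sample times: names are the tip labels, values are years.
sts <- c(seqA_2014 = 2014, seqB_2015 = 2015)

# A year-only date means "some time during that year", so the bounds
# run from the start of that year to the start of the next one.
sts.df <- data.frame(
  lower = sts - 0,
  upper = sts + 1
)

# Row names must correspond to the sequence (tip) names.
rownames(sts.df) <- names(sts)

print(sts.df)
```

As far as I can tell, the vignette linked above passes a data frame like this to `dater()` through its `estimateSampleTimes` argument; check the vignette for the exact call.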

slvrshot commented 4 years ago

@emvolz

I am trying to replicate this because my sample dates are years only. However, I keep getting NA for every label:

> tre$tip.label

[320] "ERR854973_2005"
[321] "ERR854885_2006"
[322] "ERR855018_2011"
[323] "ERR855033_2011"
[324] "ERR855023_2011"

> sts <- sampleYearsFromLabels(tre$tip.label, delimiter="_")
> head(sts)
   ERR123_2016    ERR124_2016    ERR456_2016
            NA             NA             NA
   ERR199_2016 597.fasta_2019       669_2016
            NA             NA             NA

emvolz commented 4 years ago

The strsplit function in R would work for extracting these years. Try this:

years <- sapply( strsplit( tre$tip.label, split = '_' ), '[', 2 )
years <-  as.numeric( years )
names(years) <- tre$tip.label 
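
A small self-contained check of the same idea, using a few labels copied from this thread. As a variation that is not part of the suggestion above, it keeps the last `_`-separated field rather than the second, which should also cope with labels that contain more than one underscore.

```r
labels <- c("ERR854973_2005", "ERR855018_2011", "597.fasta_2019")

# Split each label on "_" and keep the last piece (assumed to be the year).
parts <- strsplit(labels, split = "_", fixed = TRUE)
years <- as.numeric(vapply(parts, function(p) p[length(p)], character(1)))
names(years) <- labels

print(years)
```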

If you're not comfortable doing this in R, you can make the table elsewhere and load it with read.csv.
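
A rough sketch of the read.csv route. The file layout (columns `label` and `year`) is an assumption made for this example, and the temporary file only stands in for a table prepared elsewhere.

```r
# Stand-in for a table made outside R: one row per sequence.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("label,year", "seqA_2014,2014", "seqB_2015,2015"), csv_path)

tab <- read.csv(csv_path, stringsAsFactors = FALSE)

# Named numeric vector of sample times; names must match the tip labels.
sts <- setNames(as.numeric(tab$year), tab$label)

# Fail early if any label ended up without a year.
stopifnot(!anyNA(sts))

print(sts)
```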