Closed annakrystalli closed 5 years ago
Hi
yes, sorry, the studies that only reported values pooled across years were annoying, and some I went back to simply could not specify yearly data, so it had to remain as a range (hence the start and end year columns). The way I have dealt with it so far was simply to use this decimal solution, which you get in some cases (depending on how many years were covered). I appreciate that in your (much better) approach this causes problems. And indeed half years really don't make sense. Pragmatically, one could argue that .5 could be rounded up. Given the temporal scale covered by the data (ca. 40 years for most species) and the relatively small number of studies this applies to (although there is one guillemot study covering many sites where they can no longer say which colony was surveyed in which year), I can't imagine that rounding makes a difference. Maybe if a separate column simply identifies whether a study is single-year or spans x years (study duration), that might be all the user needs?
Thank you very much Ruedi
Thanks for the fast feedback Ruedi!
OK, yes that sounds like a workable approach. And sure, I can easily create a column (eg `multiyear = TRUE` or `FALSE`) for easy filtering and clear flagging. I'll go for that then.
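As a rough illustration of the flagging idea, here is a minimal sketch in Python (the thread itself works in R). The column names `year`, `startyear`, `endyear` and `multiyear` come from the discussion; the function name `add_multiyear_flag`, the example records, and the choice to round `.5` up are assumptions made for the example, not a description of the final dataset.

```python
import math


def add_multiyear_flag(records):
    """Return copies of the records with a boolean `multiyear` field.

    Hypothetical helper: assumes each record has `year`, `startyear`
    and `endyear` fields as discussed in the thread.
    """
    flagged = []
    for rec in records:
        row = dict(rec)  # copy so the input is left untouched
        # A study spans several years when its end year is after its start year.
        row["multiyear"] = row["endyear"] > row["startyear"]
        # Assumption: round half years up, e.g. 2012.5 -> 2013.
        row["year"] = math.floor(row["year"] + 0.5)
        flagged.append(row)
    return flagged


# Made-up example records, for illustration only.
records = [
    {"year": 2012.5, "startyear": 2012, "endyear": 2013},
    {"year": 2005, "startyear": 2005, "endyear": 2005},
]
flagged = add_multiyear_flag(records)
```

With a flag like this, users could filter on `multiyear` to keep only single-year studies when yearly resolution matters.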
Hi @ruedinager
Got one last question regarding data in the column `year`. At the minute you have dates such as `2012.5`. I appreciate you are aiming for the midpoint of the sampling duration but it is generally quite unusual to store year data as decimal numbers. Indeed it is being thrown up as an error during metadata creation because `year` as a unit for a numeric variable is not accepted (ie it is classed as a `Date` variable rather than a time variable) so I'm being forced to create an ad hoc unit of measurement for it, which feels kind of wrong.
I had a little think about it and my suggestion would be to instead convert that column to `YYYY-MM-DD` date format. So for an entry with start date `2001` and end `2003`, the start date would be converted to `2001-01-01`, the end date to `2003-12-31` and the midpoint calculated arithmetically (which R handles nicely) and reported in `YYYY-MM-DD` format, ie `2002-07-02`. Note that this would only affect column `year`. `startyear` and `endyear` would remain in the `YYYY` format.
I know this artificially increases the resolution of the data in that column but we are already doing that by calculating decimals. What do you think? @tomjwebb any thoughts on this?
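For reference, the midpoint calculation proposed above can be sketched as follows. This is a minimal Python illustration using only the standard library; the thread itself refers to R, and the function name `midpoint_date` is hypothetical. It assumes a study starts on 1 January of its start year and ends on 31 December of its end year, as described above.

```python
from datetime import date


def midpoint_date(startyear, endyear):
    """Midpoint between 1 January of startyear and 31 December of endyear."""
    start = date(startyear, 1, 1)
    end = date(endyear, 12, 31)
    # Halving the timedelta can leave a fractional day; adding it to a
    # date drops that fraction, so the result is always a whole day.
    return start + (end - start) / 2


mid = midpoint_date(2001, 2003)
print(mid.isoformat())  # 2002-07-02
```

A single-year study (start and end year equal) gets a midpoint in early July of that year under the same rule.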