Closed sdaume closed 8 years ago
Perhaps the best way to analyse that would be to calculate the difference between the "record created timestamp" and the "timestamp of actual occurrence". This would not need to be sliced up by snapshot, as it can all be derived from the latest index: the created timestamp does not change once written.
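As a rough illustration of that difference, here is a minimal Python sketch. The field names `eventDate` and `created` and the sample records are assumptions made up for the example, not the actual index schema.

```python
from datetime import date

def latency_days(event_date: str, created: str) -> int:
    """Days between the occurrence date and the record-created date.

    Both arguments are ISO 8601 strings; only the date part (YYYY-MM-DD) is used.
    """
    occurred = date.fromisoformat(event_date[:10])
    first_written = date.fromisoformat(created[:10])
    return (first_written - occurred).days

# Hypothetical records; the field names are assumed for illustration only.
records = [
    {"speciesId": 1, "eventDate": "2012-05-01", "created": "2013-05-01"},
    {"speciesId": 1, "eventDate": "2014-01-10", "created": "2014-01-25"},
]

latencies = [latency_days(r["eventDate"], r["created"]) for r in records]
```

Because the created timestamp is fixed once written, this can be run against a single export rather than against each snapshot.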
What should we do here, Stefan? Perhaps we could restrict the analysis to annual resolution and provide a table of:
speciesId, collectedYear, firstIndexedYear, count
?
This would allow you to get per-species statistics on the latencies. It would not allow any kind of location breakdown, though. If I were to export all records with those fields, the file would be very large...
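To show how per-species latency statistics could be derived from such a table, here is a small Python sketch. The rows are invented for illustration, and latency is measured in whole years as `firstIndexedYear - collectedYear`, weighted by `count`.

```python
from collections import defaultdict

# Hypothetical rows: (speciesId, collectedYear, firstIndexedYear, count)
rows = [
    (101, 2010, 2010, 50),
    (101, 2010, 2012, 30),
    (101, 2011, 2012, 20),
    (202, 2011, 2011, 10),
]

def mean_latency_by_species(rows):
    """Count-weighted mean latency in years for each speciesId."""
    total_lag = defaultdict(int)  # sum of latency * count per species
    total_n = defaultdict(int)    # sum of count per species
    for species_id, collected_year, first_indexed_year, count in rows:
        total_lag[species_id] += (first_indexed_year - collected_year) * count
        total_n[species_id] += count
    return {s: total_lag[s] / total_n[s] for s in total_n}

result = mean_latency_by_species(rows)
```

Adding a dataset or publisher column to the rows would only require extending the grouping key, at the cost of a larger table.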
Just to make sure that I understand correctly: this would be a separate query, and the latency would be measured in years, by species rather than by individual occurrence record. If so, I will have to think about this again and explore a few more angles in the data I already have. It may be necessary to bring in the publisher/dataset id as an additional dimension to get some interesting results out of this.
That was the suggestion. I can of course provide the latency in days for each record, but then you are dealing with file sizes that will likely be difficult to handle in R. I'd suggest we group by something, or apply a filter, e.g. reduce to the invasives only. That would be a manageable number of records, I'd guess.
No activity for over a year.
I wonder what would be the best way to explore the latency with which data is reported. If I plot, for example, the observations of grey squirrel in the UK for recent years, it seems likely that either the data for the last 2 years is still incomplete, monitoring intensity has decreased, or the grey squirrel population has decreased dramatically. I am pretty sure it is the first, and it would be an interesting angle to explore for other species as well.
Is there something like a "firstUploaded" field in the dataset?