Closed nickreich closed 3 years ago
It looks like the earliest issue date available for this signal is 2020-05-07, so you won't be able to get the signal with as_of="2020-05-06". This is likely because we had to retroactively reconstruct the historical data from our database backups when we introduced revision tracking.
@eujing, do you recall what date range was available to us when we reconstructed JHU issues, and why May 7 is the first available date?
I guess I had expected the data here to be a facsimile of the data available in each version of the JHU data on GitHub, but it sounds like that isn't the case. Is it documented somewhere what the "ground truth" is for each source?
The ground truth for this source is the JHU data on GitHub. Every day, our pipeline downloads the latest CSV, parses all the geographies, and produces the signal you see in the API.
The problem is that as_of support requires a historical record of what data we ingested on a particular day. For example, if on May 7th JHU retroactively changes the count for Missouri on April 20th, asking for the data as_of May 6th should return the old count, not the new one.
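To make the as_of semantics concrete, here is a minimal Python sketch. It is not the actual API implementation: the record layout, the `query_as_of` helper, and the counts are all hypothetical, chosen only to mirror the Missouri example.

```python
from datetime import date

# Hypothetical versioned records: (issue_date, reference_date, geo, value).
# The counts are invented for illustration, not real JHU numbers.
FULL_HISTORY = [
    (date(2020, 4, 21), date(2020, 4, 20), "mo", 100),  # first report
    (date(2020, 5, 7), date(2020, 4, 20), "mo", 120),   # retroactive revision
]

def query_as_of(records, geo, reference_date, as_of):
    """Return the value from the latest issue on or before `as_of`, else None."""
    candidates = [
        (issue, value)
        for issue, ref, g, value in records
        if g == geo and ref == reference_date and issue <= as_of
    ]
    if not candidates:
        return None
    # Pick the most recent issue that was visible on the as_of date.
    return max(candidates, key=lambda c: c[0])[1]

april_20 = date(2020, 4, 20)
old_count = query_as_of(FULL_HISTORY, "mo", april_20, date(2020, 5, 6))
new_count = query_as_of(FULL_HISTORY, "mo", april_20, date(2020, 5, 7))

# If the retained history only starts with the May 7th issue,
# a May 6th as_of query has nothing to return.
TRUNCATED_HISTORY = [r for r in FULL_HISTORY if r[0] >= date(2020, 5, 7)]
missing = query_as_of(TRUNCATED_HISTORY, "mo", april_20, date(2020, 5, 6))
```

With a full history the May 6th query sees the old count and the May 7th query sees the revision; with a history that begins on May 7th, the May 6th query comes back empty, which is the behavior reported in this issue.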
We didn't publicly release tracking of signal history until July 26th. Before then, each download of JHU data simply replaced the old data in our API. By parsing our database backups, we were able to recover the history of all changes starting May 7th. We did this from backups, not from JHU's Git history, so we could do the same for all of our signals from various sources.
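As a rough illustration of what recovering history from backups could involve, the sketch below diffs successive snapshots and records an issue whenever a value first appears or changes. This is an assumption about the general approach, not our actual reconstruction code; the snapshot layout, the `issues_from_snapshots` name, and the values are hypothetical.

```python
def issues_from_snapshots(snapshots):
    """Derive issue records from dated snapshots.

    `snapshots` maps an ISO snapshot date to {(geo, reference_date): value}.
    Returns a list of (issue_date, geo, reference_date, value) tuples, one
    for each value that is new or changed relative to the previous snapshot.
    """
    issues = []
    previous = {}
    for snap_date in sorted(snapshots):  # ISO date strings sort chronologically
        current = snapshots[snap_date]
        for (geo, ref_date), value in current.items():
            if previous.get((geo, ref_date)) != value:
                issues.append((snap_date, geo, ref_date, value))
        previous = current
    return issues

# Invented example: the April 20th count for "mo" is revised on May 9th.
example_snapshots = {
    "2020-05-07": {("mo", "2020-04-20"): 100},
    "2020-05-08": {("mo", "2020-04-20"): 100, ("mo", "2020-05-07"): 130},
    "2020-05-09": {("mo", "2020-04-20"): 120, ("mo", "2020-05-07"): 130},
}
recovered = issues_from_snapshots(example_snapshots)
```

Note that the earliest issue date such a procedure can produce is the date of the earliest snapshot, so nothing before the first usable backup can be answered by as_of.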
I'm just not sure why it was May 7th and not some other time, but we can find out.
I see, thanks! That makes sense to me.
However, given that the public revision history is out there in the world to see, I suggest that you all consider, for a few sources (including JHU CSSE, which happens to be the source we at the COVID-19 Forecast Hub care about :-) ), tracing back the revision history based on the public record. As it is, your covidcast signal can't be comprehensive and authoritative for JHU CSSE without using that public data.
This work is under way in cmu-delphi/covidcast-indicators#23, and the geocoding refactor it's currently blocked on is expected to merge by early next week.
awesome, thanks for the update! I'll close this issue for now.
The following line of code returns no data
And gives warning messages like
However data for "mo" was certainly available at this time, as can be seen from this file: https://github.com/CSSEGISandData/COVID-19/blob/476c78eb96eb2d34483daea4c2fc44f3b38bf847/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv#L1604
Can these early data for "mo" be added (maybe there are other locations missing too? this was just the one that we stumbled across), or a warning returned saying that data provided may not be complete?