cmu-delphi / delphi-epidata

An open API for epidemiological data.
https://cmu-delphi.github.io/delphi-epidata/
MIT License

Clarify date terminology in API documentation #397

Open krivard opened 3 years ago

krivard commented 3 years ago

This page currently explains the difference between time_value and issue: https://cmu-delphi.github.io/delphi-epidata/api/covidcast_times.html

However, it might be better to establish more formal terminology, such as "reference date" vs. "issue date". We should

Context from a documentation review thread on the upcoming COVIDcast HHS hospitalization documentation in PR #344:

[Roni] "First issued" in this context immediately made me think of when HHS first issued this data, which was in mid-December 2020. Should we change this to "First issued by Delphi" or "First offered on Delphi Epidata" or some such thing?

[Katie] This is the header used on all COVIDcast indicator pages -- change everywhere?

[Roni] Yes, but obviously if it's a big headache to change, we can wait for a more opportune time (but maybe just find a way to clarify it in the documentation of this documentation).

[Katie] There are two dates it is important for API users to know: the earliest issue they can request in as_of, and the earliest date they can request in time_value. This header is for the earliest as_of; we've been putting earliest time_value in the table of signal descriptions since they are often different for each signal within a source.
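To make the two dates concrete, here is a minimal sketch in plain Python. It does not call the Epidata API; the `Row` type, the `ROWS` values, and the `query_as_of` helper are all hypothetical and only illustrate how "earliest time_value" and "earliest issue" can differ for one signal, and how an as_of request would pick among issues.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    time_value: date  # date the underlying events happened
    issue: date       # date this version of the value was published
    value: float


# Hypothetical rows for one signal at one location (made-up numbers).
ROWS = [
    Row(date(2020, 12, 1), date(2020, 12, 15), 10.0),
    Row(date(2020, 12, 1), date(2020, 12, 20), 12.0),  # later revision
    Row(date(2020, 12, 2), date(2020, 12, 16), 7.0),
]


def query_as_of(rows: List[Row], time_value: date, as_of: date) -> Optional[Row]:
    """Return the most recent issue for time_value published on or before as_of."""
    candidates = [
        r for r in rows if r.time_value == time_value and r.issue <= as_of
    ]
    return max(candidates, key=lambda r: r.issue, default=None)


# The two "earliest" dates a user may need are different quantities.
earliest_time_value = min(r.time_value for r in ROWS)
earliest_issue = min(r.issue for r in ROWS)
```

In this sketch, asking for an as_of date before `earliest_issue` returns nothing even though `earliest_time_value` is earlier, which is the distinction the header and the signal table are each trying to convey.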

Does "Earliest X available" make it clear what's going on, or would a paragraph be better?

[Roni] I like "earliest issue available". It would be good to also tie it to the earliest 'as_of' date available, for users who recognize the latter better than the former. Maybe 'Earliest issue/as_of date available'?

Btw, I don't particularly like 'Earliest time_value available' (if that's what's currently used in the signal descriptions). I might prefer 'Earliest reference date available'. But if time_value is used throughout for the reference date, so be it.

[Katie] "Earliest time_value" has never been used; it is currently "First date", and I'm proposing we change it to "Earliest date" (we don't use "reference date" anywhere, so I'm hesitant to introduce new vocabulary).

[Roni] I see. Still, 'date' is awfully generic. At some point, we need to clearly establish several recurring date concepts, assign them unique, mostly self-explanatory names, and use these names consistently. For the two important date types we are discussing here, I tend to use 'reference date' and 'issue date'. But I am very open to other suggestions. I also understand if you don't want to start by using it just in this location, so we can shelve this discussion for now.

RoniRos commented 3 years ago

I just reviewed the documentation page above.

time_value: The time the underlying events happened. For example, when a data source reports on COVID test results, the time value is the date the results were recorded by the testing provider.

The first sentence (boldfacing mine) describes exactly what I would suggest referring to as "reference_date": it's the date whose events are being referred to. Alternatively, it could be called 'event_date', or something else that conveys the same idea.

There is still possible confusion about what the "underlying events" are for a particular signal, and this has to be clarified in the signal's documentation. For example, in the subsequent example, the event is the recording of a test result by the testing provider, as opposed to, say, the specimen collection date, which is actually available in many testing data streams.
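As an illustration of why the "underlying event" needs to be spelled out per signal, here is a small Python sketch. The `TESTS` records, their field names, and the `counts_by_reference_date` helper are hypothetical and not taken from any Delphi data source; they only show that the same tests yield different daily counts depending on which event defines the reference date.

```python
from collections import Counter
from datetime import date

# Hypothetical test records, each carrying two candidate "event" dates.
TESTS = [
    {"specimen_collected": date(2021, 1, 4), "result_recorded": date(2021, 1, 6)},
    {"specimen_collected": date(2021, 1, 5), "result_recorded": date(2021, 1, 6)},
    {"specimen_collected": date(2021, 1, 5), "result_recorded": date(2021, 1, 7)},
]


def counts_by_reference_date(tests, event_field):
    """Count tests per day, using event_field as the reference date."""
    return Counter(t[event_field] for t in tests)


by_result = counts_by_reference_date(TESTS, "result_recorded")
by_specimen = counts_by_reference_date(TESTS, "specimen_collected")
```

Here `by_result` and `by_specimen` assign the same three tests to different days, so a signal's documentation would need to say which event its reference date refers to.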