DocNow / catalog

A simple catalog of Twitter ID Datasets
http://catalog.docnow.io/

Dates covered vs dates collected #4

Open ruebot opened 7 years ago

ruebot commented 7 years ago

Should we make a distinction between the two in the metadata?

edsu commented 7 years ago

In b73897d I added an "added" field to each dataset to record when the dataset was added to the catalog, which is different from the date the dataset was published.

But it sounds like you are talking about a different kind of date distinction. Are you thinking of a situation where the time period in which data collection ran does not match the time period of the tweets collected? I think this can only happen when running searches, right?

ruebot commented 7 years ago

> Are you thinking of a situation where the time period in which data collection ran does not match the time period of the tweets collected? I think this can only happen when running searches, right?

Exactly!

In the Scholars Portal Dataverse, where I put our datasets, we have the option of adding separate date ranges for "dates covered" and "dates collected". This works well for all the major collections I've done with twarc, since I use a strategy of filter and search, followed by deduplication. The search results reach back before the collection started, so the dates covered begin earlier than the dates collected.
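
For illustration, here is a minimal sketch of how the two ranges could be recorded side by side for a tweet ID dataset. It assumes the IDs are Snowflake IDs (tweets from roughly November 2010 onward), whose upper bits encode the creation time in milliseconds. The function names and the `dates_covered` / `dates_collected` field names are hypothetical and are not part of the catalog's actual schema.

```python
from datetime import date, datetime, timedelta, timezone

# Snowflake IDs count milliseconds from this epoch (2010-11-04T01:42:54.657Z).
TWITTER_EPOCH = datetime(2010, 11, 4, 1, 42, 54, 657000, tzinfo=timezone.utc)


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Return the creation time encoded in a Snowflake tweet ID."""
    # The bits above the lowest 22 hold milliseconds since the Twitter epoch.
    return TWITTER_EPOCH + timedelta(milliseconds=tweet_id >> 22)


def merge_ids(filter_ids, search_ids):
    """Combine IDs from filter and search runs, dropping duplicates."""
    return sorted(set(filter_ids) | set(search_ids))


def date_metadata(tweet_ids, collection_start: date, collection_end: date) -> dict:
    """Build a metadata fragment with both date ranges (hypothetical field names)."""
    ids = list(tweet_ids)
    if not ids:
        raise ValueError("tweet_ids must not be empty")
    return {
        # When the tweets themselves were posted, derived from the IDs.
        "dates_covered": {
            "start": snowflake_to_datetime(min(ids)).date().isoformat(),
            "end": snowflake_to_datetime(max(ids)).date().isoformat(),
        },
        # When the collection process actually ran, supplied by the collector.
        "dates_collected": {
            "start": collection_start.isoformat(),
            "end": collection_end.isoformat(),
        },
    }
```

With a search that reaches back about a week, `dates_covered` would start several days before `dates_collected`, which is the gap being discussed here. Older, pre-Snowflake IDs do not encode a timestamp, so for those the covered range would have to come from the tweets' `created_at` values instead.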