Currently we provide the following reference fields with each crawler: https://github.com/InternetHealthReport/internet-yellow-pages/blob/f464e59a5065e850110d41df12632354739bba57/iyp/__init__.py#L615-L620
The reference_url usually points to the exact data source, while the reference_time is just the timestamp of when the database dump was produced.
We want to add new fields and rename the existing ones as follows:
reference_url -> reference_url_data: This URL should point to the precise data location.
reference_url_info: If available, this URL should point to a more human-readable website that explains the dataset.
reference_time -> reference_time_fetch: Same semantics as right now.
reference_time_modification: If available, the timestamp when the data source was actually updated. We have some datasets (like ASdb) that are updated only every few months.
This change requires updating some crawlers that modify reference_url dynamically.
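
To make the proposed renaming concrete, here is a minimal sketch of how an existing reference dictionary could be translated to the new field names. It assumes the reference is a plain dict keyed by field name; the helper `migrate_reference`, the `RENAMED_FIELDS` mapping, the use of `None` for "not available", and the example.org URLs are all hypothetical and not part of the current code base.

```python
from datetime import datetime, timezone

# Hypothetical mapping from the current field names to the proposed ones.
RENAMED_FIELDS = {
    'reference_url': 'reference_url_data',
    'reference_time': 'reference_time_fetch',
}


def migrate_reference(old_reference, url_info=None, time_modification=None):
    """Return a copy of old_reference that uses the proposed field names.

    Fields that are not renamed are copied unchanged. The two new optional
    fields are always present; None means "not available" (an assumption).
    """
    new_reference = {
        RENAMED_FIELDS.get(key, key): value
        for key, value in old_reference.items()
    }
    new_reference['reference_url_info'] = url_info
    new_reference['reference_time_modification'] = time_modification
    return new_reference


# Placeholder values for illustration only.
old = {
    'reference_url': 'https://example.org/data/latest.csv',
    'reference_time': datetime(2024, 1, 1, tzinfo=timezone.utc),
}
new = migrate_reference(
    old,
    url_info='https://example.org/about',
    time_modification=datetime(2023, 10, 15, tzinfo=timezone.utc),
)
```

Crawlers that currently overwrite reference_url at runtime would, under this sketch, set reference_url_data instead and could fill reference_time_modification whenever the data source exposes its own update timestamp.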