User Story
In order to compare WAF records efficiently, datagov wants to incorporate the timestamp of each record.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
[ ] GIVEN harvest.py \
AND a WAF source URL \
WHEN the WAF traversal occurs \
THEN the timestamp of each record should be included in the Record instance \
AND used in the comparison function
Background
We are responsible for harvesting records from a WAF (web accessible folder).
To make the comparison more efficient, we want to consider the timestamps of the files. The idea is to reduce the work required for a given WAF source by downloading only what we need, which means fewer network calls and fewer opportunities for something to go wrong.
This ticket assumes a date/time stamp is included when a harvest record is read from the DB.
It could be a challenge to scrape timestamps out of a WAF listing, since different web servers (or versions) show timestamps in different ways. Here is how ckanext-spatial does it.
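To make the scraping challenge concrete, here is a minimal sketch of pulling timestamps out of a directory listing. It is not how ckanext-spatial or harvest.py does it. The names `TIMESTAMP_FORMATS`, `parse_timestamp`, and `extract_timestamps` are hypothetical, and the sketch assumes one entry per line, records ending in `.xml`, and a timestamp that follows the link in one of two common layouts (`2023-05-01 12:34` or `01-May-2023 12:34`). Other servers would need their own entries in the format table.

```python
import re
from datetime import datetime

# Assumed (regex, strptime format) pairs; real WAFs may need more entries.
TIMESTAMP_FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}"), "%Y-%m-%d %H:%M"),
    (re.compile(r"\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}"), "%d-%b-%Y %H:%M"),
]

# Matches the target of the first anchor tag on a line.
LINK = re.compile(r'<a href="([^"]+)"', re.IGNORECASE)


def parse_timestamp(text):
    """Return the first timestamp found in text, or None if no format matches."""
    for pattern, fmt in TIMESTAMP_FORMATS:
        found = pattern.search(text)
        if found:
            return datetime.strptime(found.group(0), fmt)
    return None


def extract_timestamps(listing_html):
    """Map each linked .xml file to the timestamp that follows its link."""
    results = {}
    for line in listing_html.splitlines():
        link = LINK.search(line)
        if link is None:
            continue
        name = link.group(1)
        # Assumption: only .xml entries are records; skip parent/sort links.
        if not name.lower().endswith(".xml"):
            continue
        # Search only after the link so digits in the file name are ignored.
        results[name] = parse_timestamp(line[link.end():])
    return results
```

A record whose timestamp cannot be parsed maps to `None`, so the caller can fall back to downloading it rather than silently skipping it.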
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch
traverse_waf
compare
download_waf
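One way the compare step in this sketch could use timestamps is shown below. This is an illustration under stated assumptions, not the harvest.py implementation: the signature of `compare` is assumed, both inputs are plain dicts mapping a record identifier to a `datetime` (or `None` when no timestamp is known), and the output would tell `download_waf` which records to fetch.

```python
from datetime import datetime


def compare(waf_records, db_records):
    """Decide which records to create, update, or delete using timestamps only.

    Both arguments are assumed to map identifier -> datetime (or None).
    """
    waf_ids = waf_records.keys()
    db_ids = db_records.keys()

    create = sorted(waf_ids - db_ids)  # on the WAF, not yet harvested
    delete = sorted(db_ids - waf_ids)  # harvested before, gone from the WAF

    update = []
    for ident in sorted(waf_ids & db_ids):
        waf_time = waf_records[ident]
        db_time = db_records[ident]
        # A missing timestamp gives no evidence either way, so re-download.
        if waf_time is None or db_time is None or waf_time > db_time:
            update.append(ident)

    return {"create": create, "update": update, "delete": delete}
```

Only the identifiers under `create` and `update` would need a network call; unchanged records are skipped entirely.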