Closed owentl closed 2 years ago
You're right @owentl, thanks for opening the pull request.
Today I happened to rewrite the function that sets the last run time (to support collection time spans longer than 24 hours), which presented a nice opportunity to fix this. It now sets the last run time to the end time used in the collection URLs, so the next run starts collecting where the last run stopped. This should have been the case earlier, but as you noted, it wasn't.
In this case I won't integrate the pull request because it should be fixed in this commit. Could you take a second look for me and let me know if you agree? Thanks! Relevant code below:
    def _get_all_available_content(self):
        end_time = datetime.datetime.now(datetime.timezone.utc)
        for content_type in self._remaining_content_types.copy():
            [...]
            self._last_run_times[content_type] = end_time.strftime("%Y-%m-%dT%H:%M:%SZ")
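To illustrate the intended behaviour, here is a minimal standalone sketch, not the actual collector code. The helper `collect_window`, the `"example_type"` key, and the fixed timestamps are all hypothetical and only show that persisting the end time makes consecutive runs contiguous:

```python
import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def collect_window(last_run_times, content_type, end_time):
    """Return the (start, end) window for this run and persist the end time.

    Hypothetical helper: the start is the previously persisted last run time,
    and the end time of this run becomes the new last run time.
    """
    start = last_run_times[content_type]
    end = end_time.strftime(TIME_FORMAT)
    # Persist the END of this window, not its start, so the next run
    # picks up exactly where this one stopped.
    last_run_times[content_type] = end
    return start, end


# Hypothetical state: one content type with a previously persisted time.
last_run_times = {"example_type": "2021-01-01T00:00:00Z"}
utc = datetime.timezone.utc

first = collect_window(
    last_run_times, "example_type", datetime.datetime(2021, 1, 1, 6, 0, tzinfo=utc)
)
second = collect_window(
    last_run_times, "example_type", datetime.datetime(2021, 1, 1, 12, 0, tzinfo=utc)
)
```

Here the second window starts at the first window's end time, so nothing is collected twice and no gap is left between runs.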
Closing this PR as it should be fixed. Starting work on integrating #21.
Use the correct time when persisting the time of the last request. Previously start_time was used, which means the next run always pulls from the old time again. If there are no errors, we want to persist the current run time, not the original start time (whether provided or persisted to disk).