edgi-govdata-archiving / wayback

A Python API to the Internet Archive Wayback Machine
https://wayback.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Correct CDX API timestamps with month of 00 #89

Closed edsu closed 1 year ago

edsu commented 1 year ago

This commit extends the existing logic for handling invalid days of 00 to months that are 00. It also adds a warning to be logged in both situations.

So a timestamp of 20200001120000 will get rewritten to 20200112000000 prior to conversion to a datetime.
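As a rough illustration of the rewrite described above, here is a minimal sketch. It assumes the fix drops the invalid "00" pair and pads the end of the string with "00", which is consistent with the example timestamps in this comment. The names `repair_timestamp`, `parse_timestamp`, and `TIMESTAMP_FORMAT` are hypothetical and are not taken from the library's code.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)

# Assumed 14-digit CDX timestamp layout: YYYYMMDDhhmmss.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def repair_timestamp(raw: str) -> str:
    """Hypothetical helper: rewrite a timestamp whose day or month is "00".

    The invalid "00" pair is removed and "00" is appended at the end,
    so the remaining digits shift left by two places.
    """
    chars = list(raw)
    # Day of "00" (characters 6-7).
    if chars[6:8] == ["0", "0"]:
        logger.warning("Timestamp %s has a day of 00; rewriting it", raw)
        del chars[6:8]
        chars.extend("00")
    # Month of "00" (characters 4-5).
    if chars[4:6] == ["0", "0"]:
        logger.warning("Timestamp %s has a month of 00; rewriting it", raw)
        del chars[4:6]
        chars.extend("00")
    return "".join(chars)


def parse_timestamp(raw: str) -> datetime:
    """Repair the timestamp if needed, then convert it to a UTC datetime."""
    repaired = repair_timestamp(raw)
    return datetime.strptime(repaired, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
```

With this sketch, `repair_timestamp("20200001120000")` gives `"20200112000000"`, matching the example above. A valid timestamp passes through unchanged.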

I have tested on live CDX API data that was failing, and this fix causes the full result to be returned. If more is known about why this approach is taken, it would be good to add that in a comment.

Closes #88

Mr0grog commented 1 year ago

I'll also have to figure out what’s going on with the docs here. :\

edsu commented 1 year ago

I tried to address the docs build over in #90. I'm not sure if that's the right approach though, so of course please feel free to close if not.

Mr0grog commented 1 year ago

Heard back from Wayback folks here: https://internetarchive.slack.com/archives/C4PMRAN00/p1664039239024329

And it seems like our solution is the correct one for now. I should have time tomorrow (Sept. 29) to get this merged in (after we resolve the docs issue) and cut a release.

Mr0grog commented 1 year ago

@edsu FYI, this is released in v0.3.3!

edsu commented 1 year ago

Thank you @Mr0grog !