Closed by cmgosnell 1 year ago
Sometimes I feel like scrapping all this data too man.
I did not need to edit the scrapers to make this work. I'm not sure why our past scrapers were not grabbing the ER files. I vaguely recall the ER data being in a separate box on the page, which is definitely not the case now. Both the 2021 ER data and the 2022 partial-year data were scraped and archived.
Sandbox version: https://sandbox.zenodo.org/record/1090056 Big kid version: https://zenodo.org/record/6953766 (DOI: 10.5281/zenodo.6953766)
So as the page is currently formatted, it labels the ER data as the 2021 data (without any indication that it's ER?) and the 2022 partial-year / monthly updates as "the 2022" data, even though that year isn't complete? Do we foresee that causing any issues? I guess we can just adjust which years are "working" in the data source metadata, and when we bring in a new archive, if the spreadsheet formatting has changed, we'll have to update e.g. the skiprows.
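For what it's worth, here is a rough sketch of what I mean by adjusting the working years and the skiprows. None of these names or numbers come from our actual metadata: `WORKING_YEARS`, `SKIPROWS_BY_YEAR`, and `skiprows_for` are hypothetical, and the row counts are made up for illustration.

```python
# Hypothetical settings, NOT our real metadata: names and values are assumptions.
# 2022 is left out of the working years because the archive only holds a partial year.
WORKING_YEARS = (2020, 2021)

# Assumed number of header rows to skip per year; an ER spreadsheet might carry
# an extra notice row, which is the kind of change we'd have to track here.
SKIPROWS_BY_YEAR = {2020: 1, 2021: 2}


def skiprows_for(year: int) -> int:
    """Return the rows to skip for a year, refusing years we haven't vetted."""
    if year not in WORKING_YEARS:
        raise ValueError(f"{year} is not a working year")
    return SKIPROWS_BY_YEAR[year]


def drop_header_rows(lines: list[str], year: int) -> list[str]:
    """Drop the leading rows for that year's layout, keeping the rest."""
    return lines[skiprows_for(year):]
```

The returned value is the sort of thing that would get passed as the `skiprows` argument when reading the spreadsheet with pandas. Raising on non-working years means the partial 2022 data can sit in the archive without accidentally being treated as a complete year.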
Augment the spider to grab the ER data and make new zenodo archive