censoredplanet / censoredplanet-analysis

Analysis of the CensoredPlanet data.
Apache License 2.0
14 stars, 5 forks

read in control page fetches #231

Closed by ohnorobo 1 year ago

ohnorobo commented 1 year ago

Read in the control_pages.json files.

For the e2e test I added some arbitrary data from 2023-03-05. It doesn't join correctly with the 2022-10-20 data, but I only wanted to verify that the file can be read in correctly. control_pages.json doesn't exist before 2022-06-26, and I didn't want to redo the whole set of test data.

Tested: e2e test passed

Running test backfill. Edit: it failed; I had to allow parsing with duplicate page fetches from the new file (run here).
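To illustrate what tolerating duplicate page fetches could look like, here is a minimal sketch. It is not the code from this PR: the newline-delimited JSON layout, the `url` and `start_time` field names, and both function names are assumptions made for the example.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def parse_page_fetches(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line (assumes newline-delimited JSON)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def dedupe_page_fetches(fetches: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Keep the first fetch seen per key instead of failing on repeats.

    The (url, start_time) key is hypothetical; the real files may need
    a different set of fields to identify a fetch.
    """
    seen: Dict[Tuple[str, str], dict] = {}
    for fetch in fetches:
        key = (fetch.get("url", ""), fetch.get("start_time", ""))
        seen.setdefault(key, fetch)  # later duplicates are ignored
    return seen
```

Keeping the first occurrence is only one possible policy; keeping the last one or collecting every duplicate into a list would work the same way with a different update step.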