HTTPArchive / wptagent

Cross-platform WebPageTest agent

Update the `pages` and `requests` schemas written by the agent to BQ to the new ones #18

Open max-ostapenko opened 1 month ago

max-ostapenko commented 1 month ago

The older data schema is being reprocessed using these queries:

After we promote these new schemas to be the default, we need to update the agent's processing to match.

We should then be able to just do `SELECT *` when copying data from `crawl_staging` to `crawl` in the `crawl_complete` pipeline.
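
To illustrate what the copy step could look like once the schemas match, here is a minimal sketch that only builds the SQL text. The project ID `my-project`, the `date` filter column, and the helper name `build_copy_query` are assumptions made for illustration; they are not taken from the actual `crawl_complete` pipeline.

```python
import datetime


def build_copy_query(table: str, crawl_date: str,
                     project: str = "my-project") -> str:
    """Build an INSERT ... SELECT * statement for one crawl's rows.

    Hypothetical helper: the project ID and the `date` filter column
    are placeholders, not values from the real pipeline.
    """
    if table not in ("pages", "requests"):
        raise ValueError(f"unexpected table: {table}")
    # Raises ValueError if crawl_date is not a valid ISO date (YYYY-MM-DD).
    datetime.date.fromisoformat(crawl_date)
    return (
        f"INSERT INTO `{project}.crawl.{table}`\n"
        f"SELECT *\n"
        f"FROM `{project}.crawl_staging.{table}`\n"
        f"WHERE date = '{crawl_date}'"
    )


if __name__ == "__main__":
    print(build_copy_query("requests", "2024-11-01"))
```

An `INSERT` without a column list matches the `SELECT *` output to the target columns by position, so this shortcut only works if the agent writes `crawl_staging` with the same columns, in the same order, as `crawl`.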

max-ostapenko commented 14 hours ago

The transformation of `crawl_staging.requests` into `crawl.requests` took ~13h for the Nov 2024 crawl (not including failed attempts).

@pmeenan let's update wptagent so that the `crawl_staging` schema matches the `crawl` schema. As agreed, we will no longer update the legacy table, so it is ready for cleanup.