Open max-ostapenko opened 1 month ago
The transformation of crawl_staging.requests
into crawl.requests
lasted ~13h for Nov 2024 crawl (not including failed attempts).
@pmeenan let's update the wptagent to match crawl_staging
with crawl
schema.
As agreed we will not update legacy table anymore, so ready for a cleanup.
The older data schema is being reprocessed using these queries:
After we promote these new schemas to be the new default we need to update agent processing.
We should be able to just do
SELECT *
when copying data fromcrawl_staging
tocrawl
in crawl_complete pipeline.