HTTPArchive / bigquery

BigQuery import and processing pipelines
Ignore secondary pages in non-summary pipeline #174

Closed by rviscomi 2 years ago

rviscomi commented 2 years ago

Only home page data should be written to the YYYY_MM_DD tables (e.g., pages.2022_06_01_desktop). The new pipeline being developed in https://github.com/HTTPArchive/data-pipeline/pull/75 will handle writing both home and secondary page data to the new `all` dataset.
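
To illustrate the intended behavior, here is a minimal sketch of a filter that drops secondary pages before rows are written to the YYYY_MM_DD tables. It is not the pipeline's actual code. It assumes each page record is a dict whose optional `_metadata.crawl_depth` field is 0 for home pages and greater than 0 for secondary pages; that field name and the helper names `is_home_page` and `home_pages_only` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def is_home_page(page: Dict[str, Any]) -> bool:
    """Return True when a page record looks like a home page.

    Assumption (not confirmed by the issue): secondary pages carry a
    positive `crawl_depth` under a `_metadata` key. Records without
    that field are treated as home pages.
    """
    metadata = page.get("_metadata") or {}
    return int(metadata.get("crawl_depth", 0)) == 0


def home_pages_only(pages: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the records that should reach the YYYY_MM_DD tables."""
    return [page for page in pages if is_home_page(page)]


if __name__ == "__main__":
    sample = [
        {"url": "https://example.com/", "_metadata": {"crawl_depth": 0}},
        {"url": "https://example.com/about", "_metadata": {"crawl_depth": 1}},
        {"url": "https://example.org/"},  # no metadata: treated as a home page
    ]
    for page in home_pages_only(sample):
        print(page["url"])
```

Filtering at the record level, before any per-table transforms, would keep the legacy tables unchanged while leaving secondary pages available to the new pipeline.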