HTTPArchive / bigquery

BigQuery import and processing pipelines

Fix response_bodies pipeline #123

Closed: rviscomi closed this 3 years ago

rviscomi commented 3 years ago

To work around Dataflow's 15 TB limit, this change loads the response_bodies tables in two halves.
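A minimal sketch of the "load in halves" idea, assuming the input is an ordered list of shards with known byte sizes. The shard names, the helpers `split_in_halves` and `plan_half_loads`, and the choice to truncate on the first half and append on the second are all hypothetical illustrations, not the code from this PR. `WRITE_TRUNCATE` and `WRITE_APPEND` are standard BigQuery write dispositions.

```python
from typing import List, Sequence, Tuple

# Size limit cited in the PR description (15 TB), in bytes.
# Treating it as a hard cap on each half is an assumption of this sketch.
LIMIT_BYTES = 15 * 10**12


def split_in_halves(shards: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split an ordered list of input shards into two contiguous halves."""
    mid = (len(shards) + 1) // 2  # first half gets the extra shard if odd
    return list(shards[:mid]), list(shards[mid:])


def plan_half_loads(
    shards: Sequence[str], sizes: Sequence[int]
) -> List[Tuple[List[str], str]]:
    """Return (shards, write_disposition) pairs, one per half.

    Hypothetical plan: the first half replaces the table contents and the
    second half appends, so together they produce one complete table.
    """
    size_of = dict(zip(shards, sizes))
    first, second = split_in_halves(shards)
    for half in (first, second):
        total = sum(size_of[s] for s in half)
        if total > LIMIT_BYTES:
            raise ValueError(f"half of {total} bytes still exceeds the limit")
    return [(first, "WRITE_TRUNCATE"), (second, "WRITE_APPEND")]


if __name__ == "__main__":
    # Four hypothetical 7 TB shards: 28 TB total, 14 TB per half.
    plan = plan_half_loads(["p0", "p1", "p2", "p3"], [7 * 10**12] * 4)
    for half, disposition in plan:
        print(disposition, half)
```

Splitting by shard count keeps the sketch simple; if shard sizes are very uneven, balancing the halves by cumulative byte size would be the safer choice.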

This change also upgrades to Apache Beam 2.31 and removes the old experimental use_beam_bq_sink flag.

Tested successfully on 2021_07_01 for both desktop and mobile.