abcxyz / github-metrics-aggregator

Apache License 2.0

Use a limited time window when looking for duplicates #138

Open pdewilde opened 1 year ago

pdewilde commented 1 year ago

Currently, before we publish to the Pub/Sub topic, we run a query to ensure the row doesn't already exist in the table: https://github.com/abcxyz/github-metrics-aggregator/blob/bae8a213956a38ffb2ca721820c92af0235bde2c/pkg/webhook/bigquery.go#L117

It would be better if the query used a WHERE clause to restrict the scan to a limited time window, something like the last week, instead of scanning the whole table.

This should probably be done by adding an additional argument and passing it through the chain of functions that this query sits at the base of.

This is low priority, since the query doesn't select the payload column.