abcxyz / github-metrics-aggregator


Investigate Using Actual Event Times #139

Open pdewilde opened 9 months ago

pdewilde commented 9 months ago

Currently, GMA adds a "received" time to every BigQuery row. This time is based on when GMA receives the webhook, which may differ from the time of the event itself (especially if there are retries).

As a result, queries that filter on a time window may miss events that actually occurred in that window.

It would be better to extract the event time from the payload of the event, though I haven't seen if every event type keeps that information in the same place.

It's worth investigating the feasibility of extracting the actual event times, either while processing the webhook or afterwards with a scheduled query that updates rows with a more accurate time or copies them to a new table.

verbanicm commented 9 months ago

> It would be better to extract the event time from the payload of the event, though I haven't seen if every event type keeps that information in the same place.

There isn't a consistent place; it depends on the type of event AND the action. For example, pull request opened -> pull_request.created_at, but pull request closed -> pull_request.closed_at: the same event, but a different action and a different field.
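
To make the event/action dependence concrete, here is a minimal Python sketch of a lookup table keyed on (event, action), falling back to the received time when no mapping exists. The two paths come from the example above; the names EVENT_TIME_PATHS, lookup, parse_time, and event_time are hypothetical and not part of GMA.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical mapping from (event, action) to a dotted path in the payload.
# Only the two entries mentioned in this thread are listed.
EVENT_TIME_PATHS = {
    ("pull_request", "opened"): "pull_request.created_at",
    ("pull_request", "closed"): "pull_request.closed_at",
}


def lookup(payload: Any, dotted_path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node = payload
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def parse_time(value: Any) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as 2024-01-02T03:04:05Z; None if unparseable."""
    if not isinstance(value, str):
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def event_time(event: str, payload: dict, received: datetime) -> datetime:
    """Return the mapped event time, or the received time if nothing is mapped."""
    path = EVENT_TIME_PATHS.get((event, payload.get("action")))
    if path is None:
        return received
    return parse_time(lookup(payload, path)) or received
```

Unmapped events keep today's behavior, so ingestion stays generic and the table can grow one entry at a time.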

I would also be interested to see if there is a creative way to do this, without manual work to map EVERY event type.
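
One possible direction, offered as an untested heuristic rather than a proposal: skip the per-event mapping and take the latest timestamp found on the top-level objects of the payload, ignoring context objects and anything later than the received time. The field names, the CONTEXT_KEYS set, and the guess_event_time function are all assumptions for illustration and have not been checked against every event type.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Assumption: these top-level objects describe context, not the event subject.
CONTEXT_KEYS = {"repository", "sender", "organization", "installation", "enterprise"}
# Assumption: timestamp fields commonly present on the subject object.
CANDIDATE_FIELDS = ("updated_at", "closed_at", "created_at")


def _parse(value: Any) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp string; return None if it cannot be parsed."""
    if not isinstance(value, str):
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def guess_event_time(payload: dict, received: datetime) -> datetime:
    """Guess the event time; `received` must be timezone-aware.

    Picks the latest candidate timestamp that is not after `received`,
    and falls back to `received` when no candidate is found.
    """
    candidates = []
    for key, value in payload.items():
        if key in CONTEXT_KEYS or not isinstance(value, dict):
            continue
        for field in CANDIDATE_FIELDS:
            parsed = _parse(value.get(field))
            if parsed is not None and parsed <= received:
                candidates.append(parsed)
    return max(candidates, default=received)
```

Because it falls back to the received time, a guess like this could be stored in a separate column and compared against known cases before anything relies on it.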

One of the best features of this service is that it's flexible enough to ingest any event.