Open FStephenQuaratiello opened 2 years ago
Hey @FStephenQuaratiello would you mind providing the GraphQL query and the BigQuery SQL query that you ran so we can investigate further?
Sure thing:
BigQuery query:'''SELECT COUNT(*), EXTRACT(HOUR from EdgeStartTimestamp) AS hour FROM [TABLE] WHERE EdgeStartTimestamp > TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND EdgeStartTimestamp < TIMESTAMP(CURRENT_DATE()) AND ARRAY_TO_STRING(ARRAY_REVERSE([ ARRAY_REVERSE(SPLIT(ClientRequestHost, "."))[ORDINAL(1)], ARRAY_REVERSE(SPLIT(ClientRequestHost, "."))[ORDINAL(2)] ]), ".") = '%s' GROUP BY hour ORDER BY hour'''
GraphQL query: """ query { viewer { zones(filter: {zoneTag: "%s"} ) { httpRequests1hGroups(limit:24, filter:{date: "%s"}) { sum { requests } dimensions { datetime } } } } } """
Hey @FStephenQuaratiello! Thanks for sharing your queries. I'm not particularly familiar with this tool, but I can help with a GQL part.
httpRequests1hGroups
represents hourly aggregated eyeball requests. Therefore, if you want to compare with other sources, you should also make sure that you're counting eyeball requests.
On top of it, if you're using "today" in your query, then I would expect that the last hour would not be "full" because the aggregation nature assumes buffering.
Hi,
I've been noticing a slight (~1%) discrepancy between the number of records imported to BigQuery with this tool, and the number of requests reported by the Cloudflare GraphQL API for a given time period. For example, the GraphQL API reports 46,532 requests in a given hour, but in BigQuery, there are only 45,736 records with an
EdgeStartTimestamp
in that hour. A small difference, to be sure, but a noticeable one.Is this within expectations? And is there a better way to measure the health/quality of data imported by this tool?