Closed keegansmith21 closed 4 months ago
Attention: Patch coverage is 97.81421%
with 4 lines
in your changes missing coverage. Please review.
Project coverage is 93.92%. Comparing base (
cc43485
) to head (db50e6a
).
Files | Patch % | Lines |
---|---|---|
...rkflows/ucl_sales_telescope/ucl_sales_telescope.py | 97.81% | 2 Missing and 2 partials :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hi @keegansmith21 - thank you, this looks great! I've tested and the values and functionality of the table and the aggregate query work as I expect.
UCL have now confirmed that negative values should not appear in returns and paid quantity values - I think the data should still be ingested as is, with this handled in the aggregation query by converting return Quantities to ABS values, what are your thoughts?
Thanks @kathrynnapier! Yes I think handling the negatives in the aggregation query is best. I'd prefer to leave the telescope table as untouched as possible.
The UCL Sales Telescope
The telescope pulls data from a google sheet. The sheet is divided into separate sheets (tabs?) for each sale month (i.e. 202401 for january 2024). Notably, the month of the sheet does not necessarily correspond to the month of the sale. The sale month will be recorded in each individual row. The sheets begin at 202308, but the sales predate this. I set the 'release_date' to be the same as the sale date but also add the "sheet_month" column to keep track of where each row derives its data from. This is a little confusing, but the nature of the data is a little confusing...
The telescope makes no attempt to aggregate the data. This will be done in pure SQL. Note that the month-by-month data can be sort of nonsensical. This is because this data also has returns (and 'free' sales, but they're to be ignored). So for any given month, the sales can actually be a negative quantity. Since we therefore don't really care about the timeseries for each individual title, we are perfectly happy doing a fairly simple aggregation step in Bigquery.