The-Academic-Observatory / oaebu-workflows

Telescopes, Workflows and Data Services for the 'Book Analytics Dashboard Project (2022-2025)', building upon the project 'Developing a Pilot Data Trust for Open Access eBook Usage (2020-2022)'
https://documentation.book-analytics.org/
Apache License 2.0
5 stars 0 forks source link

UCL Sales Telescope #239

Closed keegansmith21 closed 4 months ago

keegansmith21 commented 4 months ago

The UCL Sales Telescope

The telescope pulls data from a google sheet. The sheet is divided into separate sheets (tabs?) for each sale month (i.e. 202401 for january 2024). Notably, the month of the sheet does not necessarily correspond to the month of the sale. The sale month will be recorded in each individual row. The sheets begin at 202308, but the sales predate this. I set the 'release_date' to be the same as the sale date but also add the "sheet_month" column to keep track of where each row derives its data from. This is a little confusing, but the nature of the data is a little confusing...

The telescope makes no attempt to aggregate the data. This will be done in pure SQL. Note that the month-by-month data can be sort of nonsensical. This is because this data also has returns (and 'free' sales, but they're to be ignored). So for any given month, the sales can actually be a negative quantity. Since we therefore don't really care about the timeseries for each individual title, we are perfectly happy doing a fairly simple aggregation step in Bigquery.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 97.81421% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.92%. Comparing base (cc43485) to head (db50e6a).

Files Patch % Lines
...rkflows/ucl_sales_telescope/ucl_sales_telescope.py 97.81% 2 Missing and 2 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #239 +/- ## ========================================== + Coverage 93.67% 93.92% +0.25% ========================================== Files 14 15 +1 Lines 2846 3029 +183 Branches 366 389 +23 ========================================== + Hits 2666 2845 +179 - Misses 100 102 +2 - Partials 80 82 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

keegansmith21 commented 4 months ago

Hi @keegansmith21 - thank you, this looks great! I've tested and the values and functionality of the table and the aggregate query work as I expect.

UCL have now confirmed that negative values should not appear in returns and paid quantity values - I think the data should still be ingested as is, with this handled in the aggregation query by converting return Quantities to ABS values, what are your thoughts?

Thanks @kathrynnapier! Yes I think handling the negatives in the aggregation query is best. I'd prefer to leave the telescope table as untouched as possible.