The-Academic-Observatory / oaebu-workflows

Telescopes, Workflows and Data Services for the 'Book Analytics Dashboard Project (2022-2025)', building upon the project 'Developing a Pilot Data Trust for Open Access eBook Usage (2020-2022)'
https://documentation.book-analytics.org/
Apache License 2.0

Feature/crossref academic observatory #144

Closed: keegansmith21 closed this 1 year ago

keegansmith21 commented 1 year ago

Since removing the dependency on the Academic Observatory Workflows in #105, we have run into several issues with the collection of Crossref data. The crux of the issue is the volume of requests we need to make to the APIs (metadata and events). This PR instead reintroduces the dependency on the AO workflows and queries the Crossref Metadata master table. Notably, this is a different solution from the one used before #105. However, I believe that using the metadata table rather than the book table is preferable for transparency and simplicity. The caveat is that the metadata table is significantly larger, so querying it once a week for each publisher will eventually become a noticeable cost. There are ways to mitigate this cost, but for the immediate future this is a simple solution.

Querying the metadata table is done with a Jinja2-templated SQL statement: each ISBN of interest is inserted into a temporary table, which is then joined against the metadata table's ISBN field. I have also included a similar file for the Crossref events table in anticipation of its future use.
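A minimal sketch of the temporary-table join described above, assuming a local stand-in for the real warehouse: it uses Python's built-in `sqlite3` in place of BigQuery, and plain parameter binding in place of the Jinja2 template. The names `match_isbns`, `crossref_metadata`, `isbn_list`, `doi` and `isbn` are hypothetical, and the real metadata table may store ISBNs differently (for example as a repeated field that would need unnesting before the join).

```python
import sqlite3
from typing import Iterable, List, Tuple


def match_isbns(conn: sqlite3.Connection, isbns: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (doi, isbn) rows from a hypothetical crossref_metadata table
    whose isbn column matches one of the given ISBNs."""
    cur = conn.cursor()
    # Temporary table holding the ISBNs of interest (dropped when the connection closes).
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS isbn_list (isbn TEXT PRIMARY KEY)")
    cur.execute("DELETE FROM isbn_list")  # clear ISBNs left over from a previous call
    # Duplicate ISBNs in the input are skipped thanks to the primary key.
    cur.executemany("INSERT OR IGNORE INTO isbn_list (isbn) VALUES (?)", [(i,) for i in isbns])
    # Join the temporary table against the metadata table's ISBN column.
    return cur.execute(
        """
        SELECT m.doi, m.isbn
        FROM crossref_metadata AS m
        JOIN isbn_list AS l ON m.isbn = l.isbn
        ORDER BY m.doi
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crossref_metadata (doi TEXT, isbn TEXT)")
    conn.executemany(
        "INSERT INTO crossref_metadata VALUES (?, ?)",
        [("10.1000/a", "9780000000001"), ("10.1000/b", "9780000000002"), ("10.1000/c", "9780000000003")],
    )
    print(match_isbns(conn, ["9780000000001", "9780000000003"]))
    # [('10.1000/a', '9780000000001'), ('10.1000/c', '9780000000003')]
```

Loading the ISBNs into a table and joining, rather than splicing a long `IN (...)` list into the SQL, keeps the statement the same size regardless of how many ISBNs a publisher has.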

I have had to commit a sin in order to get the tests to pass. In the upcoming refactor #142 this will be removed in favour of a more complete table-checking method, so this is only temporary.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 67.85%; project coverage change: -0.26% :warning:

Comparison is base (3ce4d6d) 94.49% compared to head (8077e5b) 94.24%.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop     #144      +/-   ##
===========================================
- Coverage    94.49%   94.24%   -0.26%
===========================================
  Files           24       24
  Lines         2890     2850      -40
  Branches       380      371       -9
===========================================
- Hits          2731     2686      -45
- Misses          73       79       +6
+ Partials        86       85       -1
```

| [Impacted Files](https://app.codecov.io/gh/The-Academic-Observatory/oaebu-workflows/pull/144?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The-Academic-Observatory) | Coverage Δ | |
|---|---|---|
| [oaebu\_workflows/workflows/onix\_workflow.py](https://app.codecov.io/gh/The-Academic-Observatory/oaebu-workflows/pull/144?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The-Academic-Observatory#diff-b2FlYnVfd29ya2Zsb3dzL3dvcmtmbG93cy9vbml4X3dvcmtmbG93LnB5) | `93.14% <67.85%> (-1.19%)` | :arrow_down: |
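The figures in the diff can be reproduced with a short calculation. This sketch assumes that Codecov's ratio is hits / (hits + misses + partials), i.e. partial lines do not count as covered, and that results are rounded down to two decimals (what I understand to be Codecov's default `round: down` and `precision: 2` settings; treat both as assumptions). The helper names are hypothetical. Because the change is taken from the unrounded values, it reads -0.26 rather than 94.24 - 94.49 = -0.25.

```python
import math


def floor_to(value: float, precision: int = 2) -> float:
    """Round value down (towards negative infinity) to the given number of decimals."""
    factor = 10 ** precision
    return math.floor(value * factor) / factor


def coverage_ratio(hits: int, misses: int, partials: int) -> float:
    """Percentage of tracked lines that are fully hit (assumption: partials count as not covered)."""
    return 100 * hits / (hits + misses + partials)


base = coverage_ratio(2731, 73, 86)  # develop: 2890 lines in total
head = coverage_ratio(2686, 79, 85)  # #144: 2850 lines in total

print(floor_to(base), floor_to(head), floor_to(head - base))
# 94.49 94.24 -0.26
```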
