The-Academic-Observatory / oaebu-workflows

Telescopes, Workflows and Data Services for the 'Book Analytics Dashboard Project (2022-2025)', building upon the project 'Developing a Pilot Data Trust for Open Access eBook Usage (2020-2022)'
https://documentation.book-analytics.org/
Apache License 2.0

INF-598: ONIX telescope keyword parsing #136

Closed: keegansmith21 closed this 1 year ago

keegansmith21 commented 1 year ago

We are running into issues when building the book_product table for many of the new telescopes. Each failure occurs because the Subjects.SubjectHeadingText field is improperly populated for Keywords. Keywords should not be a repeated field but a single string in which the keywords are separated by semicolons; this is the only format that the SQL reliably works with. This PR adds a step to the transform phase of the ONIX telescope that collapses repeated keywords into a single string and also attempts to replace incorrect separators (commas, colons) with semicolons, since these also cause errors.
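For reviewers, the transform step described above could look roughly like the following. This is a minimal sketch and not the code in this PR. The function name `collapse_keywords`, the `SubjectSchemeIdentifier` key with the value `"Keywords"`, and the `"; "` join string are assumptions made for illustration; only `Subjects` and `SubjectHeadingText` come from the description.

```python
import re
from typing import Any, Dict, List


def collapse_keywords(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse repeated keyword headings into one semicolon-separated string.

    Hypothetical helper: modifies the records in place and returns them.
    """
    for record in records:
        for subject in record.get("Subjects") or []:
            # Assumption: keyword subjects are identified by this key and value.
            if subject.get("SubjectSchemeIdentifier") != "Keywords":
                continue

            headings = subject.get("SubjectHeadingText") or []
            if isinstance(headings, str):
                # Tolerate a field that is already a single string.
                headings = [headings]

            keywords: List[str] = []
            for heading in headings:
                if heading is None:
                    continue
                # Treat commas and colons as incorrect separators and split
                # on them together with semicolons.
                keywords.extend(part.strip() for part in re.split(r"[;,:]", heading))

            # Drop empty fragments and join with the semicolon separator.
            subject["SubjectHeadingText"] = "; ".join(k for k in keywords if k)
    return records


example = [
    {
        "Subjects": [
            {
                "SubjectSchemeIdentifier": "Keywords",
                "SubjectHeadingText": ["history, politics", "economics: trade", "art"],
            },
            {
                "SubjectSchemeIdentifier": "BISAC",
                "SubjectHeadingText": ["HIS000000"],
            },
        ]
    }
]
result = collapse_keywords(example)
```

In this sketch, non-keyword subjects are left untouched. Splitting on colons and commas will also break up any keyword that legitimately contains one of those characters, which is the trade-off implied by "attempts to change incorrect separators".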

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 :tada:

Comparison is base (007cfd3) 95.03% compared to head (f42be00) 95.04%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           develop     #136      +/-   ##
===========================================
+ Coverage    95.03%   95.04%   +0.01%
===========================================
  Files           23       23
  Lines         2698     2704       +6
  Branches       351      352       +1
===========================================
+ Hits          2564     2570       +6
  Misses          62       62
  Partials        72       72
```

| [Impacted Files](https://codecov.io/gh/The-Academic-Observatory/oaebu-workflows/pull/136?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The-Academic-Observatory) | Coverage Δ | |
|---|---|---|
| [oaebu\_workflows/workflows/onix\_telescope.py](https://codecov.io/gh/The-Academic-Observatory/oaebu-workflows/pull/136?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The-Academic-Observatory#diff-b2FlYnVfd29ya2Zsb3dzL3dvcmtmbG93cy9vbml4X3RlbGVzY29wZS5weQ==) | `93.25% <100.00%> (+1.31%)` | :arrow_up: |
| [oaebu\_workflows/workflows/thoth\_telescope.py](https://codecov.io/gh/The-Academic-Observatory/oaebu-workflows/pull/136?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The-Academic-Observatory#diff-b2FlYnVfd29ya2Zsb3dzL3dvcmtmbG93cy90aG90aF90ZWxlc2NvcGUucHk=) | `87.37% <100.00%> (-2.31%)` | :arrow_down: |
