BalduinLandolt closed this pull request 2 years ago.
Awesome! I'll have a closer look tomorrow.
not yet there, I'm afraid... just wanted to see the test runs, so I drafted the PR. But it's taking shape... :)
Base: 44.23% // Head: 47.91% // Increases project coverage by +3.67% :tada:
Coverage data is based on head (738737b) compared to base (36665dd). Patch coverage: 28.41% of modified lines in pull request are covered.
:umbrella: View full report at Codecov.
@kraus-s the core feature is implemented, though it can surely still be improved. But the unification should be reasonable, at least to some extent.
There is probably some clean-up still to do, just ignore that for now. ;-)
What I have not done at all is check whether we need junction tables for the unified manuscript data, and implement ways to query that data (roughly along the lines of the sketch below). (Should we even still be able to query the un-unified data in the first place? Maybe the unified data is enough?)
I'm wondering if I should postpone that to a next PR, so nothing is blocked. What do you think?
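For concreteness, a junction table linking manuscripts to the texts they contain could look roughly like this. This is a minimal sketch assuming SQLite; the table and column names (manuscripts, texts, manuscript_text, shelfmark, text_name) and the database file are hypothetical, not the project's actual schema:

```python
import sqlite3

# Minimal sketch, not the actual schema: one row in manuscript_text
# per (manuscript, text) pair asserted by the unified data.
conn = sqlite3.connect("toole.db")  # hypothetical database file
conn.executescript(
    """
    CREATE TABLE IF NOT EXISTS manuscripts (
        shelfmark TEXT PRIMARY KEY
    );
    CREATE TABLE IF NOT EXISTS texts (
        text_name TEXT PRIMARY KEY
    );
    CREATE TABLE IF NOT EXISTS manuscript_text (
        shelfmark TEXT REFERENCES manuscripts(shelfmark),
        text_name TEXT REFERENCES texts(text_name),
        PRIMARY KEY (shelfmark, text_name)
    );
    """
)

# "Search manuscripts by text": all manuscripts containing a given text.
rows = conn.execute(
    "SELECT shelfmark FROM manuscript_text WHERE text_name = ?",
    ("Njáls saga",),
).fetchall()
```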
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities
0 Security Hotspots
26 Code Smells
No Coverage information
0.0% Duplication
I now changed it so that the search functions actually use the unified data.
Additionally, I tracked down a bug in the new XML processing which meant that only 1/30 texts were found in the manuscript descriptions... but we'll have to have a very close look at the XML data extraction. I'm under the impression that considerable regressions have happened there.
Regressions in the XML extraction as in the new way we implemented lxml instead of BeautifulSoup? If so, I'll have a look.
Yes... I can't pinpoint it yet... but I think we're skipping very many searches (sometimes with, sometimes without logging a warning) and accepting 0 or "N/A" where we previously got real results.
The instance I actually found was that, due to some logic being placed in the wrong if-statement, we only found ca. 2k texts in manuscripts instead of 60k, which resulted in a tiny junction table and thus in not finding any results for most "search text by manuscript" searches. But that one is fixed (sketched below).
But I don't think this is the highest priority right now.
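To make the kind of bug concrete, here is a purely illustrative lxml extraction sketch (hypothetical, not the actual project code): if the append happens only inside a narrower condition than intended, most msItem titles are silently dropped and the junction table ends up with a fraction of the texts. Element names follow TEI; the helper name and its logic are assumptions:

```python
import logging
from lxml import etree

log = logging.getLogger(__name__)
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_text_titles(root: etree._Element) -> list[str]:
    """Collect text titles from the msItem elements of a manuscript description."""
    titles = []
    for item in root.findall(".//tei:msItem", TEI):
        title = item.find("tei:title", TEI)
        # Buggy variant: appending only under a stricter check such as
        #   if title is not None and title.get("type") == "uniform": ...
        # silently drops every title without that attribute (roughly 29/30 of them).
        if title is not None and title.text:
            titles.append(title.text.strip())
        else:
            log.warning("msItem without a usable title, skipping")
    return titles
```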
resolves #38