NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Suppress duplicate log messages from subject module #673

Closed juhoinkinen closed 1 year ago

juhoinkinen commented 1 year ago

When a project's vocabulary is out of date relative to the corpus being used, there can be a large number of warnings such as `warning: Unknown subject URI <http://www.yso.fi/onto/yso/p22036>` that flood the screen.

See an example output annif train tfidf-en ../Annif-tutorial/data-sets/yso-nlf/docs/train/ Backend tfidf: transforming subject corpus Backend tfidf: creating vectorizer warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI 
warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI Backend tfidf: creating similarity index

However, there are usually relatively few distinct URIs (or labels) missing from the vocabulary. The message is shown every time any of them is encountered in the corpus, so the same warnings are repeated many times. This PR suppresses the duplicate log messages raised from the subject.py module.

See the above output with this PR applied annif train tfidf-en ../Annif-tutorial/data-sets/yso-nlf/docs/train/ Backend tfidf: transforming subject corpus Backend tfidf: creating vectorizer warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI warning: Unknown subject URI Backend tfidf: creating similarity index

Based on a Stack Overflow answer.
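The general technique of suppressing repeated log messages can be sketched with a standard `logging.Filter` that remembers what it has already let through. This is a minimal illustration, not necessarily the code in this PR: the class name `DuplicateFilter`, the helper `ListHandler`, the logger name `dedup_demo` and the `example.org` URI are all assumptions made for the example.

```python
import logging


class DuplicateFilter(logging.Filter):
    """Let each distinct log message through only once (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        # Identify a message by its origin module, level and formatted text.
        key = (record.module, record.levelno, record.getMessage())
        if key in self._seen:
            return False  # already shown once: drop the record
        self._seen.add(key)
        return True


class ListHandler(logging.Handler):
    """Collect formatted messages in a list so the effect can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo():
    logger = logging.getLogger("dedup_demo")  # hypothetical logger name
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    handler = ListHandler()
    dedup = DuplicateFilter()
    logger.addHandler(handler)
    logger.addFilter(dedup)
    try:
        uris = [
            "http://www.yso.fi/onto/yso/p22036",
            "http://www.yso.fi/onto/yso/p22036",  # repeat, suppressed
            "http://example.org/unknown",  # made-up URI for illustration
            "http://www.yso.fi/onto/yso/p22036",  # repeat, suppressed
        ]
        for uri in uris:
            logger.warning("Unknown subject URI <%s>", uri)
    finally:
        # Detach so that repeated calls start from a clean state.
        logger.removeFilter(dedup)
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    for message in demo():
        print("warning:", message)
```

Attaching the filter to the logger used by a single module limits deduplication to that module's messages, so other parts of the application keep logging normally. The set of seen messages grows with the number of distinct messages, which is acceptable when only a limited number of URIs are unknown.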

codecov[bot] commented 1 year ago

Codecov Report

Base: 99.56% // Head: 99.56% // Increases project coverage by +0.00% :tada:

Coverage data is based on head (8d7e1c5) compared to base (8a194c4). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #673   +/-   ##
=======================================
  Coverage   99.56%   99.56%
=======================================
  Files          87       87
  Lines        6145     6158   +13
=======================================
+ Hits         6118     6131   +13
  Misses         27       27
```

| [Impacted Files](https://codecov.io/gh/NatLibFi/Annif/pull/673?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi) | Coverage Δ | |
|---|---|---|
| [annif/corpus/subject.py](https://codecov.io/gh/NatLibFi/Annif/pull/673?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvY29ycHVzL3N1YmplY3QucHk=) | `100.00% <100.00%> (ø)` | |
| [annif/util.py](https://codecov.io/gh/NatLibFi/Annif/pull/673?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvdXRpbC5weQ==) | `98.57% <100.00%> (+0.26%)` | :arrow_up: |


sonarcloud[bot] commented 1 year ago

SonarCloud Quality Gate failed.

- 0 Bugs (rating A)
- 0 Vulnerabilities (rating A)
- 1 Security Hotspot (rating E)
- 0 Code Smells (rating A)

No coverage information
0.0% duplication