NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Avoid scikit-learn UserWarning for vectorizer parameter token_pattern #729

Closed: osma closed this 11 months ago

osma commented 11 months ago

scikit-learn vectorizers used by Annif (CountVectorizer, TfidfVectorizer) trigger this warning:

UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None'

This is a bit surprising, since we are not setting token_pattern ourselves. However, its default value is a regular expression (r"(?u)\b\w\w+\b"), not None. This PR fixes the warning by explicitly setting token_pattern=None wherever the tokenizer parameter is set in Annif's calling code.

codecov[bot] commented 11 months ago

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (40cc2fd) 99.67% compared to head (a84e466) 99.67%. Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #729   +/-   ##
=======================================
  Coverage   99.67%   99.67%
=======================================
  Files          89       89
  Lines        6397     6401    +4
=======================================
+ Hits         6376     6380    +4
  Misses         21       21
```

| [Files Changed](https://app.codecov.io/gh/NatLibFi/Annif/pull/729?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi) | Coverage Δ | |
|---|---|---|
| [annif/lexical/mllm.py](https://app.codecov.io/gh/NatLibFi/Annif/pull/729?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvbGV4aWNhbC9tbGxtLnB5) | `100.00% <ø> (ø)` | |
| [annif/backend/mixins.py](https://app.codecov.io/gh/NatLibFi/Annif/pull/729?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvYmFja2VuZC9taXhpbnMucHk=) | `97.82% <100.00%> (+0.20%)` | :arrow_up: |
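The project coverage stays at 99.67% even though the line count grew by 4. All four new lines are hit: hits rise from 6376 to 6380 while misses stay at 21. The sketch below checks these ratios. It assumes coverage is simply hits divided by tracked lines, which holds here because the report lists no partially covered lines. `coverage_percent` is a hypothetical helper.

```python
def coverage_percent(hits, lines):
    # Hypothetical helper: covered lines over tracked lines, in percent.
    return 100 * hits / lines


base = coverage_percent(6376, 6397)  # main: 6397 lines, 21 misses
head = coverage_percent(6380, 6401)  # PR head: 6401 lines, still 21 misses
patch = coverage_percent(4, 4)       # the 4 added lines, all hit

print(f"{base:.2f} {head:.2f} {patch:.2f}")  # 99.67 99.67 100.00
```

The unrounded head value is marginally higher than the base value. Both still round to 99.67%, which is why Codecov reports no project coverage change.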


sonarcloud[bot] commented 11 months ago

Kudos, SonarCloud Quality Gate passed!

Bug: A, 0 Bugs
Vulnerability: A, 0 Vulnerabilities
Security Hotspot: A, 0 Security Hotspots
Code Smell: A, 0 Code Smells

No coverage information
0.0% Duplication