NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

PAV backend #193

Closed osma closed 5 years ago

osma commented 5 years ago

According to the Annif-fusion experiments, using PAV (pool adjacent violators, a.k.a. isotonic regression) to combine the scores of several backends could significantly improve analysis quality. We should add a new backend (called pav) that could be configured like this:

[yso-pav-fi]
name=YSO PAV ensemble Finnish
language=fi
backends=pav
sources=tfidf-fi,fasttext-fi,maui-fi
min-docs=3
fallback=raw
vocab=yso-fi

The min-docs parameter would specify the minimum number of documents about a subject required for creating a PAV model for that subject. The fallback parameter would define the strategy used when no PAV model exists for a subject; its value is either raw (use the raw score) or zero (use a zero score).
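To make the proposal concrete, here is a rough sketch of how per-subject PAV calibration with min-docs and fallback might behave. It is not Annif code: the function names (pav_fit, pav_predict, fit_subject, calibrate) are hypothetical, the model is a plain step function without interpolation, and it assumes that "documents about a subject" means training documents labelled with that subject.

```python
from typing import List, Optional, Sequence, Tuple

# A model is a list of (upper score bound, calibrated value) steps.
Model = List[Tuple[float, float]]


def pav_fit(scores: Sequence[float], labels: Sequence[int]) -> Model:
    """Fit a non-decreasing step function with pool adjacent violators."""
    # Group documents with identical raw scores so ties share one block.
    grouped = {}
    for score, label in zip(scores, labels):
        total, count = grouped.get(score, (0.0, 0))
        grouped[score] = (total + label, count + 1)

    blocks = []  # each block: [sum of labels, document count, upper score]
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([total, count, score])
        # Pool adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def pav_predict(model: Model, score: float) -> float:
    """Look up the calibrated value of the step that covers the raw score."""
    for upper, value in model:
        if score <= upper:
            return value
    return model[-1][1]  # above every training score: use the last step


def fit_subject(scores: Sequence[float], labels: Sequence[int],
                min_docs: int = 3) -> Optional[Model]:
    """Return a PAV model, or None when fewer than min_docs documents
    are labelled with the subject (assumed reading of min-docs)."""
    if sum(labels) < min_docs:
        return None
    return pav_fit(scores, labels)


def calibrate(model: Optional[Model], score: float,
              fallback: str = "raw") -> float:
    """Apply the PAV model, or the fallback strategy when there is none."""
    if model is not None:
        return pav_predict(model, score)
    if fallback == "raw":
        return score
    if fallback == "zero":
        return 0.0
    raise ValueError(f"unknown fallback strategy: {fallback}")


# Raw scores from one source backend for one subject, with 0/1 labels
# telling whether each training document really has the subject.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 1, 1, 1]
model = fit_subject(raw_scores, labels, min_docs=3)
rare = fit_subject([0.2, 0.9], [0, 1], min_docs=3)  # too few documents
```

In this sketch a subject with too few training documents simply gets no model, so calibrate(rare, 0.7, "raw") passes the score through unchanged while calibrate(rare, 0.7, "zero") suppresses it. A real backend would also need to merge the calibrated scores from the different sources, which is left out here.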

osma commented 5 years ago

The PAV backend would be trained separately using the train CLI command.