NatLibFi / Annif-tutorial

Instructions, exercises and example data sets for Annif hands-on tutorial

downloading and using pretrained API models locally? #12

Closed kauttoj closed 3 years ago

kauttoj commented 3 years ago

This might be a stupid question, but how do I use the pretrained models available via the API locally (either via Docker or Python)? It seems that all the tutorials and instructions here assume that models are trained from scratch. I would just like to download the latest models you have available via your API and use them locally. I can use your online API (e.g., via Python), but I'd rather use a faster offline solution as I have tens of thousands of documents to process. For example, see the Docker setup for the Turku NLP neural parser (http://turkunlp.org/Turku-neural-parser-pipeline/docker.html): just a few simple steps and you can use pretrained models with your own texts. I was looking for something similar for Annif.
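For reference, a minimal sketch of querying the public Annif REST API from Python. The project ID ("yso-en"), the `limit` parameter, and the response field names are assumptions based on the public Swagger spec; check `GET /v1/projects` on the instance you use for the actual project IDs.

```python
# Minimal sketch: querying the public Annif REST API for subject suggestions.
# Project ID and response field names are assumptions; verify against /v1/projects.
import requests

API_URL = "https://api.annif.org/v1/projects/{project}/suggest"

def suggest(text, project="yso-en", limit=10):
    """Request subject suggestions for a single document."""
    resp = requests.post(
        API_URL.format(project=project),
        data={"text": text, "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

if __name__ == "__main__":
    for hit in suggest("Finland is a country in Northern Europe."):
        print(f'{hit["score"]:.4f}  {hit["uri"]}  {hit["label"]}')
```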

juhoinkinen commented 3 years ago

At the moment the pretrained models of Finto AI and api.annif.org are not publicly downloadable.

We have assumed that Annif / automatic-indexing users fall into two categories: those who just use it as a service via the APIs/webform of Finto AI, and those who have their own Annif installation for building their own models on their own vocabularies (this tutorial is targeted at the latter group).

I think we could make the models downloadable, but note that the current models used in Finto AI take 18 GB of disk/RAM space (the Finnish models take 8 GB; the Finnish fastText model is a bit bloated at 6.1 GB, and we'll try to reduce this in the future). Also, in our evaluation runs the Finto AI API has been able to process roughly 1000 documents (theses) per hour, with queries originating from the same network. If the speed is not much worse from your network, maybe using the API would still be the easier approach?

kauttoj commented 3 years ago

Ok, thanks for the information. I'm using the API for now. It can process one (short) document in ~1.4 s on average, which is ok; I'm not in a hurry. If you can publish your models for download in the near future, that would be great. In the age of huge deep neural networks, ~10-20 GB for a model is not actually that bad :)

juhoinkinen commented 3 years ago

Hi @kauttoj, in case you would still like to download the Finto AI models, you can now find them at https://annif.org/download/models/. There are both the current models (finto-ai-2021-04/, using YSO 2021.3.Epikuros) and the ones from the previous update round (finto-ai-2020-12/, using a YSO snapshot from 2020-10-15).
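For anyone following up: a minimal sketch of batch-processing documents offline, assuming the downloaded model archive has been extracted into a local Annif installation's data directory with its project configuration in place, and the local server has been started with `annif run`. The port (5000), the project ID ("yso-fi"), the `documents/` directory, and the response field names are assumptions; adjust them to your own setup.

```python
# Minimal sketch: batch-processing local text files against a locally running
# Annif instance (started with `annif run` after extracting the downloaded models).
# Port, project ID, input directory, and response fields are assumptions.
from pathlib import Path
import requests

LOCAL_URL = "http://localhost:5000/v1/projects/{project}/suggest"

def suggest_local(text, project="yso-fi", limit=10):
    """Request subject suggestions from the local Annif server."""
    resp = requests.post(
        LOCAL_URL.format(project=project),
        data={"text": text, "limit": limit},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

if __name__ == "__main__":
    # Loop over a folder of plain-text documents and print the top suggestions.
    for doc in Path("documents").glob("*.txt"):
        results = suggest_local(doc.read_text(encoding="utf-8"))
        labels = ", ".join(hit["label"] for hit in results[:5])
        print(f"{doc.name}: {labels}")
```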