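I think it would be nice to have a small utility data structure to fetch pretrained embeddings. I don't think this needs to be part of the `finalfusion` crate, since it is not really core functionality. The basic idea is:
A `finalfusion-fetcher` repository with a metadata file (probably JSON) that maps embedding file identifiers to URLs. For example, `fasttext.wiki.nl.fifu` could map to http://www.sfs.uni-tuebingen.de/a3-public-data/finalfusion-fasttext/wiki/wiki.nl.fifu
A small crate (possibly in the same repo) would provide a data structure `Fetcher`, with a constructor that retrieves the metadata and returns a fetcher: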
```rust
let fetcher = Fetcher::fetch_metadata().unwrap();
```
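To make the proposed data structure a bit more concrete, here is a minimal sketch of what `Fetcher` might hold once the metadata is available: a map from identifiers such as `fasttext.wiki.nl.fifu` to download URLs. The `from_metadata`, `url`, and `identifiers` methods are hypothetical. The proposed `Fetcher::fetch_metadata` would additionally download and parse the (probably JSON) metadata file, which is left out here to keep the sketch dependency-free.

```rust
use std::collections::HashMap;

/// Sketch of the proposed `Fetcher`: it only holds the metadata, i.e. a
/// mapping from embedding identifiers to download URLs.
pub struct Fetcher {
    urls: HashMap<String, String>,
}

impl Fetcher {
    /// Hypothetical constructor from already-parsed metadata. The proposed
    /// `Fetcher::fetch_metadata` would download and parse the file first.
    pub fn from_metadata(urls: HashMap<String, String>) -> Self {
        Fetcher { urls }
    }

    /// Look up the download URL for an embedding identifier, if it is known.
    pub fn url(&self, identifier: &str) -> Option<&str> {
        self.urls.get(identifier).map(String::as_str)
    }

    /// List all known identifiers in sorted order.
    pub fn identifiers(&self) -> Vec<&str> {
        let mut ids: Vec<&str> = self.urls.keys().map(String::as_str).collect();
        ids.sort_unstable();
        ids
    }
}
```

With such a map, `open` only needs the URL lookup to know where to fetch an embedding that is not available locally yet, and an unknown identifier can be reported as an error before any network access happens.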
A user could then open embeddings:
```rust
let dutch_embeddings = fetcher.open("fasttext.wiki.nl.fifu").unwrap();
```
This method would check whether the embeddings are already available locally. If not, it would fetch them and store them in a standard XDG location. Then it would open the embeddings stored in this location.
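The check-fetch-store step of `open` could look roughly like the following standard-library-only sketch. It assumes the embeddings are cached under `$XDG_CACHE_HOME/finalfusion`, falling back to the XDG Base Directory default `$HOME/.cache`. The subdirectory name, the function names, and the `download` callback are hypothetical; the actual HTTP download and the final read of the file with `finalfusion` are left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve the cache directory following the XDG Base Directory convention:
/// use `$XDG_CACHE_HOME` if it holds an absolute path, otherwise fall back to
/// `$HOME/.cache`. The `finalfusion` subdirectory name is an assumption.
fn cache_dir_from(xdg_cache_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let base = match xdg_cache_home {
        Some(dir) if Path::new(dir).is_absolute() => PathBuf::from(dir),
        // Unset, empty, or relative values are ignored, as the spec requires.
        _ => Path::new(home?).join(".cache"),
    };
    Some(base.join("finalfusion"))
}

/// Read the relevant environment variables and resolve the cache directory.
fn cache_dir() -> Option<PathBuf> {
    let xdg = std::env::var("XDG_CACHE_HOME").ok();
    let home = std::env::var("HOME").ok();
    cache_dir_from(xdg.as_deref(), home.as_deref())
}

/// Return the local path of `identifier`, downloading it from `url` first if
/// it is not cached yet. `download` is a hypothetical hook that writes the
/// file at `url` to the given destination (e.g. using an HTTP client).
fn ensure_local<F>(cache: &Path, identifier: &str, url: &str, download: F) -> io::Result<PathBuf>
where
    F: FnOnce(&str, &Path) -> io::Result<()>,
{
    let path = cache.join(identifier);
    if !path.exists() {
        fs::create_dir_all(cache)?;
        // Download to a temporary name and rename afterwards, so that an
        // interrupted download never looks like a complete cached file.
        let partial = cache.join(format!("{}.part", identifier));
        download(url, &partial)?;
        fs::rename(&partial, &path)?;
    }
    Ok(path)
}
```

Only the first call for a given identifier invokes `download`; later calls return the cached path directly, which `open` would then hand to the `finalfusion` reader. If the files should not be treated as disposable cache, `$XDG_DATA_HOME` (default `$HOME/.local/share`) would be an equally reasonable location.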
Similarly, `Fetcher::mmap` could be used to memory-map embeddings after downloading them.
After this is implemented, the functionality could also be exposed in `finalfusion-python`.