cboulanger / excite-docker

Docker image with tools for the annotation of ML training docs for reference extraction based on the EXparser tools
https://cboulanger.github.io/excite-docker
GNU General Public License v3.0

Allow switching of models with optional remote model repository #4

Closed: cboulanger closed this issue 2 years ago

cboulanger commented 2 years ago

To be able to use specialized models for different kinds of scholarly citation patterns, we should make the directory containing the model data (currently EXparser/Utils) configurable. The idea is to give each specialized model a unique name that serves both as a well-known id and as the name of the directory in which its model data is stored. Since the model data depends directly on the training code, it needs to be versioned. Versioning also makes it possible to run tests that compare the performance of models with the same id but different versions (for example, by running an evaluation across different git branches).
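Because the model id doubles as a directory name, it helps to restrict ids to characters that form a single, safe path component. The following is a minimal sketch under that assumption; the root directory parameter, the id pattern and the function name model_dir are hypothetical and not taken from the actual implementation.

```python
import re
from pathlib import Path

# Hypothetical rule: ids consist of lowercase letters, digits, "-" and "_",
# so that an id always maps to exactly one directory directly below the root.
MODEL_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]*")


def model_dir(models_root: Path, model_id: str) -> Path:
    """Return the directory holding the data of the model with this id.

    models_root is a hypothetical configurable location that would replace
    the hard-coded EXparser/Utils directory.
    """
    # fullmatch rejects ids such as "../x" or "a/b" that could escape models_root
    if MODEL_ID_PATTERN.fullmatch(model_id) is None:
        raise ValueError(f"invalid model id: {model_id!r}")
    return models_root / model_id
```

With this rule, model_dir(Path("models"), "default") resolves to models/default, while an id like "../etc" is rejected before any file system access happens.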

Once this system is in place, an optional storage system can be built on top of it. It would work with packages, i.e. ZIP archives of the training material and the model data, kept in a configurable location.
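A package of this kind could be produced and restored with Python's standard zipfile module. The sketch below assumes a hypothetical archive layout with the model data under model/ and the training material under training/; the function names pack_model and unpack_model are illustrative only.

```python
import zipfile
from pathlib import Path


def pack_model(model_path: Path, training_path: Path, archive: Path) -> Path:
    """Bundle model data and training material into one ZIP package."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical layout: model/... for model data, training/... for training docs
        for prefix, root in (("model", model_path), ("training", training_path)):
            for f in sorted(root.rglob("*")):
                if f.is_file():
                    # Store paths relative to their root, with forward slashes
                    zf.write(f, arcname=f"{prefix}/{f.relative_to(root).as_posix()}")
    return archive


def unpack_model(archive: Path, target: Path) -> None:
    """Extract a package into target, recreating model/ and training/."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
```

Keeping both parts in one archive means a downloaded package always contains the training material that matches its model data, which fits the versioning requirement described above.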

cboulanger commented 2 years ago

Alternatively, instead of allowing non-existent model ids to be used and implicitly creating new directories for them, a separate command create_model could explicitly create a new model directory. It is probably better to raise an error whenever a non-existent id is used.

cboulanger commented 2 years ago

Rewrote the proposal according to my last comment so that nothing happens implicitly. Instead, models must be created or downloaded explicitly, and an error should be raised if a model id does not exist.
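The "no implicit magic" rule can be sketched as two separate operations: looking up a model fails loudly for an unknown id, and only an explicit create step makes a new directory. This is a minimal illustration with hypothetical names (get_model, ModelNotFoundError, models_root); create_model is the command name from the discussion, but its signature here is an assumption, not the code in the add_model_storage branch.

```python
from pathlib import Path


class ModelNotFoundError(LookupError):
    """Raised when a model id has no corresponding directory."""


def get_model(models_root: Path, model_id: str) -> Path:
    """Return an existing model directory; never create one implicitly."""
    path = models_root / model_id
    if not path.is_dir():
        raise ModelNotFoundError(
            f"model {model_id!r} does not exist; create or download it first"
        )
    return path


def create_model(models_root: Path, model_id: str) -> Path:
    """Explicitly create the directory for a new model id."""
    path = models_root / model_id
    # exist_ok=False raises FileExistsError instead of silently reusing a model
    path.mkdir(parents=True, exist_ok=False)
    return path
```

Refusing to overwrite an existing id in create_model mirrors the lookup side: both operations fail with an error rather than guessing what the user meant.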

cboulanger commented 2 years ago

Done in https://github.com/cboulanger/excite-docker/tree/add_model_storage