The code in this PR is by @mo-fu and was originally submitted as PR https://github.com/NatLibFi/Annif/pull/540. That PR was accidentally closed and could not be re-opened, which is why this new PR is needed for the XTransformer backend. (This PR branches from the point in the git history just before the unsuccessful commits that attempted to make the original PR re-openable.)
The description of the original PR is below.
This PR adds XTransformer as an optional backend to Annif. For now it does not yet use DistilBERT in the default configuration, as that model is not yet available on PyPI.
The tests for the backend resort to mocking, as actually training would download a pretrained model of at least 500 MB.
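The mocking approach can be sketched roughly as below. This is an illustrative example, not the actual test code from the PR; the `PretrainedTrainer` and `run_training` names are hypothetical stand-ins for the backend's training path.

```python
from unittest import mock

# Hypothetical stand-in for a trainer that would normally fetch a
# large pretrained model from the Hugging Face Hub on first use.
class PretrainedTrainer:
    def train(self, corpus):
        raise RuntimeError("would download ~500 MB of weights")

def run_training(trainer, corpus):
    return trainer.train(corpus)

# In the test, the real trainer is replaced with an autospec'd mock,
# so no network access or model download happens.
fake = mock.create_autospec(PretrainedTrainer, instance=True)
fake.train.return_value = "trained-model"

result = run_training(fake, ["doc1", "doc2"])
fake.train.assert_called_once_with(["doc1", "doc2"])
print(result)  # trained-model
```

The same pattern (patching the model class or its `train` method) keeps the test suite fast and network-free.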
Also, we should discuss cache directories. At the moment XTransformer will download models from the Hugging Face Hub to `~/.cache/huggingface`. Is this behavior desired for Annif, or should the cache be placed in the data folder?
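If the data-folder option is preferred, one way to do it would be to set the `HF_HOME` environment variable before the Hugging Face libraries are imported, which relocates their cache. A minimal sketch, where the `datadir` path is a hypothetical Annif data-folder location:

```python
import os

# Hypothetical Annif data directory; in practice this would come from
# the Annif configuration rather than being hard-coded.
datadir = "/path/to/annif-projects/data"

# HF_HOME redirects the Hugging Face cache (models, tokenizers) away
# from the default ~/.cache/huggingface. It must be set before
# huggingface_hub / transformers are imported.
os.environ["HF_HOME"] = os.path.join(datadir, "huggingface")

print(os.environ["HF_HOME"])
```

Whether this should be done globally or per-project is part of the open question above.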
I also haven't modified the Docker container yet. When I installed pecos in a venv it required BLAS libraries, so these would probably have to be added to the container. Additionally, pecos will install the GPU-enabled PyTorch, meaning the container size will grow. Therefore I wanted to check with you first before adding it.