elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0

Fix failed import of Sentence Transformer RoBERTa models #637

Closed: davidkyle closed this 10 months ago

davidkyle commented 10 months ago

Uploading the sentence-transformers/all-distilroberta-v1 model to Elasticsearch failed with the following error:

TypeError: _SentenceTransformerWrapper.forward() missing 2 required positional arguments: 'token_type_ids' and 'position_ids'

The error occurs when the eland_import_hub_model script evaluates the model to measure the size of the output embedding. It can be reproduced with the following command:

eland_import_hub_model --url <elastic> --hub-model-id sentence-transformers/all-distilroberta-v1 --task-type text_embedding
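To show why the failure surfaces during the embedding-size check, here is a minimal, framework-free sketch. It is not eland's code: `RequiresAllInputs` and `measure_embedding_size` are hypothetical stand-ins, the token ids are arbitrary, and the 768-dimension output is an assumed placeholder. The wrapper's `forward` declares four inputs, but a RoBERTa-style tokenizer typically produces only `input_ids` and `attention_mask`, so calling `forward(**inputs)` raises the same kind of TypeError as the one reported above.

```python
from typing import Dict, List


class RequiresAllInputs:
    """Hypothetical stand-in for a wrapper whose forward() expects BERT-style inputs."""

    def __init__(self, dims: int = 768) -> None:
        # Assumed output dimension, for illustration only.
        self.dims = dims

    def forward(self, input_ids, attention_mask, token_type_ids, position_ids) -> List[float]:
        # A real model would return a tensor; a fixed-length list stands in for it here.
        return [0.0] * self.dims


def measure_embedding_size(model: RequiresAllInputs, inputs: Dict[str, List[int]]) -> int:
    """Evaluate the model once on sample inputs and report the output length."""
    return len(model.forward(**inputs))


# RoBERTa-style tokenizers typically return only these two fields (ids are arbitrary).
roberta_style_inputs = {"input_ids": [0, 31414, 2], "attention_mask": [1, 1, 1]}

try:
    measure_embedding_size(RequiresAllInputs(), roberta_style_inputs)
except TypeError as err:
    # Python reports both missing arguments: token_type_ids and position_ids.
    print(f"TypeError: {err}")
```

Supplying all four fields lets the same call succeed, which is why the mismatch only bites for tokenizers that omit them.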

The cause was a simple typo that prevented the model's tokenizer type from being recognised. A test has been added to cover this case.
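The report does not say what the typo was, so the following only illustrates the general failure mode under stated assumptions. It assumes a hypothetical registry that maps tokenizer class names to the inputs the wrapped model accepts, with unrecognised tokenizers falling back to the full BERT-style set. A misspelled key means a RoBERTa tokenizer is never matched, so the wrapper demands inputs the tokenizer never produces. The registry contents, function names, and test name are all invented for illustration.

```python
from typing import Dict, Tuple

BERT_STYLE_INPUTS: Tuple[str, ...] = ("input_ids", "attention_mask", "token_type_ids", "position_ids")
TWO_INPUTS: Tuple[str, ...] = ("input_ids", "attention_mask")

# Hypothetical registry: tokenizer class name -> inputs the wrapped model accepts.
# The misspelled key stands in for the kind of typo described above.
BUGGY_REGISTRY: Dict[str, Tuple[str, ...]] = {
    "BertTokenizer": BERT_STYLE_INPUTS,
    "RobertaTokeniser": TWO_INPUTS,  # typo: never matches "RobertaTokenizer"
}
FIXED_REGISTRY: Dict[str, Tuple[str, ...]] = {
    "BertTokenizer": BERT_STYLE_INPUTS,
    "RobertaTokenizer": TWO_INPUTS,
}


def expected_inputs(tokenizer_class: str, registry: Dict[str, Tuple[str, ...]]) -> Tuple[str, ...]:
    """Return the inputs the wrapper will require; unknown tokenizers get the BERT-style set."""
    return registry.get(tokenizer_class, BERT_STYLE_INPUTS)


def test_roberta_tokenizer_is_recognised() -> None:
    # Regression-style check: a RoBERTa tokenizer must map to the two-input wrapper.
    assert expected_inputs("RobertaTokenizer", FIXED_REGISTRY) == TWO_INPUTS


# With the typo, the lookup silently falls through to the four-input default.
print(expected_inputs("RobertaTokenizer", BUGGY_REGISTRY))
test_roberta_tokenizer_is_recognised()
```

The silent fallback is what makes such a typo easy to miss: nothing fails at lookup time, and the error only appears later when the model is evaluated, which is why a test that exercises the RoBERTa path directly is useful.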