joshdevins opened 9 months ago
The following is a list of models that we wish to verify compatibility with, per task type. The list is based on the base models and tokenizers that we support, and the tasks we support.
- `fill_mask`
- `ner`
- `text_expansion`
- `text_classification`
- `text_embedding`
- `zero_shot_classification`
- `question_answering`
- `text_similarity`
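One way the model/version test matrix described below could be generated is sketched here. The model names, task pairings, and Elasticsearch version strings are placeholders for illustration only; the actual list of models to test is still to be decided.

```python
# Sketch of test-matrix generation. Model IDs, tasks, and versions here
# are hypothetical placeholders, not the final list for this issue.
from itertools import product

MODELS = [
    ("sentence-transformers/all-MiniLM-L6-v2", "text_embedding"),
    ("distilbert-base-uncased", "fill_mask"),
]
ES_VERSIONS = ["8.8.0", "8.9.0"]  # placeholder version strings

def build_matrix(models, es_versions):
    """Cross every (model, task) pair with every Elasticsearch version."""
    return [
        (model_id, task, es_version)
        for (model_id, task), es_version in product(models, es_versions)
    ]

matrix = build_matrix(MODELS, ES_VERSIONS)
# Each entry would become one CI test case, e.g.
# ("distilbert-base-uncased", "fill_mask", "8.9.0")
```

Each tuple could then be fed to a parametrized test (e.g. via `pytest.mark.parametrize`) so that every model is exercised against every supported Elasticsearch version in CI.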
Today we rely mostly on unit testing for PyTorch/NLP model import testing. We perform large-scale testing as part of other components like Elasticsearch, but we often only find bugs later and can't tie them to specific changes in eland (e.g. to a specific PR). We'd like to improve integration testing in eland by running a test matrix of models against multiple Elasticsearch versions. For each model, we'd test multiple inputs of various lengths, up to and beyond each model's input limit, and validate the inference results from Elasticsearch directly against results from `transformers` as ground truth. Tests should run as part of the normal CI cycle and need to pass before a PR can be merged.

More details to follow in this issue, such as the list of models to test.