exasol / transformers-extension

An Exasol extension for using state-of-the-art pretrained machine learning models via the Hugging Face Transformers API.
MIT License

Add spans to relevant models #255

Open MarleneKress79789 opened 1 month ago

MarleneKress79789 commented 1 month ago

Summary

We will use the transformers-extension for Named Entity Recognition, Topic Classification, and Sentiment Classification in ExasolAI. We will store the documents and their parts as spans. To join the annotations produced by the transformers-extension back to the documents and their parts, the Prediction UDFs need to accept spans as input and return spans as output.
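To illustrate the idea, here is a minimal Python sketch of how an output span for token classification could be derived from an input span. The issue does not define the span format, so the `Span` class, its fields (`doc_id`, `char_begin`, `char_end`), and the `entity_span` helper are assumptions made for this example only, not the extension's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span: a character range inside one document."""
    doc_id: int
    char_begin: int  # inclusive offset into the document text
    char_end: int    # exclusive offset into the document text


def entity_span(input_span: Span, start: int, end: int) -> Span:
    """Shift entity offsets relative to the input text into document offsets."""
    if not 0 <= start <= end <= input_span.char_end - input_span.char_begin:
        raise ValueError("entity offsets fall outside the input span")
    return Span(input_span.doc_id,
                input_span.char_begin + start,
                input_span.char_begin + end)


document = "It builds databases. Exasol is based in Nuremberg."

# The second sentence is the document part that is passed to the model.
begin = document.index("Exasol")
sentence = Span(doc_id=1, char_begin=begin, char_end=len(document))
text = document[sentence.char_begin:sentence.char_end]

# Stand-in for model output: offsets are relative to `text`, not `document`.
start = text.index("Nuremberg")
entity = entity_span(sentence, start, start + len("Nuremberg"))

# The output span points at the entity inside the original document.
print(document[entity.char_begin:entity.char_end])
```

Because the output span is expressed in document coordinates, an annotation can be joined back to the document via `doc_id` and to its part via the character range.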

Features

Tasks

### Token Classification
- [x] add spans as optional input to TokenClassification UDF
- [x] add spans as output to TokenClassification UDF
- [x] add an additional UDF call for token classification with spans, which is only installed if the feature flag is set during UDF installation
- [x] add tests for TokenClassification utilising spans
### Sequence Classification (input span = output span)
- [ ] add spans as optional input to SequenceClassification UDF
- [ ] add spans as output to SequenceClassification UDF
- [ ] add an additional UDF call for SequenceClassification with spans, which is only installed if the feature flag is set during UDF installation
- [ ] find a model for SequenceClassification which works with spans
- [ ] add tests for SequenceClassification utilising spans
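
For sequence classification, "input span = output span" could look like the following sketch: the label applies to the whole input text, so the span is passed through unchanged and acts as the join key. The tuple layout `(doc_id, char_begin, char_end)`, the `classify_with_spans` helper, and the toy classifier are hypothetical stand-ins, not the UDF's real interface or a real model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical span key: (doc_id, char_begin, char_end)
SpanKey = Tuple[int, int, int]


def classify_with_spans(
    rows: List[Tuple[SpanKey, str]],
    classify: Callable[[str], str],
) -> List[Tuple[SpanKey, str]]:
    """Return (span, label) pairs; each output span equals its input span."""
    return [(span, classify(text)) for span, text in rows]


def toy_classifier(text: str) -> str:
    """Stand-in for a real sentiment model."""
    return "positive" if "good" in text else "negative"


rows = [
    ((1, 0, 17), "The food was good"),
    ((1, 18, 42), "The service was too slow"),
]
results = classify_with_spans(rows, toy_classifier)

# Join the labels back to the document parts using the span as the key.
parts: Dict[SpanKey, str] = dict(rows)
for span, label in results:
    print(span, parts[span], label)
```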
### Zero Shot
- [ ] add spans as optional input to ZeroShot UDF
- [ ] add spans as output to ZeroShot UDF
- [ ] add an additional UDF call for ZeroShot with spans, which is only installed if the feature flag is set during UDF installation
- [ ] find a model for ZeroShot which works with spans
- [ ] add tests for ZeroShot utilising spans
### Text Generation (input span = output span)
- [ ] add spans as optional input to TextGeneration UDF
- [ ] add spans as output to TextGeneration UDF
- [ ] add an additional UDF call for TextGeneration with spans, which is only installed if the feature flag is set during UDF installation
- [ ] find a model for TextGeneration which works with spans
- [ ] add tests for TextGeneration utilising spans
### Tasks
- [ ] update documentation

redcatbear commented 1 week ago

The PR for token classification has open design decisions. Consulting with @tkilias.