caikit / caikit-nlp

Apache License 2.0
12 stars 45 forks source link

added tokenization tasks to serve tokenization requests #325

Closed swith004 closed 6 months ago

swith004 commented 6 months ago

Added tokenization tasks in order to serve tokenization request and return the token count

swith004 commented 6 months ago

one tiny nit but otherwise lgtm! You'll need a duplicate PR to release-0.3 as well for now

@tharapalanivel should I make the duplicate PR for the release-0.3 before or after these changes get merged?

swith004 commented 6 months ago

Thanks for the contribution! A couple asks:

* A few small unit tests would be helpful for the new module function. We should strive for tests on any new functionality. A section like for the streaming tests https://github.com/caikit/caikit-nlp/blob/main/tests/modules/text_generation/test_text_generation_tgis.py#L135-L161 on each module might be useful to reference.

* We will likely want to allow for the tokenization to also run on prompt tuned models so it'd be good to have the same update here: https://github.com/caikit/caikit-nlp/blob/main/caikit_nlp/modules/text_generation/peft_tgis_remote.py

* [After all updates] For releasing we will want this duplicated to the `release-0.3` branch in order to also release this functionality without embeddings.

@evaline-ju - merging is still block despite Thara's approval for the changes you requested. Not sure how to proceed