caikit / caikit-nlp

Apache License 2.0

:sparkles: Add tokenization task to generation modules #351

Closed evaline-ju closed 4 months ago

evaline-ju commented 4 months ago

Closes https://github.com/caikit/caikit-nlp/issues/350

Because TextGenerationTGIS is not a subclass of TextGeneration but a backend type for it, any additional tasks declared on TextGenerationTGIS were never actually registered. Declaring the additional tasks on TextGeneration and PeftPromptTuning instead makes the tokenization task available on the TGIS backend implementations. The corresponding run functions have to be added as well, even if left unimplemented for now; otherwise declaring the task produces errors.
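A minimal sketch of the situation described above, using a toy registry rather than the real caikit APIs. The `register` decorator, `TASK_REGISTRY`, the class bodies, and the `run_tokenizer` method name are all hypothetical and only illustrate why tasks declared on a backend class that is never registered as a module do not show up, and why a stub run function is still needed on the module itself.

```python
# Toy illustration only: none of these names are the real caikit APIs.
TASK_REGISTRY = {}  # hypothetical mapping: task name -> module class names


def register(cls):
    """Hypothetical decorator that records every task a module declares."""
    for task in cls.tasks:
        TASK_REGISTRY.setdefault(task, []).append(cls.__name__)
    return cls


class TextGenerationTGIS:
    """Backend type: not a module subclass and never passed to register()."""

    tasks = ("tokenization",)  # declared here, so nothing ever picks it up


@register
class TextGeneration:
    """Module class: tasks declared here are the ones that get registered."""

    tasks = ("text_generation", "tokenization")

    def run(self, text):
        # Placeholder for delegating to a backend such as TextGenerationTGIS.
        return f"generated: {text}"

    def run_tokenizer(self, text):
        # Stub so the declared tokenization task has a matching function.
        raise NotImplementedError("Tokenization is not implemented yet")
```

With this layout the tokenization task is visible through `TextGeneration` only, and calling the stub fails loudly instead of the task declaration itself erroring.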

Tokenization run functions could eventually be implemented since each LLM could be reasonably expected to have a tokenizer.

Importing Std together with LaunchConfig started to fail with the latest torch release, 2.3.0. The Std object is defined in https://github.com/pytorch/pytorch/blame/main/torch/distributed/elastic/multiprocessing/api.py, not alongside LaunchConfig, and was potentially only reachable through another import in the launcher API. In addition, tee is no longer an argument on LaunchConfig. Since this is a breaking change, torch is now pinned below 2.3.0.
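As a rough illustration of the pin, the sketch below checks a version string against the upper bound mentioned above. The helper names and the bound constant are assumptions for illustration and are not part of this PR; the actual constraint lives in the project's dependency specification.

```python
import re

# Assumption: mirrors the pin described above (torch < 2.3.0).
MAX_EXCLUSIVE = (2, 3, 0)


def parse_release(version):
    """Return the numeric release tuple, ignoring suffixes such as '+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_supported_torch(version):
    """True when the version falls below the pinned upper bound."""
    return parse_release(version) < MAX_EXCLUSIVE
```

Note that this simple parser treats pre-releases such as 2.3.0rc1 the same as 2.3.0, which is the conservative choice for an upper bound.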