dluc / openai-tools

A collection of tools for working with OpenAI
Creative Commons Zero v1.0 Universal
95 stars 14 forks

decode method #3

Open Aya-S opened 1 year ago

Aya-S commented 1 year ago

I can't find documentation for a decode method in the C# class. Is decode not supported?

dluc commented 1 year ago

Hi @Aya-S, do you mean a method to go from a list of token IDs back to a string? Could you elaborate on the scenario where this would be useful?

Some tokens don't have an entry in the tokenizer vocabulary, so the process is not completely reversible.
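To make the direction under discussion concrete, here is a minimal Python sketch of decoding, assuming a GPT-2-style byte-level BPE vocabulary. The `TOY_VOCAB` table and its IDs are made up for illustration and are not this library's vocabulary or API. An ID with no vocabulary entry raises `KeyError`, which is the non-reversible case described above.

```python
def bytes_to_unicode():
    """GPT-2-style reversible map from each byte value to a printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes (including space) are shifted above 255.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}

# Hypothetical toy vocabulary; "Ġ" is how a leading space byte is represented.
TOY_VOCAB = {"Hello": 0, "Ġworld": 1, "!": 2}
ID_TO_TOKEN = {token_id: token for token, token_id in TOY_VOCAB.items()}


def decode(token_ids):
    """Map token IDs to token strings, then to raw bytes, then to UTF-8 text."""
    # Raises KeyError if an ID has no entry in the vocabulary.
    token_text = "".join(ID_TO_TOKEN[i] for i in token_ids)
    raw = bytes(BYTE_DECODER[ch] for ch in token_text)
    # Incomplete multi-byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


print(decode([0, 1, 2]))
```

Under these assumptions, decoding is a table lookup followed by a byte-level conversion, so it round-trips whenever every ID is present in the vocabulary.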

LassoMike commented 1 year ago

A good use case for a Decode() method would be a TokenTextSplitter() method. Decoding seems to be reversible, because other libraries have working decode methods, for example: https://github.com/hyunwoongko/gpt2-tokenizer-java/blob/master/src/main/java/ai/tunib/tokenizer/GPT2Tokenizer.java
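As a rough illustration of that use case, the Python sketch below splits text into chunks of at most `max_tokens` tokens. It needs both directions: encode to count and slice tokens, decode to turn each slice back into text. The names `split_by_tokens`, `toy_encode`, and `toy_decode` are hypothetical, and the one-token-per-character tokenizer is a stand-in for a real BPE tokenizer.

```python
from typing import Callable, List


def split_by_tokens(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_tokens: int,
) -> List[str]:
    """Split text into pieces that each encode to at most max_tokens tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    token_ids = encode(text)
    return [
        decode(token_ids[start:start + max_tokens])
        for start in range(0, len(token_ids), max_tokens)
    ]


# Stand-in tokenizer for the example: one token per character.
def toy_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


def toy_decode(token_ids: List[int]) -> str:
    return "".join(chr(i) for i in token_ids)


chunks = split_by_tokens("abcdefg", toy_encode, toy_decode, max_tokens=3)
print(chunks)
```

With a byte-level BPE tokenizer, a chunk boundary can fall inside a multi-byte character, so a real splitter would likely need to adjust boundaries or tolerate replacement characters at the edges.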