dotnet / ai-samples


Update Phi tokenizer implementation to use Microsoft.ML.Tokenizers #48

Status: Open. luisquintanilla opened this issue 7 months ago.

luisquintanilla commented 7 months ago

Work on the CodeGen tokenizer needed for Phi-2 will be complete in Microsoft.ML.Tokenizers in the next few days.

Once it's available, consider replacing the current tokenizer implementation with the one provided by Microsoft.ML.Tokenizers.

https://github.com/dotnet/ai-samples/blob/main/src/local-models/Phi/Tokenizer.cs

luisquintanilla commented 6 months ago

For the Phi-3 sample, use the existing LlamaTokenizer in Microsoft.ML.Tokenizers.