Closed by dqbd 1 year ago.
This PR implements the following features:
Creating custom encoders
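    import { readFileSync } from "fs";
    import { Tiktoken } from "@dqbd/tiktoken";

    const encoder = new Tiktoken(
      // contents of the BPE ranks file
      readFileSync("./ranks/gpt2.tiktoken").toString("utf-8"),
      // special tokens and their IDs
      {
        "<|endoftext|>": 50256,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265,
      },
      // regex used to split text into pieces before BPE merging
      "'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)|\\s+"
    );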
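The constructor above takes three inputs: the ranks file contents, a special-token map, and a pre-tokenization regex. The sketch below is a library-independent illustration of what the first and third inputs represent; it is not the @dqbd/tiktoken implementation. It assumes the `.tiktoken` format used by OpenAI's ranks files (one `<base64 token> <rank>` pair per line), and the helper names `parseTiktokenRanks`, `splitWithPattern` and `GPT2_PAT_STR` are hypothetical.

```typescript
// Hypothetical, library-independent sketch; not the @dqbd/tiktoken code.

// GPT-2 pre-tokenization pattern, copied from the constructor example.
const GPT2_PAT_STR =
  "'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)|\\s+";

// Parse ranks text, assuming one "<base64 token> <rank>" pair per line.
// Tokens stay base64-encoded here so the sketch needs no Buffer/atob.
function parseTiktokenRanks(contents: string): Map<string, number> {
  const ranks = new Map<string, number>();
  for (const line of contents.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    const parts = trimmed.split(/\s+/);
    const token = parts[0] ?? "";
    const rank = Number(parts[1]);
    if (parts.length !== 2 || token === "" || !Number.isInteger(rank)) {
      throw new Error(`malformed ranks line: ${line}`);
    }
    ranks.set(token, rank);
  }
  return ranks;
}

// Split text into the pieces that BPE merging is later applied to.
function splitWithPattern(text: string, patStr: string): string[] {
  // "u" enables the \p{...} Unicode property escapes; "g" returns all matches.
  return text.match(new RegExp(patStr, "gu")) ?? [];
}
```

For example, `splitWithPattern("Hello world's 42", GPT2_PAT_STR)` yields `["Hello", " world", "'s", " 42"]`; BPE merges driven by the ranks map then happen within each piece, never across piece boundaries.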
Extending existing encoders with additional special tokens
    import { encoding_for_model } from "@dqbd/tiktoken";

    const encoder = encoding_for_model("gpt2", {
      "<|im_start|>": 100264,
      "<|im_end|>": 100265,
    });
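A rough mental model of extending an encoder (an assumption, not a description of this PR's internals): the extra tokens are merged into the model's existing special-token map, and literal occurrences of those strings in the input become single IDs instead of going through BPE. The TypeScript sketch below illustrates that idea without the library. `mergeSpecialTokens` and `splitOnSpecialTokens` are hypothetical helpers, and rejecting conflicting IDs is an assumed design choice, not documented library behaviour.

```typescript
// Hypothetical helpers illustrating special-token extension; not library code.

type Segment = { text: string; specialId?: number };

// Merge extra special tokens into a base map, rejecting conflicting IDs.
function mergeSpecialTokens(
  base: Record<string, number>,
  extra: Record<string, number>,
): Record<string, number> {
  const merged: Record<string, number> = { ...base };
  const usedIds = new Set<number>(Object.values(base));
  for (const [token, id] of Object.entries(extra)) {
    if (token in merged) {
      // Re-declaring an existing token is allowed only with the same ID.
      if (merged[token] !== id) throw new Error(`conflicting id for ${token}`);
      continue;
    }
    if (usedIds.has(id)) throw new Error(`id ${id} is already assigned`);
    merged[token] = id;
    usedIds.add(id);
  }
  return merged;
}

// Escape regex metacharacters so tokens like "<|im_end|>" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Cut text into ordinary segments and special-token segments.
function splitOnSpecialTokens(
  text: string,
  specials: Record<string, number>,
): Segment[] {
  // Drop empty keys (they would match everywhere) and try longer tokens first.
  const tokens = Object.keys(specials).filter((t) => t.length > 0);
  if (tokens.length === 0) return text === "" ? [] : [{ text }];
  tokens.sort((a, b) => b.length - a.length);
  const re = new RegExp(tokens.map(escapeRegExp).join("|"), "g");
  const out: Segment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const matched: string = m[0] ?? "";
    if (m.index > last) out.push({ text: text.slice(last, m.index) });
    out.push({ text: matched, specialId: specials[matched] });
    last = m.index + matched.length;
  }
  if (last < text.length) out.push({ text: text.slice(last) });
  return out;
}
```

With the GPT-2 `<|endoftext|>` map extended by the two tokens above, `splitOnSpecialTokens("<|im_start|>user\nhi<|im_end|>", merged)` returns three segments: the `<|im_start|>` ID, the ordinary text `"user\nhi"` (which would still go through BPE), and the `<|im_end|>` ID.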
Closes #1