OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University
https://txsun1997.github.io/blogs/moss.html
Apache License 2.0
11.9k stars 1.14k forks

Indexing with integers (to access backend Encoding for a given batch index) is not available when using Python based tokenizers #301

Open jacklanda opened 1 year ago

jacklanda commented 1 year ago

Judging by the exception message, this issue is possibly caused by the missing Rust-based tokenizer implementation. It occurs when calling the tokenize() method on a batch of sequences and then accessing the result, batch_tokenized, with a slice rather than a str key such as input_ids or attention_mask. Could anyone help with this? Thanks!
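One possible workaround, sketched below, is to index by the string key first and by position second, so the batch object itself is never indexed with an integer. This assumes the batch result behaves like a mapping from field names (input_ids, attention_mask, and so on) to per-sequence lists. The helper name `select_example` is hypothetical, and a plain dict with made-up token ids stands in for the real tokenizer output.

```python
from typing import Any, Mapping, Sequence


def select_example(batch: Mapping[str, Sequence[Any]], index: int) -> dict:
    """Return the fields of one sequence from a batch keyed by field name.

    `select_example` is a hypothetical helper, not part of any library.
    """
    # Look up each field by its string key, then pick the position inside it.
    # The batch object itself is never indexed with an integer.
    return {name: values[index] for name, values in batch.items()}


# Stand-in for a tokenized batch of two sequences (token ids are made up).
batch_tokenized = {
    "input_ids": [[1, 15, 27], [1, 42, 0]],
    "attention_mask": [[1, 1, 1], [1, 1, 0]],
}

first = select_example(batch_tokenized, 0)
second = select_example(batch_tokenized, 1)
```

If the Rust-based tokenizer is available for the model, loading it instead may also avoid the error, but that has not been verified here.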