tokenization Search Results

1000+ results for tokenization

mlfoundations/open_clip #403

Improve tokenizer decode

Right now the tokenizer decode method supports only a single instance at a time. I think it would be good to have `batch_decode` function and also support `skip_special_tokens` and `clean_up_tokenizat…

vturrisi updated 1 year ago
2
yao8839836/kg-llm #3

IndexError: piece id is out of range

Hi, thank you very much for your excellent work! I tried to run the code using ChatGLM2, but the following error occurs: Loading checkpoint shards: 100%|████████████████████████████████████████████…

Honourwei updated 10 months ago
3
jyhong836/llm-dp-finetune #1

About Llama-3 Fine-tune

![image](https://github.com/user-attachments/assets/26e66761-d3fa-4279-a97c-54a28834279a) * Llama-3 fine-tuning problem: when I use the Llama-3 configs, a load error occurs while loading …

Zuo-Lihan updated 2 months ago
2
we-like-parsers/cpython #157

f-string parser: Check comments in f-strings

In `test_fstring` the test `test_comments` fails. We need to check whether this is due to regular tokenization problems and the test needs updating, or whether we need to fix something regarding comments in f-strings.

pablogsal updated 1 year ago
9
ndif-team/nnsight #211

Unrecognized configuration class when using nnsight to load …

I am trying to use nnsight to load new LLMs, such as Qwen/Qwen2-VL-7B-Instruct. `qwen2_vl_model = LanguageModel("Qwen/Qwen2-VL-7B-Instruct", device_map="auto", dispatch=True)` The nnsight repor…

ruizheliUOA updated 2 months ago
4
commonmark/commonmark-spec #773

Non-alphanumeric with format is not properly parsed when con…

A string with non-alphanumeric formatted content that is immediately followed by an alphanumeric character is tokenized as a `text` node instead of the expected series of format nodes. Problem reproduced on [Co…

tomerlichtash updated 3 months ago
1
globalpayments/python-sdk #5

No response for CVV & AVS verification through global paymen…

Hi, my software verification process failed because the app does not request AVS and CVV verification. The review team explained that the app is not requesting CVV and AVS verification. Actually I des…

storagerepo updated 1 year ago
5
Hongyi6328/pe #2

If there is more than one whitespace between command words, …

![image.png](https://raw.githubusercontent.com/Hongyi6328/pe/main/files/cab7d032-3b69-4dad-8595-38e4eac4ced0.png) How to reproduce: 1. `tutorial add g/T0.1` The tokenization does not work properly …

Hongyi6328 updated 2 years ago
1
Ljferrer/Ghost #4

Dataset Datasheet

- [x] Create README.md for dataset - [x] Data Acquisition - [ ] Data Cleaning - [ ] Tokenization - [ ] Artist Statistics - [ ] n songs - [ ] dates released - [ ] vocabulary sizes …

Ljferrer updated 5 years ago
1
rominf/profanity-filter #9

Parallelize censoring

I think `dask` is a good solution because it has a nice API and can be used in a cluster. The easiest and most effective parallelization is to map words after tokenization.

rominf updated 5 years ago
2
