-
Hello! Thank you for the clean and user-friendly codebase!
I'm trying to fine-tune the VQ-VAE tokenizer and noticed that some keys might be missing from the pretrained checkpoint listed on Hugging Face: `"o…
-
Hi! I am trying to use the pretrained tokenizer to obtain latent codes for my input CT images.
However, I didn't see the identity-mapping-like reconstruction demonstrated in Figure 3 of your pa…
-
### System Info
- `transformers` version: 4.41.2
- Platform: macOS-14.5-x86_64-i386-64bit
- Python version: 3.11.6
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
- Accelerate ver…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
### Description
[IK](https://github.com/infin…
-
Hello,
I was seeing a warning while fine-tuning Mistral and tracked it to this line:
https://github.com/huggingface/alignment-handbook/blob/main/src/alignment/model_utils.py#L71
Because Mistral's…
-
Originally reported as a training NaN with LoRA, this turned out to be caused by an invalid tokenizer cache. We should detect this by:
1. Store metadata in the cache when it is built and check it when the cache is loaded
2.…
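A minimal sketch of step 1, assuming a hypothetical fingerprint scheme and a `meta.json` file name (both invented for illustration, not the repo's actual cache layout):

```python
import hashlib
import json
from pathlib import Path

def cache_fingerprint(tokenizer_name: str, builder_version: str) -> str:
    # Hypothetical fingerprint: hash the inputs the cache was built from, so
    # any change (tokenizer, code version) invalidates the cache.
    payload = json.dumps(
        {"tokenizer": tokenizer_name, "version": builder_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def save_cache_metadata(cache_dir: Path, fingerprint: str) -> None:
    # Written alongside the cached arrays when the cache is built.
    (cache_dir / "meta.json").write_text(json.dumps({"fingerprint": fingerprint}))

def cache_is_valid(cache_dir: Path, fingerprint: str) -> bool:
    # A cache with no metadata (e.g. built by an older version) is treated
    # as stale and should be rebuilt rather than silently reused.
    meta_file = cache_dir / "meta.json"
    if not meta_file.exists():
        return False
    meta = json.loads(meta_file.read_text())
    return meta.get("fingerprint") == fingerprint
```

On load, a fingerprint mismatch (or missing `meta.json`) would trigger a rebuild instead of reusing a cache produced with a different tokenizer.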
-
I'm not sure whether this is a bug or intended behavior, but sometimes modifying the normalizer of a pretrained tokenizer works and sometimes it doesn't.
For example, it works for `"mistralai/Mistral-7B-v0.1"` but not `"m…
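For reference, here is a minimal sketch of the normalizer swap using the `tokenizers` library directly, with a toy word-level vocab rather than a real pretrained checkpoint (the vocab and variable names are invented for illustration):

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers

# Toy vocab standing in for a pretrained tokenizer's vocabulary.
vocab = {"hello": 0, "world": 1, "[UNK]": 2}
tok = Tokenizer(models.WordLevel(vocab, unk_token="[UNK]"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()

# Without a normalizer, "HELLO" misses the lowercase vocab entry.
ids_before = tok.encode("HELLO world").ids

# Swap the normalizer in place (on a transformers fast tokenizer this would
# be done via `tokenizer.backend_tokenizer.normalizer = ...`).
tok.normalizer = normalizers.Lowercase()
ids_after = tok.encode("HELLO world").ids
```

Whether such a swap takes effect for a given checkpoint may depend on how that tokenizer was serialized, which could explain the model-to-model difference.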
-
Hello,
Truncation of the `input_ids` during tokenization, i.e., [line 336](https://github.com/MilaNLProc/simple-generation/blob/73b760c60b76509390d286d4785fceeaa7d7fe8d/simple_generation/simple_ge…
-
Hi, I am trying to reproduce this transfer learning code; however, I am getting an error. I think this is because of the fastai version. When I run `tok=Tokenizer(partial(MolTokenizer, special_t…
-
I found that in this repo and on the Hugging Face model card there is this line:
```
# tokenizer.eos_token_id is the id of token
```
But in tokenizer_config.py inside the model repo, the `eos_token` is set to b…