-
Hello! Thank you for the clean and user-friendly codebase!
I'm trying to fine-tune the VQ-VAE tokenizer and noticed that some keys might be missing from the pretrained checkpoint listed on Hugging Face: `"o…
-
Hi! I am trying to use the pretrained tokenizer to obtain latent codes for my input CT images.
However, I didn't see the identity-mapping-like reconstruction demonstrated in Figure 3 of your pa…
-
### System Info
- `transformers` version: 4.41.2
- Platform: macOS-14.5-x86_64-i386-64bit
- Python version: 3.11.6
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
- Accelerate ver…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
### Description
[IK](https://github.com/infin…
-
Hello,
I was seeing a warning while fine-tuning Mistral and tracked it to this line:
https://github.com/huggingface/alignment-handbook/blob/main/src/alignment/model_utils.py#L71
Because Mistral's…
-
Originally reported as a training NaN with LoRA, this turned out to be caused by an invalid tokenizer cache. We should detect this by:
1. Store metadata in the cache when it is built and check it when the cache is loaded
2.…
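A minimal sketch of step 1, assuming a hypothetical fingerprint scheme and a `meta.json` file name (both invented for illustration, not the repo's actual cache layout):

```python
import hashlib
import json
from pathlib import Path

def cache_fingerprint(tokenizer_name: str, builder_version: str) -> str:
    # Hypothetical fingerprint: hash the inputs the cache was built from, so
    # any change (tokenizer, code version) invalidates the cache.
    payload = json.dumps(
        {"tokenizer": tokenizer_name, "version": builder_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def save_cache_metadata(cache_dir: Path, fingerprint: str) -> None:
    # Written alongside the cached arrays when the cache is built.
    (cache_dir / "meta.json").write_text(json.dumps({"fingerprint": fingerprint}))

def cache_is_valid(cache_dir: Path, fingerprint: str) -> bool:
    # A cache with no metadata (e.g. built by an older version) is treated
    # as stale and should be rebuilt rather than silently reused.
    meta_file = cache_dir / "meta.json"
    if not meta_file.exists():
        return False
    meta = json.loads(meta_file.read_text())
    return meta.get("fingerprint") == fingerprint
```

On load, a fingerprint mismatch (or missing `meta.json`) would trigger a rebuild instead of reusing a cache produced with a different tokenizer.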
-
I'm not sure whether this is a bug or intended behavior, but sometimes modifying the normalizer of a pretrained tokenizer works and sometimes it doesn't.
For example, it works for `"mistralai/Mistral-7B-v0.1"` but not `"m…
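For reference, here is a minimal sketch of the normalizer swap using the `tokenizers` library directly, with a toy word-level vocab rather than a real pretrained checkpoint (the vocab and variable names are invented for illustration):

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers

# Toy vocab standing in for a pretrained tokenizer's vocabulary.
vocab = {"hello": 0, "world": 1, "[UNK]": 2}
tok = Tokenizer(models.WordLevel(vocab, unk_token="[UNK]"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()

# Without a normalizer, "HELLO" misses the lowercase vocab entry.
ids_before = tok.encode("HELLO world").ids

# Swap the normalizer in place (on a transformers fast tokenizer this would
# be done via `tokenizer.backend_tokenizer.normalizer = ...`).
tok.normalizer = normalizers.Lowercase()
ids_after = tok.encode("HELLO world").ids
```

Whether such a swap takes effect for a given checkpoint may depend on how that tokenizer was serialized, which could explain the model-to-model difference.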
-
Hello,
Truncation of the `input_ids` during tokenization, i.e., [line 336](https://github.com/MilaNLProc/simple-generation/blob/73b760c60b76509390d286d4785fceeaa7d7fe8d/simple_generation/simple_ge…
-
Hi, I am trying to reproduce this transfer learning code; however, I am getting an error. I think this is because of the fastai version. When I run `tok=Tokenizer(partial(MolTokenizer, special_t…
-
I found that in this repo and on the Hugging Face model card there is this line:
```
# tokenizer.eos_token_id is the id of token
```
But in tokenizer_config.py inside the model repo, the `eos_token` is set to b…