**Closed**: renovate[bot] closed this PR 8 months ago.

Because you closed this PR without merging, Renovate will ignore this update (`==4.36.0`). You will get a PR once a newer version is released. To ignore this dependency forever, add it to the `ignoreDeps` array of your Renovate config.

If you accidentally closed this PR, or if you changed your mind: rename this PR to get a fresh replacement PR.
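As a sketch of the `ignoreDeps` option mentioned above (the package name `transformers` is taken from this PR; adapt it to your own `renovate.json`):

```json
{
  "ignoreDeps": ["transformers"]
}
```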
This PR contains the following updates: `transformers` `==4.30.0` -> `==4.36.0`
#### GitHub Vulnerability Alerts

- **CVE-2023-7018**: Deserialization of Untrusted Data in GitHub repository huggingface/transformers prior to 4.36.
- **CVE-2023-6730**: Deserialization of Untrusted Data in GitHub repository huggingface/transformers prior to 4.36.0.
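Both CVEs above are fixed as of release 4.36.0, so checking whether an installed version is affected reduces to a numeric comparison of the dotted version components. A minimal sketch, where `is_patched` is a hypothetical helper and not part of transformers:

```python
def is_patched(installed: str, fixed: str = "4.36.0") -> bool:
    """Return True if `installed` is at or above the first patched release."""
    # Compare dotted release versions numerically, component by component,
    # so that e.g. 4.9.0 does not compare greater than 4.36.0 as a string would.
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(fixed)

print(is_patched("4.30.0"))  # False: the version this PR upgrades from is affected
print(is_patched("4.36.0"))  # True: the first release containing the fixes
```

Note that this naive splitter only handles plain `X.Y.Z` releases; pre-release suffixes like `4.36.0.dev0` would need a real version parser.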
### Release Notes

huggingface/transformers (transformers)
### [`v4.36.0`](https://togithub.com/huggingface/transformers/releases/tag/v4.36.0): v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa wide-spread support

[Compare Source](https://togithub.com/huggingface/transformers/compare/v4.35.2...v4.36.0)

#### New model additions

##### Mixtral

Mixtral is the new open-source model from Mistral AI announced in the blog post [Mixtral of Experts](https://mistral.ai/news/mixtral-of-experts/). According to the benchmark results shared in the release blog post, the model has capabilities comparable to ChatGPT. The architecture is a sparse Mixture of Experts with a top-2 routing strategy, similar to the `NllbMoe` architecture in transformers. You can use it through the `AutoModelForCausalLM` interface:

```py
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> device = "cuda"  # device to move the tokenized inputs onto
>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B", torch_dtype=torch.float16, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B")

>>> prompt = "My favourite condiment is"

>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(device)

>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
```

The model is compatible with existing optimisation tools such as Flash Attention 2, `bitsandbytes` and the PEFT library. The checkpoints are released under the [`mistralai`](https://huggingface.co/mistralai) organisation on the Hugging Face Hub.

##### Llava / BakLlava

Llava is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. In other words, it is a multi-modal version of LLMs fine-tuned for chat / instructions.
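The top-2 Mixture-of-Experts routing that Mixtral's architecture uses (described above) can be sketched for a single token as follows. This is an illustrative toy, not the actual transformers implementation; the function name and shapes are made up:

```python
import math

def top2_route(gate_logits, expert_outputs):
    """Toy sketch of top-2 MoE routing for one token: select the two experts
    with the largest gate logits, renormalize their scores with a softmax,
    and return the weighted combination of their outputs."""
    # Indices of the two largest gate logits.
    top2 = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:2]
    # Softmax over only the two selected logits.
    exps = [math.exp(gate_logits[i]) for i in top2]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the two selected experts' output vectors.
    dim = len(expert_outputs[0])
    return [sum(w * expert_outputs[i][d] for w, i in zip(weights, top2)) for d in range(dim)]
```

Only the two selected experts run per token, which is why a sparse MoE can have many more parameters than it spends compute on for any given forward pass.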
The Llava model was proposed in [Improved Baselines with Visual Instruction Tuning](https://arxiv.org/pdf/2310.03744) by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.

- \[`Llava`] Add Llava to transformers by [@younesbelkada](https://togithub.com/younesbelkada) in [#27662](https://togithub.com/huggingface/transformers/issues/27662)
- \[LLaVa] Some improvements by [@NielsRogge](https://togithub.com/NielsRogge) in [#27895](https://togithub.com/huggingface/transformers/issues/27895)

The integration also includes [`BakLlava`](https://togithub.com/SkunkworksAI/BakLLaVA), which is a Llava model trained with a Mistral backbone. The model is compatible with the `"image-to-text"` pipeline:

```py
from transformers import pipeline
from PIL import Image
import requests

model_id = "llava-hf/llava-1.5-7b-hf"
pipe = pipeline("image-to-text", model=model_id)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"

image = Image.open(requests.get(url, stream=True).raw)
# The prompt was truncated in this PR body; the completion below follows the
# upstream v4.36.0 release notes.
prompt = "USER: <image>\nWhat does the label 15 mean? ASSISTANT:"

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```

### Configuration
📅 Schedule: Branch creation - "" in timezone Europe/Zurich, Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled because a matching PR was automerged previously.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.