Bump transformers from 4.36.0 to 4.41.1

Bumps transformers from 4.36.0 to 4.41.1.

Release notes

Release v4.41.1 Fix PaliGemma finetuning, and some small bugs

Release v4.41.1

Fix PaliGemma finetuning:

The causal mask and label creation were causing label leaks during training. Kudos to @probicheaux for finding and reporting the issue!

https://github.com/huggingface/transformers/commit/a755745546779ae5c42510bc02a859bdac82b3b7: PaliGemma - fix processor with no input text (huggingface/transformers#30916) @hiyouga

https://github.com/huggingface/transformers/commit/a25f7d3c12975fe21eab437dda7363e9024de7c0: Paligemma causal attention mask (huggingface/transformers#30967) @molbap and @probicheaux
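To make the "label leak" concrete, here is a minimal sketch of the kind of mask and label handling involved. It is not the code from the commits above. It assumes a prefix-LM layout in which a prefix (for example image and prompt tokens) is attended to in full, the answer suffix is attended to causally, and prefix positions are excluded from the loss. The helper names `prefix_lm_mask` and `mask_prefix_labels` are hypothetical; `-100` is the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def prefix_lm_mask(prefix_len, total_len):
    """Build a boolean mask where mask[i][j] is True if position i may attend to j.

    Hypothetical helper: prefix positions see the whole prefix, suffix
    positions see the prefix plus earlier suffix positions and themselves.
    """
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            # j < prefix_len: the prefix is visible to every position.
            # j <= i: ordinary causal visibility inside the suffix.
            row.append(j < prefix_len or j <= i)
        mask.append(row)
    return mask


def mask_prefix_labels(token_ids, prefix_len):
    """Replace prefix labels with IGNORE_INDEX so only the suffix is trained on."""
    return [IGNORE_INDEX if k < prefix_len else t for k, t in enumerate(token_ids)]


if __name__ == "__main__":
    for row in prefix_lm_mask(prefix_len=2, total_len=4):
        print(["x" if allowed else "." for allowed in row])
    print(mask_prefix_labels([5, 6, 7, 8], prefix_len=2))
```

If a suffix position could attend to later suffix positions, the model would see the token it is asked to predict, which is the kind of leak the fix addresses.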

Other fixes:

https://github.com/huggingface/transformers/commit/bb48e921868ac750417956de941606f7e2fa02ca: tokenizer_class = "AutoTokenizer" Llava Family (huggingface/transformers#30912)

https://github.com/huggingface/transformers/commit/1d568dfab262f76079eb4f3d05b606d51a0c9e4b: legacy to init the slow tokenizer when converting from slow was wrong (huggingface/transformers#30972)

https://github.com/huggingface/transformers/commit/b1065aa08ac0da11fcb9e3827cd7eafabe4beebd: Generation: get special tokens from model config (huggingface/transformers#30899) @zucchini-nlp

Reverted https://github.com/huggingface/transformers/commit/4ab7a28216211571fdddba414d4edd8426ab6489

v4.41.0: Phi3, JetMoE, PaliGemma, VideoLlava, Falcon2, FalconVLM & GGUF support

New models

Phi3

The Phi-3 model was proposed in Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Microsoft.

TL;DR: Phi-3 introduces new RoPE scaling methods, which seem to scale fairly well. Phi-3-mini, a model in the 3B class, is available in two context-length variants: 4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.

Phi-3 by @gugarosa in huggingface/transformers#30423
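As a rough illustration of what "RoPE scaling" means, the sketch below computes rotary inverse frequencies and optionally divides each one by a per-dimension rescale factor, which slows the rotation and stretches the usable position range. This is a generic sketch, not a reproduction of Phi-3's exact method. The function names and the factor values are assumptions made for the example.

```python
def rope_inv_freq(dim, base=10000.0, rescale_factors=None):
    """Return the dim // 2 inverse frequencies used by rotary embeddings.

    rescale_factors is a hypothetical list with one factor per frequency;
    a factor above 1.0 slows that frequency's rotation.
    """
    half = dim // 2
    if rescale_factors is None:
        rescale_factors = [1.0] * half  # plain, unscaled RoPE
    if len(rescale_factors) != half:
        raise ValueError("need one rescale factor per frequency")
    return [1.0 / (f * base ** (2 * i / dim)) for i, f in enumerate(rescale_factors)]


def rope_angles(position, inv_freq):
    """Rotation angle (in radians) of each frequency pair at a given position."""
    return [position * w for w in inv_freq]


if __name__ == "__main__":
    plain = rope_inv_freq(4)
    scaled = rope_inv_freq(4, rescale_factors=[1.0, 4.0])  # made-up factors
    print(rope_angles(8, plain))
    print(rope_angles(8, scaled))
```

With a factor of 4.0 on the second frequency, position 8 produces the same angle there that position 2 would without scaling, so longer sequences stay within the range of angles seen during training.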

JetMoE

JetMoe-8B is an 8B Mixture-of-Experts (MoE) language model developed by Yikang Shen and MyShell. The JetMoe project aims to provide an efficient language model with LLaMA2-level performance on a limited budget. To achieve this goal, JetMoe uses a sparsely activated architecture inspired by ModuleFormer. Each JetMoe block consists of two MoE layers: Mixture of Attention Heads and Mixture of MLP Experts. Given the input tokens, the block activates only a subset of its experts to process them. This sparse activation scheme enables JetMoe to achieve much better training throughput than dense models of similar size. The training throughput of JetMoe-8B is around 100B tokens per day on a cluster of 96 H100 GPUs with a straightforward 3-way pipeline parallelism strategy.

Add JetMoE model by @yikangshen in huggingface/transformers#30005
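The sparse activation described above can be sketched with generic top-k routing: a router scores every expert, only the k best-scoring experts run, and their outputs are mixed with softmax weights. This is a minimal sketch of the general technique, not JetMoe's implementation. The function names, the scalar "experts", and the logits are all made up for illustration.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda e: router_logits[e], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[e] for e in chosen)  # subtract for numerical stability
    exps = [math.exp(router_logits[e] - peak) for e in chosen]
    total = sum(exps)
    return [(e, x / total) for e, x in zip(chosen, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Run only the routed experts on x and return their weighted sum."""
    out = 0.0
    for expert_index, weight in top_k_route(router_logits, k):
        out += weight * experts[expert_index](x)
    return out


if __name__ == "__main__":
    # Four toy experts; a real layer would use small neural networks instead.
    experts = [lambda v: v, lambda v: 2 * v, lambda v: 3 * v, lambda v: 4 * v]
    logits = [0.0, 1.0, 1.0, -1.0]  # hypothetical router scores for one token
    print(top_k_route(logits, k=2))
    print(moe_forward(1.0, experts, logits, k=2))
```

Only two of the four experts are evaluated per token here, which is where the throughput advantage over a dense layer of the same total size comes from.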

PaliGemma

PaliGemma is a lightweight open vision-language model (VLM) inspired by PaLI-3, and based on open components like the SigLIP vision model and the Gemma language model. PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.

More than 120 checkpoints have been released; see the collection here!

... (truncated)

Commits

75f15f3 Release: v4.41.1
8282db5 Paligemma causal attention mask (#30967)
e5b788a Revert "feat: Upgrade Weights & Biases callback (#30135)"
9d05459 Generation: get special tokens from model config (#30899)
e5d174f PaliGemma - fix processor with no input text (#30916)
0414185 legacy to init the slow tokenizer when converting from slow was wrong (#30972)
6d2439a tokenizer_class = "AutoTokenizer" Llava Family (#30912)
4c6c45b Release: v4.41.0
e9a8041 update release script (#30880)
0a9300f Support arbitrary processor (#30875)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

String-sg / ai-starter-kit

Bump transformers from 4.36.0 to 4.41.1 #114
