-
### The model to consider
Mamba Codestral: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1
Highlights:
- SOTA 7B code model
- theoretically unlimited context length; tested up to 256k
…
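For anyone wanting to try the checkpoint, here is a minimal loading sketch using the generic transformers API. Whether this repo loads directly this way depends on having a recent transformers release with Mamba-2 support; treat it as an assumption, not a tested recipe:

```python
# Minimal sketch: loading the checkpoint with the generic transformers API.
# Assumes a recent transformers release with Mamba-2 support; whether this
# repo resolves through AutoModelForCausalLM is an assumption, not tested.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/mamba-codestral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```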
-
Hi @turboderp!
Would you be open to integrating the exllamav2 library with HF transformers? The goal would be to make exl2-quantized models compatible with HF transformers using your kernels. We would si…
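For context, this is roughly what standalone exl2 loading looks like with exllamav2 today, which is the workflow the integration would fold into transformers. This sketch is based on my reading of the library's example scripts; exact API details may have shifted across versions:

```python
# Rough sketch of standalone exl2 loading with exllamav2, patterned on the
# library's example scripts; exact API details may differ by version.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-quantized-model"  # hypothetical local path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
print(generator.generate_simple("def hello():", settings, 64))
```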
-
Related issue: https://github.com/microsoft/DeepSpeed/issues/5724#issuecomment-2330819411
I tried the solution there, but it didn't work in my setting.
**Describe the bug**
[rank1]: Traceback …
-
Hi folks,
As there are multiple issues here regarding fine-tuning DINOv2 on custom data, as well as questions related to semantic segmentation/depth estimation, image similarity, feature extraction, etc., th…
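For the feature-extraction and image-similarity questions, a minimal sketch with the transformers API, using the facebook/dinov2-base checkpoint as a stand-in for whichever variant you fine-tuned:

```python
# Minimal sketch of DINOv2 feature extraction with transformers.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base")

image = Image.open("example.jpg")  # hypothetical local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The CLS token serves as a global descriptor for image similarity.
cls_embedding = outputs.last_hidden_state[:, 0]
print(cls_embedding.shape)  # torch.Size([1, 768]) for the base model
```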
-
We have a TODO on allowing other lambda functions:
https://github.com/MESMER-group/mesmer/blob/8f8c8a06d299423997d9010617f734f830c497d4/mesmer/stats/_power_transformer.py#L256
E.g. logistic, con…
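As a starting point, here is a sketch of what one pluggable option could look like. The `(coeffs, covariate) -> lambda` signature is an assumption based on the linked module, not MESMER's actual hook:

```python
import numpy as np

# Illustrative sketch of a pluggable logistic lambda function for the
# power transformer; names and signature are assumptions for discussion.
def lambda_logistic(coeffs, covariate):
    xi0, xi1 = coeffs
    # Keeps lambda in the open interval (0, 2), equal to 1 when the
    # exponent term is zero and xi0 == 1.
    return 2.0 / (1.0 + xi0 * np.exp(covariate * xi1))

print(lambda_logistic((1.0, 0.1), np.array([-5.0, 0.0, 5.0])))
```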
-
Hello Phil,
Would you mind explaining how to inject the rotary positional embeddings into the [linear transformers](https://github.com/idiap/fast-transformers/blob/master/fast_transformers/attention/linear_atten…
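Not an authoritative answer, but the usual recipe is to rotate queries and keys before the linear-attention feature map. A sketch using the rotary-embedding-torch package, where that placement is my assumption:

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

# Sketch: rotate queries/keys *before* the linear-attention feature map.
# That placement is my assumption, not a confirmed answer.
# Shapes are (batch, heads, seq, dim_head).
rotary = RotaryEmbedding(dim=32)  # rotate the first 32 of 64 head dims

q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)

q = rotary.rotate_queries_or_keys(q)
k = rotary.rotate_queries_or_keys(k)
# ...then apply the feature map (e.g. elu(x) + 1) and run linear attention.
```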
-
I have installed transformer_engine for use with Accelerate and Ray. I have the following requirements, which work fine for all sorts of distributed training:
```text
torch==2.2.1
transform…
```
-
### System Info
Ubuntu 22.04, all packages at their latest versions
### Who can help?
@BenjaminBossan @sayakpaul
### Information
- [ ] The official example scripts
- [x] My own modified scripts
### Ta…
-
I notice that passkey retrieval works well up to around 3-4k tokens. After that, it doesn't.
That wasn't my intuition for SSMs, but I guess context length is still related to the training set? It's…
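For reference, this is roughly the kind of probe I mean. An illustrative sketch, not my exact harness: hide a random key in filler text and check whether the model recalls it as the context grows:

```python
# Illustrative passkey-retrieval probe: bury a random 5-digit key in
# filler text and ask the model to recall it.
import random

def make_passkey_prompt(n_filler_repeats: int) -> tuple[str, str]:
    passkey = f"{random.randint(10000, 99999)}"
    filler = "The grass is green. The sky is blue. The sun is warm. " * n_filler_repeats
    prompt = (
        f"{filler}\n"
        f"The pass key is {passkey}. Remember it.\n"
        f"{filler}\n"
        "What is the pass key? The pass key is"
    )
    return prompt, passkey

prompt, key = make_passkey_prompt(200)  # scale up to probe longer contexts
print(len(prompt.split()), "words; expected answer:", key)
```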
-
I found that with the original training workflow the loss is not declining; I am not sure whether this is because I am using a subset of the training set.
```
# File modified by authors of InstructDiffusion from …
```