-
### The model to consider
Mamba Codestral: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1
Highlights:
- SOTA 7B code model
- theoretically unlimited context length; tested up to 256k
…
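For anyone wanting to try the checkpoint, here is a minimal loading sketch using the generic transformers API. Whether this repo loads directly this way depends on having a recent transformers release with Mamba-2 support; treat it as an assumption, not a tested recipe:

```python
# Minimal sketch: loading the checkpoint with the generic transformers API.
# Assumes a recent transformers release with Mamba-2 support; whether this
# repo resolves through AutoModelForCausalLM is an assumption, not tested.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/mamba-codestral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```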
-
Hi @turboderp!
Would you be open to integrating the exllamav2 library with HF transformers? The goal would be to make exl2-quantized models compatible with HF transformers using your kernels. We would si…
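For context, this is roughly what standalone exl2 loading looks like with exllamav2 today, which is the workflow the integration would fold into transformers. This sketch is based on my reading of the library's example scripts; exact API details may have shifted across versions:

```python
# Rough sketch of standalone exl2 loading with exllamav2, patterned on the
# library's example scripts; exact API details may differ by version.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-quantized-model"  # hypothetical local path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
print(generator.generate_simple("def hello():", settings, 64))
```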
-
Related issue: https://github.com/microsoft/DeepSpeed/issues/5724#issuecomment-2330819411
I tried the solution there, but it didn't work in my setting.
**Describe the bug**
[rank1]: Traceback …
-
Hi folks,
As there are multiple issues here regarding fine-tuning DINOv2 on custom data, as well as questions related to semantic segmentation/depth estimation, image similarity, feature extraction, etc., th…
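For the feature-extraction and image-similarity questions, a minimal sketch with the transformers API, using the facebook/dinov2-base checkpoint as a stand-in for whichever variant you fine-tuned:

```python
# Minimal sketch of DINOv2 feature extraction with transformers.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base")

image = Image.open("example.jpg")  # hypothetical local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The CLS token serves as a global descriptor for image similarity.
cls_embedding = outputs.last_hidden_state[:, 0]
print(cls_embedding.shape)  # torch.Size([1, 768]) for the base model
```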
-
We have a TODO on allowing other lambda functions:
https://github.com/MESMER-group/mesmer/blob/8f8c8a06d299423997d9010617f734f830c497d4/mesmer/stats/_power_transformer.py#L256
E.g. logistic, con…
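As a starting point, here is a sketch of what one pluggable option could look like. The `(coeffs, covariate) -> lambda` signature is an assumption based on the linked module, not MESMER's actual hook:

```python
import numpy as np

# Illustrative sketch of a pluggable logistic lambda function for the
# power transformer; names and signature are assumptions for discussion.
def lambda_logistic(coeffs, covariate):
    xi0, xi1 = coeffs
    # Keeps lambda in the open interval (0, 2), equal to 1 when the
    # exponent term is zero and xi0 == 1.
    return 2.0 / (1.0 + xi0 * np.exp(covariate * xi1))

print(lambda_logistic((1.0, 0.1), np.array([-5.0, 0.0, 5.0])))
```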
-
Hello Phil,
Would you mind explaining how to inject the rotary positional embeddings into the [linear transformers](https://github.com/idiap/fast-transformers/blob/master/fast_transformers/attention/linear_atten…
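Not an authoritative answer, but the usual recipe is to rotate queries and keys before the linear-attention feature map. A sketch using the rotary-embedding-torch package, where that placement is my assumption:

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

# Sketch: rotate queries/keys *before* the linear-attention feature map.
# That placement is my assumption, not a confirmed answer.
# Shapes are (batch, heads, seq, dim_head).
rotary = RotaryEmbedding(dim=32)  # rotate the first 32 of 64 head dims

q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)

q = rotary.rotate_queries_or_keys(q)
k = rotary.rotate_queries_or_keys(k)
# ...then apply the feature map (e.g. elu(x) + 1) and run linear attention.
```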
-
I have installed transformer_engine for use with Accelerate and Ray. I have the following requirements, which work fine for all sorts of distributed training:
```text
torch==2.2.1
transform…
```
-
### System Info
Ubuntu 22.04, all packages at their latest versions
### Who can help?
@BenjaminBossan @sayakpaul
### Information
- [ ] The official example scripts
- [x] My own modified scripts
### Ta…
-
I notice that passkey retrieval works well up to around 3-4k tokens. After that, it doesn't.
That wasn't my intuition for SSMs, but I guess context length is still related to the training set? It's…
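For reference, this is roughly the kind of probe I mean. An illustrative sketch, not my exact harness: hide a random key in filler text and check whether the model recalls it as the context grows:

```python
# Illustrative passkey-retrieval probe: bury a random 5-digit key in
# filler text and ask the model to recall it.
import random

def make_passkey_prompt(n_filler_repeats: int) -> tuple[str, str]:
    passkey = f"{random.randint(10000, 99999)}"
    filler = "The grass is green. The sky is blue. The sun is warm. " * n_filler_repeats
    prompt = (
        f"{filler}\n"
        f"The pass key is {passkey}. Remember it.\n"
        f"{filler}\n"
        "What is the pass key? The pass key is"
    )
    return prompt, passkey

prompt, key = make_passkey_prompt(200)  # scale up to probe longer contexts
print(len(prompt.split()), "words; expected answer:", key)
```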
-
I found that with the original training workflow the loss is not declining; I am not sure whether this is because I am using a subset of the training set.
```
# File modified by authors of InstructDiffusion from …
```