-
Dear Team,
Thank you so much for releasing the model. I am trying to integrate the Flux model for a use case for which I require the unet and image_encoder. I find in the FluxPipeline there exi…
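For context, a minimal sketch (assuming the `diffusers` FluxPipeline and the `black-forest-labs/FLUX.1-dev` checkpoint, which are not named in the question) of how to load the pipeline and list which sub-models it actually exposes, so individual components can be pulled out:
```python
# Minimal sketch, assuming diffusers' FluxPipeline and the FLUX.1-dev checkpoint.
# Printing pipe.components shows which sub-models the pipeline exposes
# (transformer, vae, text encoders, ...) so they can be reused individually.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",   # assumed checkpoint name
    torch_dtype=torch.bfloat16,
)
print(list(pipe.components.keys()))

# Individual modules can then be accessed directly, e.g.:
transformer = pipe.transformer
vae = pipe.vae
```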
-
I use 5,000 256x256 images to train the first step with this command:
```
python train.py --data_dir ../dataset256/dataset256 \
  --bit_length 48 --image_resolution 256 --num_epochs 100 --data_size 5000 …
```
-
### System Info
```Shell
- `Accelerate` version: 0.33.0
- Platform: Windows-10-10.0.22631-SP0
- `accelerate` bash location: C:\Users\Nech\anaconda3\envs\transformer-multi-device\Scripts\accelera…
-
I pretrain with this script:
```
torchrun --nproc_per_node="${NUM_GPUS}" --nnodes="${NNODES}" \
"./llava/train/train_mem.py" \
--model_name_or_path ${LLM_VERSION} \
    --version ${PROMPT_VERSI…
```
-
### System Info
Latest TRL from source; I can't run TRL env right now as the cluster is shut down, but I'm installing everything from source.
If required, I will restart the cluster and run it.
### Information
- [ ] Th…
-
### 🐛 Describe the bug
When using `torch.nn.functional.scaled_dot_product_attention` with autograd, a tensor filled with NaN values is returned after a few backward passes. Using `torch.autograd.s…
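For context, a minimal repro sketch (shapes and step count are hypothetical, not taken from the report) of the kind of repeated forward/backward loop through `scaled_dot_product_attention` in which the gradients can be checked for NaNs, with anomaly detection enabled to locate the offending op:
```python
# Minimal repro sketch (hypothetical shapes): run SDPA forward/backward a few
# times and check the gradients for NaNs; anomaly detection flags the op that
# produced a NaN in the backward graph.
import torch
import torch.nn.functional as F

torch.autograd.set_detect_anomaly(True)

q = torch.randn(2, 4, 128, 64, requires_grad=True)
k = torch.randn(2, 4, 128, 64, requires_grad=True)
v = torch.randn(2, 4, 128, 64, requires_grad=True)

for step in range(3):
    out = F.scaled_dot_product_attention(q, k, v)
    out.sum().backward()
    print(step, torch.isnan(q.grad).any().item())
    q.grad = k.grad = v.grad = None
```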
-
Hi, I'm currently trying to run DeepSeek Coder v2 on a single node with the following setup:
Node 1: Two A6000 GPUs (48GB each) and 192GB of RAM
Node 2: Two 4090 GPUs (24GB each) and 64GB …
-
**Is your feature request related to a problem? Please describe.**
Your Seq2SeqSharp project already supports LSTMs. Please consider implementing the RWKV large language "linear attention" idea into y…
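For reference, a rough sketch of the RWKV-style WKV recurrence that replaces quadratic attention with a linear-time running sum (written in Python/NumPy rather than C#, and omitting the log-space tricks real implementations use for numerical stability):
```python
# Rough sketch of an RWKV-style WKV recurrence (simplified, no stability tricks):
# a linear-time running weighted average of values v, keyed by k, with a learned
# per-channel decay w and a "current token" bonus u.
import numpy as np

def wkv(w, u, k, v):
    """k, v: (T, C) sequences; w, u: (C,) parameters. Returns (T, C) outputs."""
    T, C = k.shape
    decay = np.exp(-np.exp(w))     # per-channel decay factor in (0, 1)
    num = np.zeros(C)              # running sum of exp(k_i) * v_i
    den = np.zeros(C)              # running sum of exp(k_i)
    out = np.zeros((T, C))
    for t in range(T):
        cur = np.exp(u + k[t])     # bonus-weighted contribution of the current token
        out[t] = (num + cur * v[t]) / (den + cur)
        num = decay * num + np.exp(k[t]) * v[t]
        den = decay * den + np.exp(k[t])
    return out
```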
-
Hi there, so I am loading a finetuned Llama 2 13b model, and I get this error.
Here's part of the error:
File /usr/local/lib/python3.10/dist-packages/unsloth/models/loader.py:172, in FastLanguag…
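For context, a sketch of the kind of load call that routes through `unsloth/models/loader.py` (the model path, sequence length, and quantization flag below are placeholders, not the values from the original report):
```python
# Sketch of a load call that goes through unsloth's loader.py
# (model path, max_seq_length and load_in_4bit are placeholder values).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/finetuned-llama-2-13b",  # hypothetical local path
    max_seq_length=4096,
    load_in_4bit=True,
)
```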
-
Hi GaLore Team, congratulations on the interesting work!
I am trying to fine-tune the Llama-3 8B model using GaLore but am getting this error:
`torch._C._LinAlgError: linalg.svd: The algorithm failed to…
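For reference, a minimal sketch (not GaLore's actual implementation) of the SVD step this error typically comes from, with the common mitigation of upcasting the gradient matrix to float32 before calling `torch.linalg.svd`:
```python
# Minimal sketch (not the GaLore code itself): GaLore-style projectors take an
# SVD of the gradient matrix; upcasting low-precision gradients before
# torch.linalg.svd is a common way to avoid "algorithm failed to converge" errors.
import torch

grad = torch.randn(4096, 4096, dtype=torch.bfloat16)  # stand-in gradient matrix

U, S, Vh = torch.linalg.svd(grad.float(), full_matrices=False)
rank = 128                      # hypothetical projection rank
projector = U[:, :rank]         # low-rank basis used to project the update
```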