attention-model Search Results

1000+ results
for attention-model

Best match


Nixtla/mlforecast #440

How the training of global models realized?

### Description Hello, I am using mlforecast to train a global forecasting model and reaching an exciting performance. However, I have some questions about the training details of global models. Spec…

kkckk1110 updated 3 weeks ago
1
pytorch/pytorch #139424

`torch.compile` error on `scaled_dot_product_attention` in `…

### 🐛 Describe the bug Related: * https://github.com/pytorch/pytorch/issues/124289 * https://github.com/pytorch/pytorch/issues/109607 ```python """Demonstrate torch.compile error on transform…

ringohoffman updated 2 weeks ago
10
huggingface/peft #2200

RuntimeError: element 0 of tensors.. OpenCLIP model

### System Info peft = 0.13.2 python = 3.12.7 transformers = 4.45.2 ### Who can help? @sayakpaul I am using ```inject_adapter_model(...)``` to finetune a model from OpenCLIP using LoRA layers…

EngEmmanuel updated 2 weeks ago
4
jax-ml/jax #24934

[GPU] FlashAttention performance lags behind PyTorch

## Description I'm benchmarking naive FlashAttention in `Jax` vs. the Pallas's version of [`FA3`](https://github.com/jax-ml/jax/blob/7b9914d711593dca8725d46aa1dadb2194284519/jax/experimental/pallas…

neel04 updated 4 days ago
4
comfyanonymous/ComfyUI #4572

SDXL generate black images with new --fast arg

### Expected Behavior - ### Actual Behavior ![image](https://github.com/user-attachments/assets/1f9608dc-4631-41c3-bd2a-bfe506d39104) SD15 and Flux work fine, the problem is only with SDXL Co…

bananasss00 updated 1 day ago
15
huchenlei/ComfyUI-layerdiffuse #109

[Bug]: Error with SD15 Attention Injection when batch size =…

### What happened? I am using SD15. When the batch size on "Empty Latent Image" is set to 2, I get a CUDA error with `torch.nn.functional.scaled_dot_product_attention` from attention_sharing.py and …

Lia-C updated 1 month ago
1
daskol/lotr #2

Please publish end-to-end application example

Dear all, It would be great to see an end-to-end practical example of LoTR. By "practical" I mean that one takes, for example some existing LLM weights file, compresses it into a smaller weights fi…

dmikushin updated 1 month ago
3
airockchip/rknn-toolkit2 #206

T5-xxl Encoder fails at runtime with `parseRKNN: exportDataSize large then mo…

As titled. rknn-toolkit2 version 2.0.0b17 (higher versions report the error `invalid tensor malloc size, tensor name: , target: CPU, size: 0` during conversion). librknnrt.so version 2.2.0. Export to ONNX: ```python import torch from transformers import T…

happyme531 updated 1 week ago
1
NVIDIA/Megatron-LM #1151

[BUG] Context parallel gives NCCL error

**Describe the bug** I am using the `train_gpt3_175b_distributed.sh` script to launch training on a single node with 4 A100 80GB GPUs. Training goes well if I use tensor parallel or pipeline parallel,…

YJHMITWEB updated 1 day ago
1
huggingface/transformers #34238

GGUF support for BERT architecture

### Feature request I want to add the ability to use GGUF BERT models in transformers. Currently the library does not support this architecture. When I try to load it, I get an error TypeError: Ar…

Dimmension updated 1 month ago
1

