linear-attention-model Search Results

1000+ results
for linear-attention-model


pytorch/pytorch #117209

torch.onnx.export the torch.nn.TransformerEncoderLayer, the …

### 🐛 Describe the bug # the bug description When I use a tensor shape like this (batch_size, seq_len, embedding_size) and I put batch_size and seq_len in dynamic_axes to generate the ONNX model, I c…

yinpeilin updated 1 month ago
3
facebookresearch/DiT #82

Clarification on Zero Initialization in FinalLayer of DiT Mo…

Hello Facebook Research Team, I am exploring the DiT as implemented in your repository and came across the weight initialization strategy for the FinalLayer, particularly observed in [this section …

denemmy updated 4 months ago
2
apple/coremltools #2278

Model state doesn't work with transpose

## 🐞Describing the bug This is a follow up on https://github.com/apple/coremltools/issues/2275. Sorry I couldn't find the reopen option in the original issue. To clarify, the issue didn't happen wit…

seayoung1112 updated 2 months ago
2
huggingface/transformers #25419

Abnormally High GPU Memory Consumption with OPT 350M Model L…

### System Info - `transformers` version: 4.32.0.dev0 - Platform: Linux-5.4.0-135-generic-x86_64-with-glibc2.35 - Python version: 3.11.4 - Huggingface_hub version: 0.16.4 - Safetensors version: 0…

ayaka14732 updated 1 month ago
10
da03/Attention-OCR #71

module 'tensorflow.contrib.rnn.python.ops.rnn_cell' has no a…

File "......./attention-OCR-master/src/model/seq2seq.py", line 75, in linear = rnn_cell._linear # pylint: disable=protected-access

dongdql updated 2 years ago
15
cientgu/InstructDiffusion #24

Loss is not declining

I found that with the original training workflow, the loss is not declining. I am not sure whether this is because I am using a subset of the training set. ``` # File modified by authors of InstructDiffusion from …

YerongLi updated 1 month ago
1
tencent-ailab/IP-Adapter #377

Basic understanding of the IP adapter during image generatio…

Hey everyone, I'm trying to understand the IP adapter better. Maybe someone can help me:) Paper: https://arxiv.org/pdf/2308.06721.pdf Would it be right to say: 1)An IP adapter model(e.g. i…

StableQuestion updated 4 months ago
1
NVIDIA/NeMo #10280

dim unmatch when doing sft with tensor parallel and sequence…

**Describe the bug** I was training to run sft based on Mixtral-8x7B-instruct model with tensor parallel size=4 (sequence parallel=True) and LoRA (target modules =[all]). It reports that the output …

zhuango updated 1 day ago
2
erfanzar/EasyDeL #170

Nan losses with Gemma 1 DPO training on Kaggle TPU

Nan losses when training: ![image](https://github.com/user-attachments/assets/78126797-27e6-433c-91bb-cf8260302e6c) Please take a look at this code: ``` !pip install jax[tpu]==0.4.28 -f https:…

defdet updated 3 days ago
3
OpenBMB/CPM-Bee #58

Single-GPU fine-tuning produces no fine-tuned model output

Fine-tuning command: torchrun --nnodes=1 --nproc_per_node=1 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py --use-delta --model-config config/cpm-bee-10b.json --dataset ../tutorial…

ivancr7 updated 1 year ago
6
