-
File "main.py", line 9, in
from transformers import AdamW, WarmUp, get_linear_schedule_with_warmup
ImportError: cannot import name 'WarmUp' from 'transformers' (/home/user/.local/lib/python3.8/…
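One likely cause, assuming a recent `transformers` release: `WarmUp` lives on the TensorFlow side of the library and is not exported in every version/build, and `transformers.AdamW` has been deprecated in favor of the PyTorch optimizer. A possible workaround sketch (the placeholder model and step counts are illustrative):

```python
# Hedged workaround: drop the TF-side `WarmUp` and the deprecated
# `transformers.AdamW`; the linear schedule below already performs warmup.
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)  # placeholder model for illustration

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,     # warmup phase, replacing what `WarmUp` provided
    num_training_steps=1000,
)
```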
-
The released stage-2 weights for resolution 256 seem to be incomplete; the error log is shown below.
```
File "/home/user/data/PT/PCDMs/stage2_batchtest_inpaint_model.py", line 126, in inference
…
```
-
https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/model.py#L42
Why do we need an additional linear transformation after the MHA and before the MLP when the dim…
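For context, the layer being asked about is the attention block's output projection (`c_proj` in minGPT). A minimal sketch, not the minGPT code itself, of why it matters even when the width is unchanged: after the per-head results are concatenated, each embedding channel has only seen its own head, and the projection (`W^O` in the original Transformer paper) is what mixes information across heads before the MLP.

```python
# Minimal sketch (not the minGPT code) showing what the output projection
# after multi-head attention contributes: before it, each slice of the
# embedding dimension depends only on its own head.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_embd, n_head, seq = 64, 4, 8
head_dim = n_embd // n_head
x = torch.randn(1, seq, n_embd)

qkv = nn.Linear(n_embd, 3 * n_embd)
c_proj = nn.Linear(n_embd, n_embd)  # the projection in question (W^O)

def split_heads(t):
    # (batch, seq, n_embd) -> (batch, n_head, seq, head_dim)
    return t.view(1, seq, n_head, head_dim).transpose(1, 2)

q, k, v = map(split_heads, qkv(x).split(n_embd, dim=2))
att = F.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
y = (att @ v).transpose(1, 2).reshape(1, seq, n_embd)  # concatenate heads

# Here y[..., :head_dim] was produced by head 0 alone, the next slice by
# head 1, and so on. c_proj forms a learned linear combination of all heads
# for every output channel, which an identity map (same width) could not.
y = c_proj(y)
```

In other words, the extra linear is about mixing heads, not matching dimensions.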
-
Command
```sh
python -m EasyLM.models.llama.llama_train \
--mesh_dim='-1,32,1' \
--dtype='fp32' \
--total_steps=250000 \
--log_freq=50 \
--save_model_freq=0 \
    --sav…
```
-
### Feature request
Hi, I'm the author of [zhuzilin/ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention).
I wonder if you are interested in integrating context parallel with [zh…
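For readers unfamiliar with the technique, here is a single-process sketch, not the `ring-flash-attention` API, of the arithmetic behind context parallelism with ring attention: each rank keeps its query shard and, at every ring step, folds the KV block received from its neighbor into a running online-softmax accumulator; the loop below simulates those steps in one process.

```python
# Simulated ring attention: iterate over KV blocks (one per "ring step"),
# maintaining a running max, denominator, and numerator so the final output
# equals full attention over the concatenated context.
import torch

def ring_attention_sim(q, kv_blocks):
    d = q.shape[-1]
    m = torch.full(q.shape[:-1], float("-inf"))  # running row-wise max
    l = torch.zeros(q.shape[:-1])                # running softmax denominator
    acc = torch.zeros_like(q)                    # running weighted-V numerator
    for k, v in kv_blocks:                       # one KV block per ring step
        s = q @ k.transpose(-2, -1) / d**0.5
        m_new = torch.maximum(m, s.max(dim=-1).values)
        p = torch.exp(s - m_new.unsqueeze(-1))
        scale = torch.exp(m - m_new)             # rescale old accumulators
        acc = acc * scale.unsqueeze(-1) + p @ v
        l = l * scale + p.sum(dim=-1)
        m = m_new
    return acc / l.unsqueeze(-1)

q = torch.randn(8, 64)
blocks = [(torch.randn(8, 64), torch.randn(8, 64)) for _ in range(4)]
out = ring_attention_sim(q, blocks)

# Agrees with ordinary attention over the whole (concatenated) KV:
k_full = torch.cat([k for k, _ in blocks])
v_full = torch.cat([v for _, v in blocks])
ref = torch.softmax(q @ k_full.T / 64**0.5, dim=-1) @ v_full
assert torch.allclose(out, ref, atol=1e-4)
```

The same rescaling is what lets flash-attention tile over KV internally, which is why the two techniques compose naturally.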
-
## Instructions To Reproduce the Issue:
The `Dino` configuration contains a parameter named `MLP_DIM` that appears to be user-adjustable, but it is actually hard-coded. See the line here ht…
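A hypothetical sketch of the fix pattern (the `yacs`-style config and all names are illustrative, not the actual Dino code): read the value from the config at the construction site so the user-facing knob actually takes effect.

```python
# Illustrative only: the reported bug is the "before" shape, where the
# configured MLP_DIM is silently ignored.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.MODEL = CN()
cfg.MODEL.MLP_DIM = 1024  # the user sets this, expecting it to be honored

def build_mlp_dim(cfg):
    # Before (the bug): return 2048  # hard-coded, cfg ignored
    # After (the fix): honor the configured value
    return cfg.MODEL.MLP_DIM

assert build_mlp_dim(cfg) == 1024
```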
-
Hi @danielhanchen
I am trying to fine-tune gemma2-2b for my task, following unsloth's continued-finetuning guidelines. However, I am hitting OOM while doing so. My intent is to train gemm…
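In case it helps triage, here is a sketch of the usual memory levers with unsloth (the checkpoint name and all values are illustrative; adjust for your GPU): 4-bit loading, a shorter `max_seq_length`, LoRA with unsloth's gradient checkpointing, and a small per-device batch size with gradient accumulation.

```python
# Illustrative OOM mitigations for a gemma2-2b finetune with unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-2b",  # assumed checkpoint name
    max_seq_length=2048,              # lower this first if memory is tight
    load_in_4bit=True,                # 4-bit quantized base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # trades compute for activation memory
)
# In the trainer, prefer per_device_train_batch_size=1 with
# gradient_accumulation_steps=8 over a larger batch size.
```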
-
We are refactoring the regression tests under the [fix/tests](https://github.com/DeepWok/mase/tree/fix/tests) branch. On the hardware side, we observed the following errors. Due to the large number of…
-
# Description
Current challenges in using Neural Operators include irregular meshes, multiple inputs, multiple inputs on different meshes, and multi-scale problems [1]. The Attention mechanism is promi…
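As a concrete illustration (a minimal sketch, not the proposed design), attention handles irregular meshes because mesh points become tokens, so the layer never assumes a grid, and cross-attention can map a function sampled on one mesh to queries on an entirely different mesh:

```python
# Cross-attention "operator" layer: input function samples (x_i, u(x_i)) are
# tokens; the output is evaluated at arbitrary query coordinates.
import torch
import torch.nn as nn

class CrossAttnOperatorLayer(nn.Module):
    def __init__(self, coord_dim=2, value_dim=1, width=64, heads=4):
        super().__init__()
        self.embed_in = nn.Linear(coord_dim + value_dim, width)  # (x_i, u(x_i)) tokens
        self.embed_q = nn.Linear(coord_dim, width)               # query coordinates
        self.attn = nn.MultiheadAttention(width, heads, batch_first=True)
        self.out = nn.Linear(width, value_dim)

    def forward(self, in_coords, in_values, query_coords):
        tokens = self.embed_in(torch.cat([in_coords, in_values], dim=-1))
        attended, _ = self.attn(self.embed_q(query_coords), tokens, tokens)
        return self.out(attended)

# 500 scattered input points; output queried on a different 300-point mesh.
layer = CrossAttnOperatorLayer()
u_out = layer(torch.rand(1, 500, 2), torch.randn(1, 500, 1), torch.rand(1, 300, 2))
print(u_out.shape)  # torch.Size([1, 300, 1])
```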
-
Right now, when initializing from an ST checkpoint, we chop off any trailing "Dense" module.
Although these checkpoints require training anyway, this layer can be a good initialization for the linea…
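A minimal sketch of the proposed behavior (the checkpoint is just one example that ships a Dense module, and the lookup is illustrative): copy the Dense layer's weights into the linear layer instead of discarding them.

```python
# Reuse an ST checkpoint's Dense weights to initialize a linear layer.
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Dense

st_model = SentenceTransformer("sentence-transformers/LaBSE")  # ships a Dense module

dense = next(m for m in st_model.modules() if isinstance(m, Dense))
linear_head = nn.Linear(dense.linear.in_features, dense.linear.out_features)
linear_head.load_state_dict(dense.linear.state_dict())  # warm start, not random init
```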