-
File "main.py", line 9, in
from transformers import AdamW, WarmUp, get_linear_schedule_with_warmup
ImportError: cannot import name 'WarmUp' from 'transformers' (/home/user/.local/lib/python3.8/…
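One likely cause, assuming a recent `transformers` release: `WarmUp` lives on the TensorFlow side of the library and is not exported in every version/build, and `transformers.AdamW` has been deprecated in favor of the PyTorch optimizer. A possible workaround sketch (the placeholder model and step counts are illustrative):

```python
# Hedged workaround: drop the TF-side `WarmUp` and the deprecated
# `transformers.AdamW`; the linear schedule below already performs warmup.
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)  # placeholder model for illustration

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,     # warmup phase, replacing what `WarmUp` provided
    num_training_steps=1000,
)
```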
-
The released stage-2 weights for resolution 256 seem to be incomplete; the error log is shown below.
```
File "/home/user/data/PT/PCDMs/stage2_batchtest_inpaint_model.py", line 126, in inference
…
```
-
https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/model.py#L42
Why do we need an additional linear transformation after the MHA and before the MLP when the dim…
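For context, the layer being asked about is the attention block's output projection (`c_proj` in minGPT). A minimal sketch, not the minGPT code itself, of why it matters even when the width is unchanged: after the per-head results are concatenated, each embedding channel has only seen its own head, and the projection (`W^O` in the original Transformer paper) is what mixes information across heads before the MLP.

```python
# Minimal sketch (not the minGPT code) showing what the output projection
# after multi-head attention contributes: before it, each slice of the
# embedding dimension depends only on its own head.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_embd, n_head, seq = 64, 4, 8
head_dim = n_embd // n_head
x = torch.randn(1, seq, n_embd)

qkv = nn.Linear(n_embd, 3 * n_embd)
c_proj = nn.Linear(n_embd, n_embd)  # the projection in question (W^O)

def split_heads(t):
    # (batch, seq, n_embd) -> (batch, n_head, seq, head_dim)
    return t.view(1, seq, n_head, head_dim).transpose(1, 2)

q, k, v = map(split_heads, qkv(x).split(n_embd, dim=2))
att = F.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
y = (att @ v).transpose(1, 2).reshape(1, seq, n_embd)  # concatenate heads

# Here y[..., :head_dim] was produced by head 0 alone, the next slice by
# head 1, and so on. c_proj forms a learned linear combination of all heads
# for every output channel, which an identity map (same width) could not.
y = c_proj(y)
```

In other words, the extra linear is about mixing heads, not matching dimensions.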
-
Command
```sh
python -m EasyLM.models.llama.llama_train \
--mesh_dim='-1,32,1' \
--dtype='fp32' \
--total_steps=250000 \
--log_freq=50 \
--save_model_freq=0 \
    --sav…
```
-
### Feature request
Hi, I'm the author of [zhuzilin/ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention).
I wonder if you are interested in integrating context parallel with [zh…
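For readers unfamiliar with the technique, here is a single-process sketch, not the `ring-flash-attention` API, of the arithmetic behind context parallelism with ring attention: each rank keeps its query shard and, at every ring step, folds the KV block received from its neighbor into a running online-softmax accumulator; the loop below simulates those steps in one process.

```python
# Simulated ring attention: iterate over KV blocks (one per "ring step"),
# maintaining a running max, denominator, and numerator so the final output
# equals full attention over the concatenated context.
import torch

def ring_attention_sim(q, kv_blocks):
    d = q.shape[-1]
    m = torch.full(q.shape[:-1], float("-inf"))  # running row-wise max
    l = torch.zeros(q.shape[:-1])                # running softmax denominator
    acc = torch.zeros_like(q)                    # running weighted-V numerator
    for k, v in kv_blocks:                       # one KV block per ring step
        s = q @ k.transpose(-2, -1) / d**0.5
        m_new = torch.maximum(m, s.max(dim=-1).values)
        p = torch.exp(s - m_new.unsqueeze(-1))
        scale = torch.exp(m - m_new)             # rescale old accumulators
        acc = acc * scale.unsqueeze(-1) + p @ v
        l = l * scale + p.sum(dim=-1)
        m = m_new
    return acc / l.unsqueeze(-1)

q = torch.randn(8, 64)
blocks = [(torch.randn(8, 64), torch.randn(8, 64)) for _ in range(4)]
out = ring_attention_sim(q, blocks)

# Agrees with ordinary attention over the whole (concatenated) KV:
k_full = torch.cat([k for k, _ in blocks])
v_full = torch.cat([v for _, v in blocks])
ref = torch.softmax(q @ k_full.T / 64**0.5, dim=-1) @ v_full
assert torch.allclose(out, ref, atol=1e-4)
```

The same rescaling is what lets flash-attention tile over KV internally, which is why the two techniques compose naturally.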
-
## Instructions To Reproduce the Issue:
The `Dino` configuration contains a parameter named `MLP_DIM` that appears to be user-adjustable, but it is actually hard-coded. See the line here ht…
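A hypothetical sketch of the fix pattern (the `yacs`-style config and all names are illustrative, not the actual Dino code): read the value from the config at the construction site so the user-facing knob actually takes effect.

```python
# Illustrative only: the reported bug is the "before" shape, where the
# configured MLP_DIM is silently ignored.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.MODEL = CN()
cfg.MODEL.MLP_DIM = 1024  # the user sets this, expecting it to be honored

def build_mlp_dim(cfg):
    # Before (the bug): return 2048  # hard-coded, cfg ignored
    # After (the fix): honor the configured value
    return cfg.MODEL.MLP_DIM

assert build_mlp_dim(cfg) == 1024
```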
-
Hi @danielhanchen
I am trying to fine-tune gemma2-2b for my task, following unsloth's continued-finetuning guidelines. However, I am hitting OOM while doing so. My intent is to train gemm…
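In case it helps triage, here is a sketch of the usual memory levers with unsloth (the checkpoint name and all values are illustrative; adjust for your GPU): 4-bit loading, a shorter `max_seq_length`, LoRA with unsloth's gradient checkpointing, and a small per-device batch size with gradient accumulation.

```python
# Illustrative OOM mitigations for a gemma2-2b finetune with unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-2b",  # assumed checkpoint name
    max_seq_length=2048,              # lower this first if memory is tight
    load_in_4bit=True,                # 4-bit quantized base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # trades compute for activation memory
)
# In the trainer, prefer per_device_train_batch_size=1 with
# gradient_accumulation_steps=8 over a larger batch size.
```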
-
We are refactoring the regression tests under the [fix/tests](https://github.com/DeepWok/mase/tree/fix/tests) branch. On the hardware side, we observed the following errors. Due to the large number of…
-
# Description
Current challenges in using Neural Operators include irregular meshes, multiple inputs, multiple inputs on different meshes, and multi-scale problems [1]. The Attention mechanism is promi…
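As a concrete illustration (a minimal sketch, not the proposed design), attention handles irregular meshes because mesh points become tokens, so the layer never assumes a grid, and cross-attention can map a function sampled on one mesh to queries on an entirely different mesh:

```python
# Cross-attention "operator" layer: input function samples (x_i, u(x_i)) are
# tokens; the output is evaluated at arbitrary query coordinates.
import torch
import torch.nn as nn

class CrossAttnOperatorLayer(nn.Module):
    def __init__(self, coord_dim=2, value_dim=1, width=64, heads=4):
        super().__init__()
        self.embed_in = nn.Linear(coord_dim + value_dim, width)  # (x_i, u(x_i)) tokens
        self.embed_q = nn.Linear(coord_dim, width)               # query coordinates
        self.attn = nn.MultiheadAttention(width, heads, batch_first=True)
        self.out = nn.Linear(width, value_dim)

    def forward(self, in_coords, in_values, query_coords):
        tokens = self.embed_in(torch.cat([in_coords, in_values], dim=-1))
        attended, _ = self.attn(self.embed_q(query_coords), tokens, tokens)
        return self.out(attended)

# 500 scattered input points; output queried on a different 300-point mesh.
layer = CrossAttnOperatorLayer()
u_out = layer(torch.rand(1, 500, 2), torch.randn(1, 500, 1), torch.rand(1, 300, 2))
print(u_out.shape)  # torch.Size([1, 300, 1])
```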
-
Right now, when initializing from an ST checkpoint, we chop off any trailing "Dense" module.
Although these checkpoints require training anyway, this layer can be a good initialization for the linea…
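A minimal sketch of the proposed behavior (the checkpoint is just one example that ships a Dense module, and the lookup is illustrative): copy the Dense layer's weights into the linear layer instead of discarding them.

```python
# Reuse an ST checkpoint's Dense weights to initialize a linear layer.
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Dense

st_model = SentenceTransformer("sentence-transformers/LaBSE")  # ships a Dense module

dense = next(m for m in st_model.modules() if isinstance(m, Dense))
linear_head = nn.Linear(dense.linear.in_features, dense.linear.out_features)
linear_head.load_state_dict(dense.linear.state_dict())  # warm start, not random init
```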