-
Hi, because most people do not have a suitable GPU, the Colab file is important. The current Colab notebook run gives the error below. It seems to be caused by the Python version, which is currently 3.7, and async i…
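If the failure is the common one where code written for an older Python uses `async` as an argument name (it became a reserved keyword in Python 3.7), the usual fix looks like the sketch below. This is a hedged guess; the actual offending line in the notebook may be different.

```python
# Hypothetical example of the Python 3.7 incompatibility: `async` is a reserved
# keyword since 3.7, so a keyword argument named `async` raises a SyntaxError.
import torch

x = torch.randn(4, 4)

# Old pre-3.7 style -- no longer parses on Python 3.7:
#   x = x.cuda(async=True)

# Replacement accepted by Python 3.7+ and current PyTorch:
if torch.cuda.is_available():
    x = x.cuda(non_blocking=True)
```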
-
In the forward() of TransformerDecoderLayer in SATRN.py,
https://github.com/bcaitech1/p4-fr-9-googoo/blob/f8ee504c37e57fb29eebb19d441feb18dc79c1df/networks/SATRN.py#L444
it looks like `tgt` at this point should be changed to `out`.
…
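To make the suggested change concrete, here is a minimal sketch of a standard post-norm decoder layer (attribute and argument names are illustrative, not the repo's exact code). The point of the fix is that the cross-attention should consume the output of the self-attention block (`out`) rather than the original decoder input (`tgt`):

```python
import torch.nn as nn

class TransformerDecoderLayer(nn.Module):
    def __init__(self, d_model, n_head, dim_ff, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_head, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_head, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, memory, tgt_mask=None):
        # masked self-attention over the decoder input
        out = self.norm1(tgt + self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0])
        # the bug in question: passing `tgt` as the query here would discard the
        # self-attention result; the query must be `out`
        out = self.norm2(out + self.cross_attn(out, memory, memory)[0])
        out = self.norm3(out + self.ff(out))
        return out
```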
-
# Reference
- [ ] [paper - 2019 - A Simple and Strong Convolutional-Attention Network for Irregular Text Recognition](https://arxiv.org/pdf/1904.01375v3.pdf)
- [ ] [paper - 2017 - Attention Is All Y…
-
## Abstract
- Proposes an `Average Attention Network` module that serves as the decoder for the Transformer. Decoding speed improves 3-4x while preserving translation performance.
- Empirical evidence shown in …
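A minimal sketch of the averaging core (an assumed simplification: the paper also combines the average with a feed-forward transform and a gating layer, which are omitted here):

```python
import torch

def average_attention(x):
    """Replace decoder self-attention with a cumulative average over previous positions.

    x: (batch, seq_len, d_model)
    returns the same shape, where position t is the mean of x[:, :t+1].
    """
    cumsum = x.cumsum(dim=1)
    counts = torch.arange(1, x.size(1) + 1, device=x.device, dtype=x.dtype).view(1, -1, 1)
    return cumsum / counts

# At inference the average can be updated incrementally in O(1) per step,
#   avg_t = ((t - 1) * avg_{t-1} + x_t) / t,
# which is where the reported 3-4x decoding speedup comes from.
```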
-
**Describe the bug**
I was trying to run SFT based on the Mixtral-8x7B-Instruct model with tensor parallel size = 4 (sequence parallel = True) and LoRA (target modules = [all]).
It reports that the output …
-
I am using ALBERT and a Siamese network to train a subjective-question scoring model; the training strategy follows your code, and the Siamese network consists of a bidirectional LSTM and fully connected layers. During training I found that the accuracy does not improve and just stays constant. It feels as if the weights are not being updated, possibly because the gradients are too small to produce any meaningful change in the weights. Alternatively, there may be a problem with the training strategy, but I am not sure of the exact cause. Below is the accuracy during my training:
![training](https://github.com/dragen1860/MAML-P…
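One quick way to check the "weights are not updating" hypothesis is to inspect gradient norms right after the backward pass (model and loss names below are placeholders for the ALBERT + BiLSTM Siamese setup, not the original code):

```python
def report_grad_norms(model):
    """Print parameters whose gradients are (near) zero and the global gradient norm."""
    total_sq = 0.0
    for name, p in model.named_parameters():
        if p.grad is not None:
            norm = p.grad.detach().norm().item()
            total_sq += norm ** 2
            if norm < 1e-8:
                print(f"near-zero grad: {name} ({norm:.3e})")
    print(f"global grad norm: {total_sq ** 0.5:.3e}")

# Call it between backward() and the optimizer step:
#   loss.backward()
#   report_grad_norms(model)
#   optimizer.step()
```

If the global norm is essentially zero, the problem is upstream of the optimizer (frozen encoder, detached tensors, or a saturated loss); if it is large but accuracy still never moves, the learning rate or the labeling/threshold logic is the more likely culprit.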
-
### Checklist
- [ ] The issue exists after disabling all extensions
- [X] The issue exists on a clean installation of webui
- [ ] The issue is caused by an extension, but I believe it is caused by a …
-
Traceback (most recent call last):
  File "demo.py", line 16, in <module>
    model = create_model(opt)
  File "/viton/Global-Flow-Local-Attention/model/__init__.py", line 32, in create_model
    instance…
-
Hello Ma,
I am trying to run your code but it requires
- 'allx',
- 'ally',
- 'graph',
- "adjmat",
- "trainMask",
- "valMask",
- "testMask"
I checked the preprocessing files and …
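For reference, a hedged sketch of how such preprocessed inputs are often stored and loaded. The `ind.{dataset}.{name}` file naming follows the common Planetoid-style convention; the actual layout and serialization in this repo may differ:

```python
import pickle

REQUIRED = ["allx", "ally", "graph", "adjmat", "trainMask", "valMask", "testMask"]

def load_inputs(dataset, data_dir="data"):
    """Load the pickled preprocessing outputs listed above into a dict."""
    objects = {}
    for name in REQUIRED:
        path = f"{data_dir}/ind.{dataset}.{name}"
        with open(path, "rb") as f:
            objects[name] = pickle.load(f, encoding="latin1")
    return objects
```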
-
### 🐛 Describe the bug
I'm trying to finetune Llama2-7B (to reproduce the experiments in a paper) using PEFT LoRA (0.124% of trainable params). However, this results in an out-of-memory (OOM) error o…
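Not a fix for the underlying bug, but the memory-reduction levers usually tried first for LoRA fine-tuning of a 7B model are QLoRA-style 4-bit loading plus gradient checkpointing. The model name and LoRA hyperparameters below are assumptions for illustration, not the paper's settings:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit to cut the frozen-weight memory roughly in half vs. fp16.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Trade compute for activation memory.
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

# Small LoRA adapter; only these injected matrices are trainable.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```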