-
The cumprod in the MoChA paper is defined to be exclusive, while the `safe_cumprod` in this repo is not. Shouldn't it be:
```python
def safe_cumprod(self, x, exclusive=False):
    """Numerically st…
```
-
See the bachelor thesis https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-211701 for follow-up.
-
Trying to output the attention values of the model's intermediate layers during inference:
```python
text = '你好'
inputs = tokenizer_baichuan(text, return_tensors='pt', return_token_type_ids=False)
# Ask the model to return the per-layer attention weights.
out = model_baichuan(**inputs, output_attentions=True)
```
It throws an error: …
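For reference, when a model supports `output_attentions`, the per-layer tensors are normally read off the returned output like this (a sketch assuming a standard Transformers `ModelOutput`):
```python
# out.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
for layer_idx, attn in enumerate(out.attentions):
    print(layer_idx, attn.shape)
```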
-
File "/data/lw/2Dxiangao/data_parallel_my_v2.py", line 89, in scatter
bsz = inputs[0].size(self.dim)
AttributeError: 'list' object has no attribute 'size'
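The traceback says `inputs[0]` is a plain Python `list`, which has no `.size()`. A minimal standalone sketch of a guard for that line (the helper name and the list fallback are assumptions):
```python
import torch

def batch_size(inputs, dim=0):
    # inputs[0] may be a tensor or a plain list, as in the traceback.
    first = inputs[0]
    if isinstance(first, torch.Tensor):
        return first.size(dim)
    # A list has no .size(); fall back to its length.
    return len(first)
```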
-
Unlike the other architectures in this package, RetNet doesn't support padding, as far as I'm aware. I was thinking the best place to introduce it is alongside the positional mask. He…
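To make the idea concrete, one way to fold a key-side padding mask into the decay mask RetNet already builds (a sketch with assumed names and shapes, not this package's actual API):
```python
import torch

def mask_padding(decay_mask, padding_mask):
    """decay_mask: (heads, seq, seq); padding_mask: (batch, seq), True = real token."""
    # Zero the retention weight of any key position that is padding.
    keep = padding_mask[:, None, None, :].to(decay_mask.dtype)  # (batch, 1, 1, seq)
    return decay_mask.unsqueeze(0) * keep                       # (batch, heads, seq, seq)
```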
-
Modify the original attention:
```python
class Attention(nn.Module):
    def __init__(self, args: ModelArgs):
        super().__init__()
        self.n_kv_heads = args.n_heads if args.n_kv_heads is None…
```
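For context, the visible line is the grouped-query head setup from Llama-style attention; the usual next step derives how many query heads share each kv head (a sketch of that convention, not necessarily the modification this snippet goes on to make):
```python
# Names mirror the snippet; args is the same ModelArgs instance.
def gqa_shapes(args):
    n_kv_heads = args.n_heads if args.n_kv_heads is None else args.n_kv_heads
    n_rep = args.n_heads // n_kv_heads  # query heads served by each kv head
    head_dim = args.dim // args.n_heads
    return n_kv_heads, n_rep, head_dim
```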
-
My calibration dataloader has bsz = 1. After I quantize my model with:
```python
model_int8 = trainer.quantize(model, accelerator='onnxruntime',
                              calib_dataloader=train_dl, method='int…
```
-
Did you compare Whisper large-v2 and distil-whisper with Transformers' default settings (beam size = 1, temperature = 1, do_sample = False)?
What would be the difference if you used the OpenAI settings …
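For clarity, the two decoding configurations being compared would look roughly like this in Transformers (the model id and audio file name are placeholders):
```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="distil-whisper/distil-large-v2")

# Transformers defaults: greedy decoding, no sampling.
greedy = asr("sample.wav", generate_kwargs={"num_beams": 1, "do_sample": False})

# OpenAI-style decoding typically uses beam search with 5 beams.
beam = asr("sample.wav", generate_kwargs={"num_beams": 5})
```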
-
Hi, I would like to ask why the attention mask is not used in the prefill stage.
I want to output the attention score matrix in the prefill stage. Is the code below right?
```python
if spec: # s…
```
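For comparison, computing prefill attention scores with an explicit causal mask usually looks like this (a sketch; `q`, `k`, and `head_dim` are assumed names, with shapes `(batch, heads, seq, head_dim)`):
```python
import torch

def prefill_attention(q, k, head_dim):
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5      # (batch, heads, seq, seq)
    seq_len = scores.size(-1)
    # True above the diagonal marks future positions to mask out.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=scores.device), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return torch.softmax(scores, dim=-1)
```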
-
After a successful upload, we currently show a different button in the title record instead of the "Publish" button: "go to file".
![grafik](https://user-images.githubusercontent.com/26873381/163338…