-
The cumprod in the MoChA paper is defined to be exclusive, while the `safe_cumprod` in this repo is not. Shouldn't it be:
```python
def safe_cumprod(self, x, exclusive=False):
    """Numerically st…
```
-
See the bachelor thesis https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-211701 for follow-up.
-
Trying to output the attention values of the model's intermediate layers during inference:
```python
text = '你好'
inputs = tokenizer_baichuan(text, return_tensors='pt', return_token_type_ids=False)
# Ask the model to return the per-layer attention weights.
out = model_baichuan(**inputs, output_attentions=True)
```
It throws an error: …
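For reference, when a model supports `output_attentions`, the per-layer tensors are normally read off the returned output like this (a sketch assuming a standard Transformers `ModelOutput`):
```python
# out.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
for layer_idx, attn in enumerate(out.attentions):
    print(layer_idx, attn.shape)
```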
-
File "/data/lw/2Dxiangao/data_parallel_my_v2.py", line 89, in scatter
bsz = inputs[0].size(self.dim)
AttributeError: 'list' object has no attribute 'size'
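The traceback says `inputs[0]` is a plain Python `list`, which has no `.size()`. A minimal standalone sketch of a guard for that line (the helper name and the list fallback are assumptions):
```python
import torch

def batch_size(inputs, dim=0):
    # inputs[0] may be a tensor or a plain list, as in the traceback.
    first = inputs[0]
    if isinstance(first, torch.Tensor):
        return first.size(dim)
    # A list has no .size(); fall back to its length.
    return len(first)
```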
-
Unlike the other architectures in this package, RetNet doesn't support padding, as far as I'm aware. I was thinking the best place to introduce it is alongside the positional mask. He…
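To make the idea concrete, one way to fold a key-side padding mask into the decay mask RetNet already builds (a sketch with assumed names and shapes, not this package's actual API):
```python
import torch

def mask_padding(decay_mask, padding_mask):
    """decay_mask: (heads, seq, seq); padding_mask: (batch, seq), True = real token."""
    # Zero the retention weight of any key position that is padding.
    keep = padding_mask[:, None, None, :].to(decay_mask.dtype)  # (batch, 1, 1, seq)
    return decay_mask.unsqueeze(0) * keep                       # (batch, heads, seq, seq)
```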
-
Modify the original attention:
```python
class Attention(nn.Module):
    def __init__(self, args: ModelArgs):
        super().__init__()
        self.n_kv_heads = args.n_heads if args.n_kv_heads is None…
```
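For context, the visible line is the grouped-query head setup from Llama-style attention; the usual next step derives how many query heads share each kv head (a sketch of that convention, not necessarily the modification this snippet goes on to make):
```python
# Names mirror the snippet; args is the same ModelArgs instance.
def gqa_shapes(args):
    n_kv_heads = args.n_heads if args.n_kv_heads is None else args.n_kv_heads
    n_rep = args.n_heads // n_kv_heads  # query heads served by each kv head
    head_dim = args.dim // args.n_heads
    return n_kv_heads, n_rep, head_dim
```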
-
My calibration dataloader has bsz = 1. After I quantize my model with:
```python
model_int8 = trainer.quantize(model, accelerator='onnxruntime',
                              calib_dataloader=train_dl, method='int…
```
-
Did you compare Whisper large-v2 and distil-whisper with Transformers' default settings (beam size = 1, temperature = 1, do_sample = False)?
What would be the difference if you used the OpenAI settings …
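For clarity, the two decoding configurations being compared would look roughly like this in Transformers (the model id and audio file name are placeholders):
```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="distil-whisper/distil-large-v2")

# Transformers defaults: greedy decoding, no sampling.
greedy = asr("sample.wav", generate_kwargs={"num_beams": 1, "do_sample": False})

# OpenAI-style decoding typically uses beam search with 5 beams.
beam = asr("sample.wav", generate_kwargs={"num_beams": 5})
```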
-
Hi, I would like to ask why the attention mask is not used in the prefill stage.
I want to output the attention score matrix in the prefill stage. Is the code below right?
```python
if spec: # s…
```
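For comparison, computing prefill attention scores with an explicit causal mask usually looks like this (a sketch; `q`, `k`, and `head_dim` are assumed names, with shapes `(batch, heads, seq, head_dim)`):
```python
import torch

def prefill_attention(q, k, head_dim):
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5      # (batch, heads, seq, seq)
    seq_len = scores.size(-1)
    # True above the diagonal marks future positions to mask out.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=scores.device), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return torch.softmax(scores, dim=-1)
```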
-
After a successful upload, we currently show a different button in the title record instead of the "Publish" button: "go to file".
![grafik](https://user-images.githubusercontent.com/26873381/163338…