-
I am following along with the Medium article ... but the MyDataCollator class now fails because there is no longer any token in the additional_special_tokens of the processor.
Dropping the whole image part f…
-
Hi, thanks for your fantastic work! I have a question about the implementation of the voxel self-attention.
The paper writes ``These sampling points share the same height $z_k$, but with different …
-
Presently, in the transformer decoder, we do
```
h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask)
out = h + self.feed_forward.forward(self.ffn_norm(h))
```
We have c…
-
Hi
I want to ask how to get the predicted outputs (weight i, j) in Fig. 5.
Does it mean Softmax(Query j dot product Key i / dimension^0.5) or Softmax(Query j dot product Key i / dimension^0.5) Value i? bu…
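To make the two readings in the question concrete, here is a minimal NumPy sketch of standard scaled dot-product attention. It does not settle which quantity Fig. 5 plots. The function names, shapes, and random inputs are assumptions chosen for illustration and are not taken from the repository. The first reading, Softmax(Query j · Key i / dimension^0.5), is a scalar weight for each (j, i) pair; the second multiplies that weight by Value i, giving a vector contribution to the output of query j.

```python
import numpy as np

def attention_weights(Q, K):
    """Return softmax(Q K^T / sqrt(d)); row j holds the weights of query j over all keys."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def attention_output(Q, K, V):
    """Return the weighted sum of values: row j is sum_i weight[j, i] * V[i]."""
    return attention_weights(Q, K) @ V

# Hypothetical toy inputs: 3 queries, 5 keys/values, head dimension 4, value dimension 2.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))

W = attention_weights(Q, K)    # shape (3, 5): one scalar weight per (query, key) pair
O = attention_output(Q, K, V)  # shape (3, 2): one output vector per query
```

Under these assumptions, an attention-map style figure would correspond to `W`, since each entry is a single number per (query, key) pair, while `O` has no (i, j) indexing over keys.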
-
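I tried adding a TelechatForSequenceClassification class to modeling_telechat in the open-source TeleChat (星辰) repository, once modeled on the Qwen code and once on TeleChat's own code. The first attempt fails with a model-loading error, and the second trains but the loss does not decrease. We need the AI company's help to work out how to support the AutoModelForSequenceClassification task.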
class TelechatForSequenceClassif…
tcoln updated
2 months ago
-
Hi, thanks for creating this package; it helps us run Whisper with TensorRT.
However, we found that this package doesn't include a dependency list (usually provided in requirements.txt),
so we run wh…
-
After setting up the environment, I ran ./get_unet_bmodel.sh in sophon-demo/sample/StableDiffusionXL/scripts. Generating the MLIR succeeded, but converting the MLIR to a bmodel failed. The full error output is as follows:
[Running]: tpuc-opt unet_base_bm1684x_bf16_final.mlir --codegen="mo…
winca updated
2 weeks ago
-
I have built my own demo file. After uploading one video, it gives blank output. Could anyone help me out?
-------------------------------Here's the demo file-------------------------
from argpars…
BOYJZ updated
2 weeks ago
-
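Self-Attention mechanism
So far, every input the model has used can be treated as a single vector.
But what about more complex inputs, such as a sequence, or a set of vectors whose length varies from one input to the next?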
![image](https://user-images.githubusercontent.com/34474924/236625854-800b74f8-9ee9-4517-97b4-e3…
-
Hey, I have observed in my timing tests that version 2.6.3 is faster than some later commits (including 2.7.0.post2) for the input sizes below. For example, for small batch sizes (==2) and relatively smal…