fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

https://gitee.com/fastnlp/fastNLP

Apache License 2.0

3.06k stars, 450 forks

Star-Transformer #266

Closed (JYlsc closed this issue 4 years ago)

JYlsc commented 4 years ago

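Hello, I have a few questions about Star-Transformer:

I noticed that _MSA1 and _MSA2 use conv2d instead of dense layers for self-attention. Is this done for speed, or does it actually affect the results? The paper does not seem to mention it.
_MSA1 implements self-attention with unfold, while _MSA2 does not. What was the reasoning behind writing it this way?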

yhcc commented 4 years ago

@QipengGuo could you take a look?

QipengGuo commented 4 years ago

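It is purely a speed consideration. PyTorch itself has no direct way to do "attention over a sliding window". MSA2 is one-to-many, a single query attending to many keys, so it can be computed directly. Using unfold is still a compromise, of course. I later built a faster method, but it requires a hand-written CUDA kernel; the pseudocode is described in another paper.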
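
For readers following along, here is a minimal sketch of the two cases described above. It is not the fastNLP `_MSA1` / `_MSA2` code: the function names `sliding_window_attention` and `one_to_many_attention` are hypothetical, and the sketch assumes a single head, no learned projections, no masking, and an odd window size. Zero-padded border positions still receive softmax weight here, which is a simplification.

```python
import math

import torch
import torch.nn.functional as F


def sliding_window_attention(q, k, v, window=3):
    """Each position attends to `window` neighbouring positions.

    q, k, v: tensors of shape [B, D, L]. `window` is assumed to be odd.
    Hypothetical helper for illustration, not the fastNLP implementation.
    """
    B, D, L = q.shape
    pad = window // 2
    # F.unfold expects a 4-D input [B, C, H, W], so treat the sequence
    # as an L x 1 "image" and slide a (window x 1) kernel along it.
    k_win = F.unfold(k.unsqueeze(-1), kernel_size=(window, 1), padding=(pad, 0))
    v_win = F.unfold(v.unsqueeze(-1), kernel_size=(window, 1), padding=(pad, 0))
    # unfold returns [B, D * window, L] with channels as the outer index.
    k_win = k_win.view(B, D, window, L)
    v_win = v_win.view(B, D, window, L)
    # Dot product of each query with the keys in its own window.
    scores = (q.unsqueeze(2) * k_win).sum(dim=1) / math.sqrt(D)  # [B, window, L]
    weights = F.softmax(scores, dim=1)
    return (weights.unsqueeze(1) * v_win).sum(dim=2)  # [B, D, L]


def one_to_many_attention(q, k, v):
    """A single query per batch item attends to every key; no unfold needed.

    q: [B, D]; k, v: [B, D, L]. Hypothetical helper for illustration.
    """
    scores = (q.unsqueeze(-1) * k).sum(dim=1) / math.sqrt(k.size(1))  # [B, L]
    weights = F.softmax(scores, dim=-1)
    return (weights.unsqueeze(1) * v).sum(dim=-1)  # [B, D]
```

The first function has to materialise a copy of the keys and values for every window position, which is the memory and speed cost that makes unfold a compromise. The second needs only a broadcast, because there is one query for the whole sequence.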