HazyResearch / safari

Convolutions for Sequence Modeling
Apache License 2.0
848 stars, 70 forks

Does Hyena support BERT style LLM? #32

Open sherdencooper opened 11 months ago

sherdencooper commented 11 months ago

Hi, thanks for this awesome work! I am wondering whether Hyena could be applied to BERT-style models, since the paper describes the Hyena filter as preserving causality so that each prediction depends only on the past. I have read your HyenaDNA paper and am thinking about using Hyena in my project, which needs to look at both future and past context. Thanks a lot in advance.
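
To make the causality question concrete, here is a toy sketch in plain Python. It is not code from the safari repository, and the names `causal_conv` and `bidirectional_conv` are hypothetical. It assumes the usual definition of a causal convolution, and shows one possible way to get a two-sided receptive field (summing a forward pass and a pass over the reversed sequence); this is only an illustration, not necessarily how any released model does it.

```python
def causal_conv(u, h):
    """Causal convolution: y[t] = sum over s <= t of h[t - s] * u[s].

    Output position t depends only on inputs at positions 0..t.
    """
    return [
        sum(h[t - s] * u[s] for s in range(t + 1) if t - s < len(h))
        for t in range(len(u))
    ]


def bidirectional_conv(u, h_fwd, h_bwd):
    """Assumed bidirectional variant: a causal pass over the sequence plus
    a causal pass over the reversed sequence, summed position by position."""
    fwd = causal_conv(u, h_fwd)
    bwd = causal_conv(u[::-1], h_bwd)[::-1]
    return [a + b for a, b in zip(fwd, bwd)]


# Changing only the last input leaves earlier causal outputs untouched,
# but it does change earlier outputs of the bidirectional variant.
u_a = [1, 2, 3, 4]
u_b = [1, 2, 3, 100]
h = [1, 0.5]

causal_unchanged = causal_conv(u_a, h)[:3] == causal_conv(u_b, h)[:3]
bidir_changed = bidirectional_conv(u_a, h, h)[2] != bidirectional_conv(u_b, h, h)[2]
```

With a causal filter, position t never sees tokens after t, which is what a masked-language-model objective would need to relax.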

DanFu09 commented 11 months ago

We've been playing around with things a bit along these lines - will update this issue when we have something more concrete!

ddofer commented 10 months ago

+1. For discriminative models, bidirectional is the SOTA at a given parameter count (vs. causal models). A bidirectional variant (MLM, ELECTRA, etc. pretraining + expected input) would be amazing, especially for HyenaDNA! (I have ideas on this front.)

DanFu09 commented 9 months ago

Check out Monarch Mixer for BERT-style models: https://github.com/HazyResearch/m2