Open sherdencooper opened 11 months ago
We've been playing around with things a bit along these lines - will update this issue when we have something more concrete!
+1 - For discriminative models, bidirectional models are SOTA at a given parameter count (vs. causal models). A bidirectional version (MLM, ELECTRA, etc. pretraining + expected input) would be amazing, especially for HyenaDNA! (I have ideas on this front.)
Check out Monarch Mixer for BERT-style models: https://github.com/HazyResearch/m2
Hi, thanks for this awesome work! I am wondering whether this could be applied to BERT-style models, since the paper describes the Hyena filter as preserving causality so that each prediction depends only on the past. I have read your HyenaDNA paper and am thinking about using Hyena in my project, which needs to look at both the future and the past. Thanks a lot in advance.
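To make the causality question concrete, here is a minimal NumPy sketch of the difference between a causal and a bidirectional long convolution. It is not the Hyena or HyenaDNA implementation: the function names `fft_conv`, `causal_conv`, and `bidirectional_conv` are hypothetical, and the assumption that the bidirectional filter has `2L - 1` taps (one per lag from `-(L-1)` to `L-1`) is just one possible design.

```python
import numpy as np


def fft_conv(u, k):
    """Full linear convolution of 1-D arrays u and k, computed via FFT."""
    n = len(u) + len(k) - 1  # zero-pad so circular wrap-around cannot occur
    return np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(k, n=n), n=n)


def causal_conv(u, k):
    """y[t] = sum_{s=0..t} k[s] * u[t-s]; y[t] only sees u[0..t]."""
    L = len(u)
    return fft_conv(u, k)[:L]


def bidirectional_conv(u, k_bi):
    """y[t] = sum over lags in [-(L-1), L-1]; y[t] sees all of u.

    Assumption: k_bi[j] is the filter tap for lag j - (L - 1).
    """
    L = len(u)
    assert len(k_bi) == 2 * L - 1
    return fft_conv(u, k_bi)[L - 1 : 2 * L - 1]


# Perturb only the last input position and compare the earlier outputs.
rng = np.random.default_rng(0)
L = 8
u = rng.standard_normal(L)
k = rng.standard_normal(L)
k_bi = rng.standard_normal(2 * L - 1)

u_future = u.copy()
u_future[-1] += 1.0

causal_unchanged = np.allclose(causal_conv(u, k)[:-1],
                               causal_conv(u_future, k)[:-1])
bidir_unchanged = np.allclose(bidirectional_conv(u, k_bi)[:-1],
                              bidirectional_conv(u_future, k_bi)[:-1])
print("causal outputs before the change unaffected:", causal_unchanged)
print("bidirectional outputs before the change unaffected:", bidir_unchanged)
```

In the causal case, changing a future input leaves earlier outputs untouched, which is what next-token prediction needs. In the bidirectional case every output position can see the whole sequence, which is the property an MLM-style objective would rely on.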