Mikolaj / horde-ad

Higher Order Reverse Derivatives Efficiently - Automatic Differentiation library based on the paper "Provably correct, asymptotically efficient, higher-order reverse-mode automatic differentiation"
BSD 3-Clause "New" or "Revised" License

Implement LSTM with both soft and hard attention #50

Open. Mikolaj opened this issue 2 years ago

Mikolaj commented 2 years ago

Implement a couple of variants of LSTM, then create some example neural networks with them (e.g., modify the existing MNIST RNN to use LSTM). If no libraries offer a good alternative to reuse, let's implement our own. Two good blog posts about LSTM: https://jasdeep06.github.io/posts/Understanding-LSTM-in-Tensorflow-MNIST and https://colah.github.io/posts/2015-08-Understanding-LSTMs.

This should be fun and not too hard, but understanding and perhaps tweaking our MNIST RNN first could help.
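For reference, here is a minimal sketch of what one LSTM cell and its unrolling over a sequence could look like in plain Haskell over lists. It is deliberately independent of horde-ad's API; all names (`LstmParams`, `lstmStep`, the gate matrices) are illustrative assumptions, not anything that exists in the library.

```haskell
-- Minimal LSTM cell sketch in plain Haskell over lists; illustrative only,
-- not horde-ad's API. All names (LstmParams, lstmStep, ...) are made up.
module LstmSketch where

type Vec = [Double]
type Mat = [[Double]]

-- Dense layer: W x + b (rows of w dotted with x, plus bias).
affine :: Mat -> Vec -> Vec -> Vec
affine w x b = zipWith (+) (map (sum . zipWith (*) x) w) b

sigmoid :: Double -> Double
sigmoid z = 1 / (1 + exp (negate z))

-- Each gate acts on the concatenation [previous hidden state ++ input].
data LstmParams = LstmParams
  { wf, wi, wo, wc :: Mat
  , bf, bi, bo, bc :: Vec
  }

-- One step of the cell: (cell state, hidden state) and an input in,
-- updated (cell state, hidden state) out.
lstmStep :: LstmParams -> (Vec, Vec) -> Vec -> (Vec, Vec)
lstmStep p (c, h) x =
  let hx = h ++ x
      f  = map sigmoid (affine (wf p) hx (bf p))  -- forget gate
      i  = map sigmoid (affine (wi p) hx (bi p))  -- input gate
      o  = map sigmoid (affine (wo p) hx (bo p))  -- output gate
      g  = map tanh    (affine (wc p) hx (bc p))  -- candidate cell state
      c' = zipWith (+) (zipWith (*) f c) (zipWith (*) i g)
      h' = zipWith (*) o (map tanh c')
  in (c', h')

-- Unroll the cell over a whole input sequence, collecting hidden states.
lstmUnroll :: LstmParams -> (Vec, Vec) -> [Vec] -> [Vec]
lstmUnroll p s0 xs = reverse (snd (foldl step (s0, []) xs))
  where step (s, hs) x = let s'@(_, h') = lstmStep p s x in (s', h' : hs)
```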

See #41 and https://github.com/Mikolaj/mostly-harmless/discussions/16?sort=new#discussioncomment-2811053. In particular, let's implement both soft and hard attention, if possible, and see if that trains.
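To make the soft/hard distinction concrete, here is a small sketch in the same plain-Haskell style (again not horde-ad's API; dot-product scoring is an illustrative choice). Soft attention returns a softmax-weighted combination of all annotations and stays differentiable; hard attention commits to a single annotation, which is not differentiable and typically needs sampling-based (REINFORCE-style) training.

```haskell
-- Soft vs. hard attention sketch in plain Haskell; illustrative only,
-- not horde-ad's API. Scores are plain dot products between the query
-- (e.g. the current hidden state) and each annotation vector.
module AttentionSketch where

import Data.List (maximumBy)
import Data.Ord (comparing)

type Vec = [Double]

dot :: Vec -> Vec -> Double
dot u v = sum (zipWith (*) u v)

-- Numerically stable softmax (subtract the maximum before exponentiating).
softmax :: [Double] -> [Double]
softmax zs = map (/ s) es
  where es = map (exp . subtract (maximum zs)) zs
        s  = sum es

-- Soft attention: a convex combination of all annotations, weighted by
-- softmaxed scores; differentiable end to end.
softAttention :: Vec -> [Vec] -> Vec
softAttention query annots =
  foldr1 (zipWith (+))
    (zipWith (\w a -> map (* w) a) (softmax (map (dot query) annots)) annots)

-- Hard attention: commit to the single best-scoring annotation; the argmax
-- is not differentiable, so training usually needs a sampling-based trick.
hardAttention :: Vec -> [Vec] -> Vec
hardAttention query annots =
  snd (maximumBy (comparing fst) (zip (map (dot query) annots) annots))
```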

Mikolaj commented 2 years ago

An alternative is to implement transformers, which seem to be more in favour these days. A good source of wisdom about that is https://github.com/awf/functional-transformer.
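For comparison with the LSTM-plus-attention route, here is the core of a transformer block, single-head scaled dot-product self-attention, in the same sketch style. It is independent of both horde-ad's API and the functional-transformer repo, and omits the learned projections, multiple heads, and feed-forward layers a real block would have.

```haskell
-- Single-head scaled dot-product self-attention, the core of a transformer
-- block; illustrative only, independent of horde-ad and functional-transformer.
module SelfAttentionSketch where

type Vec = [Double]

dot :: Vec -> Vec -> Double
dot u v = sum (zipWith (*) u v)

softmax :: [Double] -> [Double]
softmax zs = map (/ sum es) es
  where es = map (exp . subtract (maximum zs)) zs

-- Every query attends to every key; the softmaxed, scaled scores mix the
-- value vectors. In a full block, qs/ks/vs come from learned projections
-- of the same token embeddings.
selfAttention :: [Vec] -> [Vec] -> [Vec] -> [Vec]
selfAttention qs ks vs =
  [ foldr1 (zipWith (+))
      (zipWith (\w v -> map (* w) v) (softmax [ dot q k / scale | k <- ks ]) vs)
  | q <- qs ]
  where scale = sqrt (fromIntegral (length (head ks)))
```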

Mikolaj commented 2 years ago

@fkrawiec, actually, I've done all the little examples I could think of, and this one now remains (transformers, following Andrew's repo, rather than LSTM), and it's probably not little. Let me know if you'd like to team up; I'd quickly do a modest version of #56 so that we'd have a stable API and a design pattern to work with.