cybertronai / gradient-checkpointing

Make huge neural nets fit in memory
MIT License

A few more citations #19

Open davidBelanger opened 6 years ago

davidBelanger commented 6 years ago

This is a great package! Thanks for making it available.

FYI, your README should cite a few more works:

Zweig, Geoffrey and Padmanabhan, Mukund. Exact Alpha-Beta Computation in Logarithmic Space with Application to MAP Word Graph Construction. Sixth International Conference on Spoken Language Processing, 2000.

Lewis, Bil. Debugging Backwards in Time. arXiv preprint cs/0310016, 2003.

yaroslavvb commented 6 years ago

Would you like to send a PR for that?

Also, Griewank describes similar memory-saving techniques in his automatic differentiation work from the 1990s; they are summarized in Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation.
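
For readers skimming this thread, here is a minimal toy sketch of the recomputation idea those works (and this package) build on: keep only every k-th activation on the forward pass and recompute the missing ones from the nearest checkpoint during the backward pass, trading extra compute for memory. The scalar layer chain, checkpoint spacing, and function names below are made up for illustration; this is not the package's TensorFlow implementation.

```python
import math

# Each "layer" is a (forward fn, derivative fn) pair; purely illustrative.
# Assumes len(layers) is a multiple of k for simplicity.
layers = [(math.sin, math.cos),
          (math.tanh, lambda x: 1 - math.tanh(x) ** 2)] * 4


def forward_with_checkpoints(x, k=2):
    """Run the chain, keeping only the input and every k-th activation."""
    checkpoints = {0: x}
    for i, (f, _) in enumerate(layers):
        x = f(x)
        if (i + 1) % k == 0:
            checkpoints[i + 1] = x
    return x, checkpoints


def backward_with_recompute(checkpoints, k=2):
    """Backprop d(output)/d(input), recomputing activations per segment."""
    grad = 1.0
    n = len(layers)
    for seg_end in range(n, 0, -k):
        seg_start = seg_end - k
        # Recompute the activations inside this segment from its checkpoint.
        acts = [checkpoints[seg_start]]
        for f, _ in layers[seg_start:seg_end]:
            acts.append(f(acts[-1]))
        # Chain rule through the segment using the recomputed activations.
        for i in range(seg_end - 1, seg_start - 1, -1):
            _, df = layers[i]
            grad *= df(acts[i - seg_start])
    return grad


y, ckpts = forward_with_checkpoints(0.5)
print("output:", y, "d(output)/d(input):", backward_with_recompute(ckpts))
```

With k checkpoints per segment the peak number of stored activations drops from O(n) to roughly O(n/k + k), at the cost of one extra forward pass over each segment, which is the basic trade-off Griewank analyzes and this package automates.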