EliasHasle opened this issue 1 year ago
Thanks @EliasHasle for the comment. We also had what you describe implemented in the code here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/research/universal_transformer_util.py#L1162
... and we ran experiments with all the different cases for UT: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/research/universal_transformer_util.py#L1045
OK. I did not know that. Thanks. 😄 The different ACT modes are not mentioned in the version of the paper I read from arXiv. Did you discuss them in another version, or in your thesis?
I had all the results of this ablation written in an internal doc and I'm not able to find it :/ But I remember that the version that made it to the paper/thesis was the best one across the board.
https://github.com/MostafaDehghani/Thesis/blob/7033ec5471584f1f422e33693dcdb51583a5eedb/04-part-03/chapter-06/figs_and_tables/alg_ut_with_act.tex#L91
Graves suggests approximating the final output (of a stochastically halting model) as a weighted average of states according to the stepwise halting probabilities.
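For concreteness, here is a minimal sketch of what I mean, assuming the per-step states and halting probabilities have already been collected (all names here, `graves_act_readout`, `states`, `halt_probs`, `eps`, are hypothetical and not taken from any of the linked codebases):

```python
import numpy as np

def graves_act_readout(states, halt_probs, eps=0.01):
    """Graves-style ACT readout: the final output is a weighted average
    over ALL intermediate states, each weighted by its own per-step
    halting probability, with the leftover probability mass (the
    remainder) assigned to the step at which the model halts."""
    output = np.zeros_like(states[0])
    cumulative = 0.0
    for s, h in zip(states, halt_probs):
        if cumulative + h >= 1.0 - eps:       # halting condition reached
            output += (1.0 - cumulative) * s  # remainder R = 1 - sum of earlier h
            break
        output += h * s
        cumulative += h
    return output
```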
Your code instead sequentially updates the output (and possibly even the internal state?) as a weighted average of the previous state and the newly generated one, with the split given by the current step's halting probability.
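A minimal sketch of that incremental rule, under the same hypothetical names (this is my reading of the linked code, not a verbatim excerpt):

```python
import numpy as np

def sequential_act_readout(states, halt_probs, eps=0.01):
    """Incremental variant: at every step the running output is replaced
    by a convex combination of the previous output and the new state,
    split by the CURRENT step's halting probability.  Each later step
    rescales all earlier contributions by (1 - h), so the effective
    weight of step t ends up being h_t * prod_{k>t}(1 - h_k), not h_t."""
    output = np.zeros_like(states[0])
    cumulative = 0.0
    for s, h in zip(states, halt_probs):
        if cumulative >= 1.0 - eps:           # soft halt: stop updating
            break
        output = h * s + (1.0 - h) * output   # EMA-style convex mix
        cumulative += h
    return output
```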
The two are not at all equivalent. Was this code used for the experiments on the Universal Transformers?
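To make the non-equivalence concrete, here is a two-step toy check using the sketches above (the numbers are purely illustrative):

```python
import numpy as np

s1, s2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
states, halt_probs = [s1, s2], [0.5, 0.5]

print(graves_act_readout(states, halt_probs))      # [0.5  0.5] -> 0.5*s1 + 0.5*s2
print(sequential_act_readout(states, halt_probs))  # [0.25 0.5] -> 0.25*s1 + 0.5*s2
```

Note that the sequential weights (0.25, 0.5) do not even sum to 1, whereas the remainder construction in Graves always yields a normalized average.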
Note that the update in this independent implementation of UT matches Graves: https://github.com/hncshp/universal-transformers-git/blob/master/universal_transformers.py#L753
Meanwhile, this (way more popular) one seems to match your paper: https://github.com/andreamad8/Universal-Transformer-Pytorch/blob/master/models/UTransformer.py#L286