Genera1Z / SAVi-PyTorch

SAVi -- unofficial, but with 3x faster training and better performance. An implementation of the ICLR 2022 paper "Conditional Object-Centric Learning from Video".

Slot attention module question #1

Open · MLDeS opened this issue 11 months ago

MLDeS commented 11 months ago

Hey, thanks for the great implementation!

I have two questions for you.

1) Since you have implemented slot attention, could you say what technical differences you see between slot attention and Transformers? I am asking mainly about the technical differences (e.g., the use of a GRU cell in slot attention vs. no GRU cell in Transformers), and how these differences could affect the results and their conceptual interpretation. Or, in theory, if you simply replaced slot attention with a Transformer, would it still work similarly?

2) How is your training faster? What did you change from the original slot attention implementation?

Thanks a lot!

Genera1Z commented 10 months ago
  1. SlotAttention (SA) can be seen as a variant of a TransformerBlock (TFB) / TransformerEncoderLayer. But in the SAVi model, SA is used only as the object feature extractor/aggregator, while the TFB works as the dynamics/transition model, like the world models in Reinforcement Learning: it takes the current state (a set of slots here) and predicts the future state (a new set of slots). In other words, at least in SAVi, SA and TFB are two sequential modules with different functionalities. Still, if you really want to compare them: SA is MultiheadAttention + GRU applied iteratively, where key = value = image features and query = slots (object features), whereas TFB is MultiheadAttention + FFN/MLP, where query = key = value = slots; see the sketch after this list. So if you really wanted to replace SA with a TFB, you would have to add a GRU to the TFB to carry the intermediate state across iterations, and reorganize the query/key/value inputs as in SA.
  2. AMP -> larger batch size -> higher actual GPU FLOPS; high dataset compression -> 5x less disk I/O overhead; and the PyTorch DataLoader is faster than TensorFlow Datasets (tfds) (actually, by stripping the cruft out of tfds, TF Dataset itself can be made much faster). I didn't change anything in the SA implementation itself -- I just rewrote the original TensorFlow implementation in PyTorch.
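
For concreteness on point 1, here is a minimal PyTorch sketch, not this repo's exact code; the module names, dimensions, and the 2-iteration count are illustrative assumptions. It contrasts SA (cross-attention from slots to image features, with a GRU carrying state across iterations) with a TFB-style predictor (plain self-attention over the slots):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotAttentionSketch(nn.Module):
    """SA-style module: query = slots, key = value = image features, GRU over iterations."""

    def __init__(self, slot_dim=128, feat_dim=128, n_iter=2):
        super().__init__()
        self.n_iter = n_iter
        self.norm_feat = nn.LayerNorm(feat_dim)
        self.norm_slot = nn.LayerNorm(slot_dim)
        self.to_q = nn.Linear(slot_dim, slot_dim, bias=False)
        self.to_k = nn.Linear(feat_dim, slot_dim, bias=False)
        self.to_v = nn.Linear(feat_dim, slot_dim, bias=False)
        self.gru = nn.GRUCell(slot_dim, slot_dim)
        self.mlp = nn.Sequential(
            nn.LayerNorm(slot_dim), nn.Linear(slot_dim, slot_dim * 2),
            nn.ReLU(), nn.Linear(slot_dim * 2, slot_dim))

    def forward(self, feats, slots):              # feats: (B, N, D), slots: (B, K, D)
        feats = self.norm_feat(feats)
        k, v = self.to_k(feats), self.to_v(feats)
        for _ in range(self.n_iter):              # iterative refinement, absent in a TFB
            q = self.to_q(self.norm_slot(slots))
            # softmax over slots, so slots compete for image features
            attn = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=1)
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)   # weighted mean over features
            updates = attn @ v                                       # (B, K, D)
            # the GRU carries the intermediate state across iterations
            slots = self.gru(updates.flatten(0, 1), slots.flatten(0, 1)).view_as(slots)
            slots = slots + self.mlp(slots)
        return slots


# TFB-style predictor: self-attention with query = key = value = slots, FFN/MLP instead of a GRU.
predictor = nn.TransformerEncoderLayer(d_model=128, nhead=4, dim_feedforward=256, batch_first=True)

B, N, K, D = 2, 64 * 64, 11, 128
feats = torch.randn(B, N, D)                      # flattened CNN feature map of the current frame
slots = torch.randn(B, K, D)                      # (conditionally initialized) slots
slots_t = SlotAttentionSketch()(feats, slots)     # SA: aggregate object features from the frame
slots_t1 = predictor(slots_t)                     # TFB: predict the slots for the next frame
```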
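
And for the AMP point, a minimal training-step sketch assuming a generic `model`, `optimizer`, `video`, and MSE `target` (all hypothetical placeholders, not this repo's actual trainer):

```python
import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, video, target):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():       # mixed-precision forward -> less memory, so larger batches fit
        pred = model(video)
        loss = F.mse_loss(pred, target)
    scaler.scale(loss).backward()         # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```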