Implementation of FAIR's Data2Vec2.0 for vision. (Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Uses @lucidrains x-transfromers (https://github.com/lucidrains/x-transformers)
Basic Schematic of Architecture:
In order to run, just run: python train.py
on your command line
TODO: