This PR adds CroCo to the visual encoders.
The original CrocoNet was a single model that combined the encoder, the decoder, and the masking logic. Here it is split into separate encoder and decoder components so that goal embeddings can be cached, and the masking logic is removed because we will not be pretraining.
Additionally, this PR adds a new binocular encoder that produces the joint goal+obs embedding.
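
For reviewers, here is a minimal sketch of the caching idea that motivates the split. It is not the code in this PR: `GoalEmbeddingCache`, `toy_encoder`, and `binocular_embedding` are hypothetical names, the encoder is a placeholder function rather than the CroCo encoder, and concatenation stands in for whatever fusion the binocular encoder actually performs.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Embedding = List[float]


class GoalEmbeddingCache:
    """Memoizes goal embeddings so each goal is encoded only once."""

    def __init__(self, encoder: Callable[[Sequence[float]], Embedding]):
        # `encoder` is a hypothetical stand-in for the separated encoder.
        self.encoder = encoder
        self._cache: Dict[Hashable, Embedding] = {}
        self.encoder_calls = 0  # counts real encoder invocations

    def get(self, goal_key: Hashable, goal_image: Sequence[float]) -> Embedding:
        # Run the encoder only on a cache miss; reuse the stored result otherwise.
        if goal_key not in self._cache:
            self.encoder_calls += 1
            self._cache[goal_key] = self.encoder(goal_image)
        return self._cache[goal_key]


def toy_encoder(image: Sequence[float]) -> Embedding:
    # Placeholder encoder (assumption): doubles each input value.
    return [2.0 * p for p in image]


def binocular_embedding(goal_emb: Embedding, obs_emb: Embedding) -> Embedding:
    # Placeholder fusion (assumption): plain concatenation of the two embeddings.
    return list(goal_emb) + list(obs_emb)


if __name__ == "__main__":
    cache = GoalEmbeddingCache(toy_encoder)
    goal = [1.0, 2.0]
    for obs in ([0.5], [0.25]):
        goal_emb = cache.get("episode-0", goal)  # encoded on the first step only
        print(binocular_embedding(goal_emb, toy_encoder(obs)))
    print("encoder calls for goal:", cache.encoder_calls)
```

The goal stays fixed over an episode while the observation changes every step, so keying the cache on an episode or goal identifier avoids re-encoding the goal at each step.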
To-do:
[ ] Test goal caching
[ ] Test cached sensor
[ ] Update the existing policy, or write a new one, to use the binocular encoder embeddings instead of goal embeddings