Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Implement DINO strategy for learning. #203

Closed by brunosan 1 month ago

brunosan commented 3 months ago

This PR changes the learning method (we do not change the architecture or outputs) from using the MAE (Masked Autoencoder) to the DINO (Distillation with No Labels) approach.

Background on MAE: MAE masks a large portion of the input patches (typically 75%) and trains the model to reconstruct the missing ones. This forces the model to learn representations from the context supplied by the unmasked patches, using a transformer encoder to produce detailed embeddings for each patch. A known limitation: when a unique feature is confined to a single masked patch, the surrounding context may carry too little signal for the model to infer its presence.
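To make the masking mechanics concrete, here is a minimal NumPy sketch (not the Clay codebase; the names `random_masking` and `mae_loss` are illustrative) of the two pieces that define MAE pre-training: hiding 75% of the patches, and scoring the reconstruction only on the hidden ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_masking(patches: np.ndarray, mask_ratio: float = 0.75):
    """Shuffle patch indices and keep only (1 - mask_ratio) of them,
    mirroring the MAE pre-training setup. Returns the visible patches,
    their indices, and a boolean mask where True marks a hidden patch."""
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

def mae_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error computed only on the masked patches --
    the visible patches do not contribute to the loss."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())

patches = rng.normal(size=(16, 32))          # 16 patches, 32-dim each
visible, keep_idx, mask = random_masking(patches)
print(visible.shape)   # (4, 32): only 25% of patches reach the encoder
print(int(mask.sum())) # 12 masked patches drive the reconstruction loss
```

The key point the paragraph makes is visible here: the encoder only ever sees the 25% of visible patches, so any feature isolated inside a masked patch can only be recovered from its neighbors.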

DINO: DINO shifts the focus from reconstruction to a student-teacher framework (two copies of the model running in parallel). The "student" learns to match the output of the "teacher", whose weights are an exponential moving average of the student's past weights. Because the loss is computed on full (differently augmented) views of the input rather than on missing parts, the method emphasizes learning from the entirety of the data, aiming to refine the model's understanding and representation capabilities.
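A minimal NumPy sketch of the two ingredients described above, assuming the standard DINO recipe (the function names and the temperature/momentum values are illustrative defaults, not Clay's configuration): the teacher is an exponential moving average of the student, and the loss is a cross-entropy between the sharpened, centered teacher distribution and the student distribution.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher weights are an exponential moving average of the student's
    past weights -- there is no gradient step on the teacher."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

def dino_loss(student_logits, teacher_logits, center,
              tau_student=0.1, tau_teacher=0.04):
    """Cross-entropy between the centered, sharpened teacher distribution
    and the student distribution; no labels are involved."""
    t = softmax((teacher_logits - center) / tau_teacher)
    s = log_softmax(student_logits / tau_student)
    return float(-(t * s).sum(axis=-1).mean())

rng = np.random.default_rng(0)
student = rng.normal(size=(8, 64))   # 8 views, 64-dim projection output
teacher = rng.normal(size=(8, 64))
center = teacher.mean(axis=0)        # running center helps prevent collapse
loss = dino_loss(student, teacher, center)
```

Note the asymmetry: only the student receives gradients, while the teacher is updated purely through `ema_update`, and centering plus the lower teacher temperature are what keep the two networks from collapsing to a constant output.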

Key Differences and Advantages:

Patch-Level Embeddings: Both MAE and DINO generate detailed embeddings at the patch level, but DINO can capture more nuanced patterns within and around each patch, informed by the teacher's smoothed history of the student's past iterations.

DINO downsides:

- Two copies of the model must be kept in memory and run in parallel, increasing compute and memory cost per step relative to a single-model setup.
- With no reconstruction target, training can collapse to a trivial constant output; avoiding this depends on careful centering and temperature (sharpening) settings for the teacher.

Currently running a small experiment over Bali with DINO; I'll then do the same with MAE and compare the runs.

brunosan commented 3 months ago
[Screenshot of training metrics, 2024-04-03]

Promising training.