Modalities / modalities

Modalities, a PyTorch-native framework for distributed and reproducible foundation model training.
MIT License
63 stars 8 forks source link

Feat/coca #263

Open sthoduka opened 1 month ago

sthoduka commented 1 month ago

What does this PR do?

This PR updates the CoCa model so that it can be trained jointly on text-aligned images, audio and video. The webdataset-based dataset and loader are also included.

General Changes

Breaking Changes

Checklist before submitting final PR