We will use 80TB to store multimodal datasets: 50TB for the 5B files of the LAION image dataset [reference] and 30TB for other multimodal datasets (COYO-700M, Conceptual 12M, and others).
We will use 40TB for our multimodal experiments (storing model weights and preprocessed or cleaned data).
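As a quick sanity check on the allocations above, the short sketch below totals the storage budget and derives the average file size implied by 50TB for roughly 5B LAION files. It assumes decimal units (1 TB = 10^12 bytes) and treats the 5B file count as exact. The resulting per-file figure is only an average implied by the budget, not a measured file size. The dictionary keys are illustrative names, not part of the original plan.

```python
# Storage budget taken from the plan, in decimal terabytes.
# Assumption: 1 TB = 10**12 bytes; key names are illustrative only.
TB = 10**12

budget_tb = {
    "laion_images": 50,      # ~5B files of the LAION image dataset
    "other_multimodal": 30,  # COYO-700M, Conceptual 12M, and others
    "experiments": 40,       # model weights, preprocessed/cleaned data
}

# Dataset storage is the LAION share plus the other multimodal datasets.
dataset_tb = budget_tb["laion_images"] + budget_tb["other_multimodal"]
total_tb = sum(budget_tb.values())

# Average bytes per LAION file implied by the budget (assumes exactly 5B files).
laion_files = 5 * 10**9
avg_bytes_per_file = budget_tb["laion_images"] * TB / laion_files

print(f"dataset storage: {dataset_tb} TB")
print(f"total storage: {total_tb} TB")
print(f"avg LAION file size: {avg_bytes_per_file / 1e3:.0f} KB")
```

Under these assumptions the datasets need 80TB, the full plan needs 120TB, and the LAION share allows about 10KB per file on average. If the real average file size is larger, the 50TB allocation would need to grow.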
It should have