Best practices for managing v0

WebDataset is looking pretty slick.

The specification is dead simple:

https://docs.google.com/document/d/18OdLjruFNX74ILmgrdiCI9J1fQZuhzzRBCHV9URWto0/edit

The tar format ensures compatibility across platforms and lets WebDataset files
be created, manipulated, and extracted with standard tools. Files that share a
basename prefix (the sample key, e.g. `000123.jpg` and `000123.cls`) are grouped
into a single data sample, and the files of each sample sit next to each other in
the archive. Packing many small files into a few large, sequentially read shards
addresses the "small file problem" common in deep learning, improving I/O and
storage utilization.
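To make the convention concrete, here is a minimal sketch using only Python's standard `tarfile` module. It assumes the key is the member path up to the first dot of the basename and that each sample's files are contiguous in the tar. `write_shard`, `read_shard`, and `split_key` are hypothetical helpers for illustration, not the `webdataset` library's API (the library ships its own writer and loader).

```python
import io
import posixpath
import tarfile
import tempfile


def write_shard(path, samples):
    """Write samples to a tar shard; each sample is (key, {ext: bytes})."""
    with tarfile.open(path, "w") as tar:
        for key, fields in samples:
            # All files of one sample are written back to back, sharing the key.
            for ext, data in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


def split_key(name):
    """Split 'dir/000001.seg.png' into ('dir/000001', 'seg.png') at the basename's first dot."""
    dirname, base = posixpath.split(name)
    stem, _, ext = base.partition(".")
    return posixpath.join(dirname, stem), ext


def read_shard(path):
    """Yield one dict per sample by grouping consecutive members that share a key."""
    current_key, sample = None, {}
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            # A new key means the previous sample is complete.
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample["__key__"] = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


if __name__ == "__main__":
    samples = [
        ("000000", {"txt": b"a cat", "cls": b"3"}),
        ("000001", {"txt": b"a dog", "cls": b"5"}),
    ]
    with tempfile.TemporaryDirectory() as d:
        shard = posixpath.join(d, "shard-000000.tar")
        write_shard(shard, samples)
        for s in read_shard(shard):
            print(s["__key__"], s["txt"], s["cls"])
```

Because the shard is a plain tar, `tar -tf shard-000000.tar` lists the same members, which is the "standard tools" property in practice.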

Benchmarking shows WDS (WebDataset) only slightly behind TFDS (TensorFlow Datasets):

https://github.com/huggingface/pytorch-image-models/discussions/1524#discussioncomment-4008520

But building large datasets with TFDS seems a lot more complex. Does it require Apache Beam? https://www.tensorflow.org/datasets/beam_datasets#implementing_a_beam_dataset

ManifoldRG / MultiNet #61