LAION-dedup
The goal of this repository is to provide different tools for deduplication, which can be used to:
- Know how many unique images are in the dataset
- Reduce memorization in text-to-image models
- Increase training efficiency
- Study the impact of near-duplicates in the pre-training data on downstream task performance
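
To make the first goal concrete, here is a minimal sketch of counting unique items by exact content hash. It is an illustration only, not this repository's tooling: the helper names `group_exact_duplicates` and `count_unique` and the toy sample data are assumptions made for the example. It also only catches byte-identical copies; near-duplicates (resized, re-encoded, or cropped images) would need a perceptual hash or an embedding-based comparison instead.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(items):
    """Group item names by the SHA-256 digest of their raw bytes.

    `items` is an iterable of (name, bytes) pairs. Only byte-identical
    content is grouped; resized or re-encoded copies are not detected.
    (Hypothetical helper, not part of this repository.)
    """
    groups = defaultdict(list)
    for name, data in items:
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return dict(groups)


def count_unique(items):
    """Return the number of distinct byte contents among `items`."""
    return len(group_exact_duplicates(items))


# Toy stand-ins for image files (made-up names and contents).
samples = [
    ("a.jpg", b"\xff\xd8 image one"),
    ("b.jpg", b"\xff\xd8 image two"),
    ("a_copy.jpg", b"\xff\xd8 image one"),
]
unique_count = count_unique(samples)
```

In this toy input, `a.jpg` and `a_copy.jpg` share the same bytes, so they fall into one group and only one of them would be kept after exact deduplication.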
Resources
Software
Datasets for evaluation