LAION-AI / Big-Interleaved-Dataset

Big-Interleaved-Dataset
Apache License 2.0

BILD Phase 2 #3

Open harry-stark opened 2 years ago

harry-stark commented 2 years ago

The Phase 2 pipeline will handle the filtering steps that need to be applied to the extracted data. I will add more details soon.

Some initial resources for filtering data sources: https://github.com/StevenBlack/hosts
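To make the idea concrete, here is a rough sketch of how a blocklist in hosts-file format (the format the StevenBlack/hosts lists use, i.e. lines like `0.0.0.0 some.domain`) could be turned into a URL filter. This is only an illustration, not the actual Phase 2 design: the function names `parse_hosts`, `is_blocked`, and `filter_urls` are hypothetical, the sample entries are made up, and it assumes exact hostname matching (subdomains of a blocked host are not caught).

```python
from urllib.parse import urlparse

# Made-up sample in hosts-file format; a real run would read a downloaded list.
SAMPLE_HOSTS = """\
# comment line
127.0.0.1 localhost
0.0.0.0 0.0.0.0
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.net  # inline comment
"""

# Entries that describe the local machine rather than a blocked domain.
_SKIP = {"localhost", "localhost.localdomain", "local", "broadcasthost", "0.0.0.0"}


def parse_hosts(text):
    """Return the set of blocked hostnames found in hosts-file text."""
    blocked = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        # Keep only sink-address entries of the form "<ip> <host> [<host> ...]".
        if len(parts) < 2 or parts[0] not in ("0.0.0.0", "127.0.0.1"):
            continue
        for host in parts[1:]:
            host = host.lower()
            if host not in _SKIP:
                blocked.add(host)
    return blocked


def is_blocked(url, blocked):
    """True if the URL's hostname is exactly one of the blocked hostnames."""
    host = (urlparse(url).hostname or "").lower()
    return host in blocked


def filter_urls(urls, blocked):
    """Keep only URLs whose hostname is not on the blocklist."""
    return [u for u in urls if not is_blocked(u, blocked)]
```

Usage would look something like `filter_urls(extracted_urls, parse_hosts(hosts_text))`, where `extracted_urls` and `hosts_text` stand in for whatever the extraction step produces and the downloaded list contents. Keeping the blocklist as a set keeps each lookup cheap, which should matter at the scale of the extracted data.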