BAAI-DCAI / Bunny

A family of lightweight multimodal models.
Apache License 2.0
865 stars, 65 forks

Can you provide the pretrain data filter script? #38

Closed: qingyuanxingsi closed this issue 5 months ago

qingyuanxingsi commented 5 months ago

Thanks for your great work! Can you provide the script used to filter raw data from LAION-2B?

Isaachhh commented 5 months ago

https://github.com/BAAI-DCAI/Dataset-Pruning/tree/main/LAION