Open QQQfive opened 10 months ago
The paper says the first stage uses a 200GB dataset, so why does the actual code download a 2.3TB dataset?
For stage 1 we use Laion-aesthetic from the LAION-5B dataset. The first 302 tar files amount to approximately 200GB.
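If you want to check the size of that subset locally, something like the sketch below may help. It is not taken from the repository: the function name `first_shards` and the directory layout are hypothetical, and it assumes the shards are `.tar` files whose zero-padded names sort in download order.

```python
from pathlib import Path


def first_shards(shard_dir, count=302):
    """Return the first `count` .tar files (by name) and their total size in GB.

    Assumption: shard names are zero-padded, so lexicographic order
    matches the intended shard order.
    """
    shards = sorted(Path(shard_dir).glob("*.tar"))[:count]
    total_bytes = sum(p.stat().st_size for p in shards)
    return shards, total_bytes / 1e9  # decimal gigabytes
```

Calling `first_shards("path/to/shards")` returns the selected paths and their combined size, which you can compare against the roughly 200GB figure above.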