SamuelCahyawijaya closed 5 months ago
Uhm, this dataset is huge. It takes days (without parallelization) to download and load all the images locally. I've been running the loader since December 3rd, and it still had not finished by the time I submitted this comment. I'm currently trying to speed it up with parallelization, but I'm not sure how much that will help. Are you sure that all of the images need to be downloaded locally before the dataset can be used?
Hi @IvanHalimP, sorry for the late reply. I am testing the dataloader right now and, as you mentioned, it takes some time to generate the dataset. I will check whether there is a way to speed up the process, and I'll push some updates later this week.
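For reference, the parallelization discussed above could look roughly like the sketch below. It is not the code in `cc3m_35l.py`: the names `fetch_url` and `download_all`, the `.img` file naming, and the default of 16 workers are all assumptions made for illustration. Since image downloads are mostly network-bound, a thread pool from the standard library is usually enough, and skipping files that already exist lets an interrupted run resume instead of starting over.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
import urllib.request


def fetch_url(url, timeout=10.0):
    """Download one URL and return its raw bytes (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, out_dir, fetch=fetch_url, max_workers=16):
    """Download URLs concurrently.

    Returns two dicts keyed by the position of the URL in `urls`:
    saved paths and raised exceptions. All names here are hypothetical
    and not part of the actual dataloader.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def job(index, url):
        target = out_dir / f"{index:08d}.img"
        if target.exists():
            # Already downloaded in a previous run, so skip it.
            return target
        # Write to a temporary name first so a crash never leaves
        # a half-written file under the final name.
        partial = target.with_suffix(".part")
        partial.write_bytes(fetch(url))
        partial.replace(target)
        return target

    saved, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job, i, u): i for i, u in enumerate(urls)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                saved[index] = future.result()
            except Exception as exc:
                # Dead links are common in web-scraped image sets;
                # record the failure and keep going.
                failed[index] = exc
    return saved, failed
```

A call such as `saved, failed = download_all(image_urls, "images/")` would then return the paths that were written and the errors for URLs that could not be fetched, where `image_urls` stands for whatever list of URLs the loader reads from the dataset's annotation file.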
Dataloader name: cc3m_35l/cc3m_35l.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?cc3m_35l