This repository contains tools and scripts for building a dataset. The process is outlined below:
Scraping Data: Use gallery-dl to scrape the necessary JSON files. This command-line tool extracts data from a wide range of online galleries and image hosting sites.
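As a rough illustration of the scraping step, the sketch below builds a gallery-dl command that writes per-item metadata JSON without fetching the media files. The helper names (build_scrape_command, run_scrape), the example URL, and the output directory are assumptions made for this example and are not taken from this repository. The flags used (--write-metadata, --no-download, --directory) are gallery-dl command-line options, but check `gallery-dl --help` for your installed version before relying on them.

```python
import subprocess


def build_scrape_command(url: str, out_dir: str) -> list[str]:
    """Build a gallery-dl command that saves metadata JSON only.

    Hypothetical helper: the name and arguments are illustrative.
    """
    return [
        "gallery-dl",
        "--write-metadata",  # write a JSON metadata file for each item
        "--no-download",     # skip the media files themselves
        "--directory", out_dir,
        url,
    ]


def run_scrape(url: str, out_dir: str) -> None:
    """Run the command; raises CalledProcessError if gallery-dl fails."""
    subprocess.run(build_scrape_command(url, out_dir), check=True)


if __name__ == "__main__":
    # Placeholder URL; replace with a gallery supported by gallery-dl.
    print(" ".join(build_scrape_command("https://example.com/gallery", "metadata")))
```

Building the command as a list rather than a shell string avoids quoting problems when URLs contain characters such as `&` or `?`.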
Processing Data:
Downloading: In the final step, the cleaned, processed, and organized data is downloaded for use in your projects or analyses.
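The download step could look roughly like the sketch below, which reads the scraped JSON files, collects a URL from each, and fetches the files one at a time. This is not the notebook's actual implementation. The key name "file_url" is an assumption (the metadata fields depend on the site being scraped), and the function names, directory layout, and delay value are likewise hypothetical.

```python
import json
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def collect_urls(metadata_dir, url_key="file_url"):
    """Return the URLs found under url_key in every JSON file in metadata_dir.

    url_key is an assumed field name; adjust it to match your metadata.
    """
    urls = []
    for path in sorted(Path(metadata_dir).rglob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed files
        if isinstance(record, dict):
            url = record.get(url_key)
            if isinstance(url, str) and url:
                urls.append(url)
    return urls


def download_all(urls, dest_dir, delay_seconds=1.0):
    """Download each URL into dest_dir, pausing between requests."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        name = Path(urlparse(url).path).name or "unnamed"
        target = dest / name
        if target.exists():
            continue  # already downloaded on a previous run
        with urllib.request.urlopen(url) as response:
            target.write_bytes(response.read())
        time.sleep(delay_seconds)  # be polite to the remote host
```

Skipping files that already exist lets an interrupted run be resumed without downloading everything again.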
Be aware that the dataset building process can be very slow and is not optimized for large-scale data scraping and processing. Plan for delays in the dataset preparation phase.
The notebook provided in this repository guides you through each step of this process, keeping dataset building straightforward.