Open 1190300611 opened 1 year ago
Yes, but the results don't look like the demo: the downloaded files don't contain any images. These are the downloaded files. Can anyone tell me whether this is right?
Hi, this is right. The downloaded images are stored in a tar file, and the *_stats.json file provides information about the download status, including the total number of images, the number successfully downloaded, and the number of failures.
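If it helps to check this on your own output folder, here is a minimal Python sketch that sums the counters across all *_stats.json files and counts the image members inside a shard. The key names (count, successes, failed_to_download, failed_to_resize) are an assumption about what img2dataset writes, so open one of your stats files to confirm them. The function names are made up for this example.

```python
import json
import tarfile
from pathlib import Path

# Assumed key names; verify them against one of your own *_stats.json files.
STAT_KEYS = ("count", "successes", "failed_to_download", "failed_to_resize")


def summarize_stats(output_folder):
    """Sum the per-shard counters found in every *_stats.json file."""
    totals = {key: 0 for key in STAT_KEYS}
    for stats_path in sorted(Path(output_folder).glob("*_stats.json")):
        with open(stats_path, encoding="utf-8") as f:
            stats = json.load(f)
        for key in STAT_KEYS:
            # A missing key counts as 0, since the key names are only assumed.
            totals[key] += int(stats.get(key, 0))
    return totals


def count_images(tar_path, extensions=(".jpg", ".jpeg", ".png", ".webp")):
    """Count the image files stored inside one tar shard."""
    with tarfile.open(tar_path) as tar:
        return sum(
            1
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(extensions)
        )
```

For example, summarize_stats("cc3m") would give the totals for the whole download, and count_images on any shard in that folder should be greater than zero if the images really are inside the tar.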
So can you tell me how to convert the webdataset output to your data format? Thanks.
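The target data format isn't described in this thread, so the following is only a rough sketch of the first half of such a conversion: reading image and caption pairs back out of a webdataset shard with the Python standard library. It assumes each sample is stored as consecutive files sharing a key (for example 000000000.jpg and 000000000.txt), which is the usual webdataset layout. The function names iter_samples and export_pairs are hypothetical.

```python
import tarfile
from pathlib import Path


def iter_samples(tar_path):
    """Yield (key, {extension: bytes}) for each sample in one shard.

    Assumes the files of a sample are stored consecutively and share
    the part of the file name before the first dot.
    """
    current_key, sample = None, {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = Path(member.name).name.partition(".")
            if key != current_key and sample:
                yield current_key, sample
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield current_key, sample


def export_pairs(tar_path, out_dir):
    """Write each .jpg to out_dir and return (image file name, caption) pairs."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pairs = []
    for key, sample in iter_samples(tar_path):
        if "jpg" not in sample or "txt" not in sample:
            continue  # skip samples that lack an image or a caption
        image_path = out_dir / f"{key}.jpg"
        image_path.write_bytes(sample["jpg"])
        pairs.append((image_path.name, sample["txt"].decode("utf-8").strip()))
    return pairs
```

From the returned pairs you can write whatever annotation file the target format expects (JSON, CSV, and so on).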
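While downloading the cc3m dataset, I kept getting the error: 'Field 'caption' does not exist in table schema'.
After checking the img2dataset documentation, I found that the following needs to be added: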
pip install sed
sed -i '1s/^/caption\turl\n/' Train_GCC-training.tsv
img2dataset --url_list Train_GCC-training.tsv --input_format "tsv" \
  --url_col "url" --caption_col "caption" --output_format webdataset \
  --output_folder cc3m --processes_count 16 --thread_count 64 --image_size 256 \
  --enable_wandb True
Hi, may I ask which img2dataset version you are using? I can't get the download to work with version 1.0.1.
The version of img2dataset I used is 1.45.0, and it works fine.
Thank you.
When downloading the cc3m dataset, this error is displayed repeatedly: 'Field "caption" does not exist in table schema'.
After reviewing the img2dataset documentation, I found that the following needs to be added:
pip install sed
sed -i '1s/^/caption\turl\n/' Train_GCC-training.tsv
img2dataset --url_list Train_GCC-training.tsv --input_format "tsv" \
  --url_col "url" --caption_col "caption" --output_format webdataset \
  --output_folder cc3m --processes_count 16 --thread_count 64 --image_size 256 \
  --enable_wandb True
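For anyone without sed (for example on Windows), the header step can probably be done in Python instead. This is a sketch, not something taken from the img2dataset docs: add_header is a made-up helper that prepends the same caption<TAB>url line as the sed command and does nothing if the header is already there, so running it twice is harmless.

```python
import os
import shutil
import tempfile


def add_header(tsv_path, header=("caption", "url")):
    """Prepend a tab-separated header line unless the file already starts with it.

    Returns True if the file was rewritten, False if it was left unchanged.
    """
    header_line = "\t".join(header) + "\n"
    directory = os.path.dirname(os.path.abspath(tsv_path))
    with open(tsv_path, "r", encoding="utf-8", newline="") as src:
        first_line = src.readline()
        if first_line.rstrip("\r\n") == header_line.rstrip("\n"):
            return False
        # Stream into a temporary file next to the original so that a large
        # TSV is never loaded into memory in one piece.
        with tempfile.NamedTemporaryFile(
            "w", encoding="utf-8", newline="", dir=directory, delete=False
        ) as tmp:
            tmp.write(header_line)
            tmp.write(first_line)
            shutil.copyfileobj(src, tmp)
    # Swap the new file into place once both handles are closed.
    os.replace(tmp.name, tsv_path)
    return True
```

Calling add_header("Train_GCC-training.tsv") before running img2dataset should give the file the "caption" and "url" columns that --caption_col and --url_col refer to.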