Closed soon-yau closed 1 year ago
Hi @soon-yau!
Thank you for raising this issue. I used a tar file of Laion-Aesthetic that someone else had downloaded, and I did not notice that the processed tar file differed from the original Laion-Aesthetic. Sorry for all the trouble. We have already uploaded the images' .parquet files to Google Drive, and you can use the "key" field in each .parquet as the image index. For example, 00033.parquet has an image with key 338717, so the image is images/00000/000338717.jpg and the pose is pose/00000/000338717.npz. I will revise the README afterwards.
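The key-to-path rule described above can be sketched as follows. This is only an illustration, not code from the repository: `paths_for_key` is a hypothetical helper, the 9-digit zero padding is inferred from the example file names, and the folder name is passed in explicitly because the thread does not say how it is derived from the key.

```python
from typing import Tuple


def paths_for_key(folder: str, key: int) -> Tuple[str, str]:
    """Build the image and pose paths for one parquet key.

    Assumption: file names are the key zero-padded to 9 digits,
    as in the example 338717 -> 000338717.
    """
    name = f"{int(key):09d}"
    return f"images/{folder}/{name}.jpg", f"pose/{folder}/{name}.npz"


# Example from the comment above: key 338717 in folder 00000.
image_path, pose_path = paths_for_key("00000", 338717)
print(image_path)
print(pose_path)
```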
I have looked at your parquet. It matches the mapping JSON but differs from my Laion-Aesthetic copy. My guess is that the indices are generated out of order when the script downloads the images from the web, since some of the images are no longer available. That means your 000338717.jpg may not be the same as my 000338717.jpg. If that is true, I'm afraid your dataset is unusable unless it is released together with the images.
Yes, that is why we released the images through the .parquet files. You can download the images directly from them, since they contain the image URLs. Also, may I ask how you got your image index? I could not find an image index in the original Laion-Aesthetic. I can try to fix the order and make the whole dataset easier to use.
I have the same issue as @soon-yau, and I get the same image he attached for 00040_000400060 (I downloaded it today). Where is the .parquet that contains the URLs for the images? I didn't find it in the Google Drive directory.
@orpatashnik, I updated the Google Drive directory yesterday; please check the original Google Drive link. For your convenience, the link is also provided here: https://drive.google.com/drive/folders/1aklUHxlhgLcyZrTpuhpxvTkMx3-vmQOY?usp=share_link. May I ask how you downloaded the images? I could not find an image index in the original version of Laion-Aesthetic.
Thanks!
I followed the instructions here https://github.com/rom1504/img2dataset/blob/main/dataset_examples/laion-aesthetic.md to download the dataset.
I will try to figure out the difference in the image index and fix it as soon as possible. In the meantime, please use the provided .parquet files to download the images. Sorry for the trouble!
Since Or is getting the same image as me, I guess our dataset downloads were 'correct'. There should be a simple, standardized way for users to download the dataset, such as a download script that handles file naming, multitasking, and error handling. An alternative is for your team to re-download and re-process the dataset.
I will try to fix this problem by re-downloading the dataset. Thank you for raising this issue.
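A download script of the kind suggested above might look roughly like this. It is a minimal sketch, not the project's released script: `download_all` and `fetch_url` are hypothetical names, the input is assumed to be (key, url) pairs taken from the .parquet files, and each file is named after its key (zero-padded to 9 digits) so that the name does not depend on download order or on which URLs have gone dead.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_url(url, timeout=10):
    """Default fetcher: return the raw bytes behind a URL."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(rows, out_dir, fetch=fetch_url, workers=8):
    """Download (key, url) rows in parallel.

    Returns (saved, failed): two lists of keys, in input order.
    A failed download is recorded and skipped; it never shifts
    the file names of the other images.
    """
    os.makedirs(out_dir, exist_ok=True)

    def download_one(row):
        key, url = row
        try:
            data = fetch(url)
        except Exception:
            # Dead link, timeout, etc.: report the key as failed.
            return key, False
        path = os.path.join(out_dir, f"{int(key):09d}.jpg")
        with open(path, "wb") as f:
            f.write(data)
        return key, True

    # pool.map keeps results in the same order as the input rows.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(download_one, rows))

    saved = [key for key, ok in results if ok]
    failed = [key for key, ok in results if not ok]
    return saved, failed
```

Passing the fetcher as a parameter is a design choice made here so the function can be tried with a stub instead of real network access.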
We have released the code for downloading the images in the order that corresponds to the pose sequence. Please refer to the README for more details.
I followed the instructions to download Laion-Aesthetic V1. However, I found that the images do not match the provided mapping_file_training.json. For example, for 00040_000400060, I downloaded the following image with the text prompt "mother of the bride hairstyles: woman with sleek blown out hair and a headband", but mapping_file_training.json says "Understanding Urinary Tract Infections".
Can you provide the script used to download Laion-aesthetic?
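A mismatch like the one reported here can be checked across a whole download by comparing captions. The sketch below is illustrative only: `find_caption_mismatches` is a hypothetical helper, and both inputs are assumed to be plain dictionaries from sample ID to caption. The actual layout of mapping_file_training.json is not described in this thread, so loading it into that shape is left out.

```python
def find_caption_mismatches(local_captions, mapping_captions):
    """Return {sample_id: (local_caption, mapping_caption)} for every
    sample present in both inputs whose captions differ."""
    mismatches = {}
    for sample_id, expected in mapping_captions.items():
        actual = local_captions.get(sample_id)
        if actual is not None and actual.strip() != expected.strip():
            mismatches[sample_id] = (actual, expected)
    return mismatches


# The example from this issue.
local = {
    "00040_000400060": "mother of the bride hairstyles: woman with "
                       "sleek blown out hair and a headband",
}
mapping = {"00040_000400060": "Understanding Urinary Tract Infections"}

for sample_id, (got, expected) in find_caption_mismatches(local, mapping).items():
    print(f"{sample_id}: local={got!r} mapping={expected!r}")
```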