IDEA-Research / HumanSD

[ICCV 2023] The official implementation of paper "HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation"
Apache License 2.0

The mapping file does not match downloaded Laion-aesthetic #4

Closed by soon-yau 1 year ago

soon-yau commented 1 year ago

I followed the instructions to download Laion-aesthetic V1. However, I found that the images do not match the provided mapping_file_training.json. For example, for 00040_000400060 I downloaded the following image, whose text prompt is "mother of the bride hairstyles: woman with sleek blown out hair and a headband", but mapping_file_training.json says "Understanding Urinary Tract Infections".

[Attached image: 000400060]

Can you provide the script used to download Laion-aesthetic?

juxuan27 commented 1 year ago

Hi, @soon-yau !

Thank you for raising this issue. I used a tar file of Laion-Aesthetic that had been downloaded by someone else, and I did not notice the difference between that processed tar file and the original Laion-Aesthetic. Sorry for all the trouble. We have uploaded the images' .parquet files to Google Drive, and you can use the "key" in each .parquet as the image index. For example, 00033.parquet has an image with key 338717, so the image is images/00000/000338717.jpg and the pose is pose/00000/000338717.npz. I will revise the README afterwards.
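To make the key-to-file convention in the example above concrete, here is a minimal sketch. It assumes the file stem is the key zero-padded to nine digits, as in the example, and it takes the subfolder name as an explicit argument because the thread does not say how that folder is derived. The function name `paths_for_key` is hypothetical and is not part of the HumanSD code.

```python
def paths_for_key(key, folder):
    """Build the image and pose paths for one parquet "key".

    Assumption: the file stem is the key zero-padded to 9 digits,
    matching the 338717 -> 000338717 example in this thread.
    `folder` is passed in explicitly; how it is chosen is not stated here.
    """
    stem = f"{int(key):09d}"
    image_path = f"images/{folder}/{stem}.jpg"
    pose_path = f"pose/{folder}/{stem}.npz"
    return image_path, pose_path


# The example from the comment above: key 338717 in folder 00000.
image_path, pose_path = paths_for_key(338717, "00000")
print(image_path)
print(pose_path)
```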

soon-yau commented 1 year ago

I have looked at your parquet. It matches the mapping JSON but differs from my Laion-aesthetic copy. My guess is that the indices are generated out of order when the script downloads images from the web pages, because some of the images are no longer available. That means your 000338717.jpg may not be the same as my 000338717.jpg. If that is true, I'm afraid your dataset is unusable unless it is released together with the images.
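A toy illustration of this guess, and only of the guess: if file names are assigned sequentially to whatever downloads successfully, then one image disappearing shifts every later name. The function `number_in_download_order` and the URLs are made up for the illustration, and this is not a claim about how any real download tool assigns names.

```python
def number_in_download_order(urls, available):
    """Give each successfully downloaded URL the next sequential file name."""
    names = {}
    next_index = 0
    for url in urls:
        if url in available:  # unavailable images are skipped, not numbered
            names[f"{next_index:09d}.jpg"] = url
            next_index += 1
    return names


urls = ["a", "b", "c", "d"]  # placeholder URLs
earlier = number_in_download_order(urls, available={"a", "b", "c", "d"})
later = number_in_download_order(urls, available={"a", "c", "d"})  # "b" is gone

# The same file name now points at different images in the two copies.
print(earlier["000000001.jpg"], later["000000001.jpg"])
```

Under this assumption, a mapping file built from the earlier copy would no longer line up with the later copy from the first missing image onward.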

juxuan27 commented 1 year ago

Yes, that is why we release the images through the .parquet files. You can download the images directly from the .parquet files, since they contain the image URLs. Also, may I ask how you got your image index? I cannot find an image index in the original Laion-Aesthetic. I can try to fix the order and make the whole dataset handier.

orpatashnik commented 1 year ago

I have the same issue as @soon-yau, and I get the same image he attached for 00040_000400060 (I downloaded it today). Where is the .parquet that contains the URLs for the images? I didn't find it in the Google Drive directory.

juxuan27 commented 1 year ago

@orpatashnik, I updated the Google Drive directory yesterday; please check the original Google Drive link. For your convenience, the link is also provided here: https://drive.google.com/drive/folders/1aklUHxlhgLcyZrTpuhpxvTkMx3-vmQOY?usp=share_link. May I ask how you downloaded the images? I cannot find an image index in the original version of Laion-Aesthetic.

orpatashnik commented 1 year ago

Thanks!

I followed the instructions at https://github.com/rom1504/img2dataset/blob/main/dataset_examples/laion-aesthetic.md to download the dataset.

juxuan27 commented 1 year ago

I will try to figure out the difference in image index and fix it as soon as possible. Until then, please use the provided .parquet files to download the images. Sorry for the trouble!

soon-yau commented 1 year ago

Since Or is getting the same image as me, I guess our dataset downloads were 'correct'. There should be a simple, standardized way for users to download the dataset, such as a download script that handles file naming, multitasking, and error handling. An alternative is for your team to re-download and re-process the dataset.
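As a rough sketch of what such a script could look like (not the HumanSD team's script): each file is named from its parquet "key" rather than from download order, downloads run in a thread pool, and a failed URL is recorded without affecting any other file's name. The "url" field name, the flat output directory, and the function names are assumptions made for illustration.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_url(url, timeout=10.0):
    """Default fetcher: return the raw bytes served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_by_key(rows, out_dir, fetch=fetch_url, workers=8):
    """Save each row as <out_dir>/<key zero-padded to 9 digits>.jpg.

    rows: iterable of dicts with "key" and "url" ("url" is an assumed name).
    Returns (saved_keys, failed_keys) in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    rows = list(rows)

    def download_one(row):
        key = int(row["key"])
        try:
            data = fetch(row["url"])
        except Exception:
            # Error handling: report the failure, keep every other name stable.
            return key, False
        # Bytes are written as-is; no image decoding or format check is done.
        with open(os.path.join(out_dir, f"{key:09d}.jpg"), "wb") as f:
            f.write(data)
        return key, True

    # Multitasking: fetch several URLs concurrently; map() preserves order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(download_one, rows))

    saved = [key for key, ok in results if ok]
    failed = [key for key, ok in results if not ok]
    return saved, failed
```

The rows could come from something like `pandas.read_parquet(path).to_dict("records")`, provided the parquet's columns are named as assumed here. Because the name depends only on the key, two people downloading at different times would get the same file name for the same image, even if some URLs have since gone dead.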

juxuan27 commented 1 year ago

I will try to fix this problem by re-downloading the dataset. Thank you for raising the issue.

juxuan27 commented 1 year ago

We have released the code for downloading images in the order that corresponds to the pose sequence. Please refer to the README for more details.