HaozheZhao / UltraEdit


About the different versions of models and datasets. #15

Open syguan96 opened 3 months ago

syguan96 commented 3 months ago

Hi @HaozheZhao, this is great work. I tried filtering out some categories to train InstructPix2Pix.

I noticed that you have released "UltraEdit_500k", "UltraEdit_Region-Based_100k", and the complete dataset. Can you tell me how these subsets were split? If possible, could you also explain the difference between "BleachNick/SD3-UltraEdit_freeform", "BleachNick/SD3-UltraEdit_w_mask", and "BleachNick/SD3-UltraEdit_mask"?

Thanks for your help!

HaozheZhao commented 2 months ago

Hi

Thank you for your kind words about the project!

Here's a breakdown of the datasets and differences between them:

  1. Complete Dataset: This includes 4 million freeform image editing entries generated by our pipeline. It is part of the broader UltraEdit initiative.

  2. UltraEdit_Region-Based_100k: This subset supports region-based image editing and includes a mask image for each editing pair. It's designed for tasks where specific regions of an image are targeted for editing.

  3. UltraEdit_500k: This is a sampled subset of the complete dataset, containing 500k entries of freeform image editing data. We created this subset to maintain a comparable size with other similar datasets and to facilitate evaluation and ease of use.
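For anyone who wants to carve out similar subsets from their own copy of the data, here is a minimal sketch of the two ideas above: drawing a fixed-size random sample, and keeping only entries that carry a mask. This is not the procedure actually used to build UltraEdit_500k or the region-based subset; the field names (`id`, `edit_prompt`, `mask_image`) and the helper names are hypothetical and only serve the illustration.

```python
import random
from typing import List, Sequence


def sample_subset(entries: Sequence[dict], k: int, seed: int = 0) -> List[dict]:
    """Draw k entries uniformly at random, without replacement.

    A fixed seed makes the sampled subset reproducible.
    """
    if k > len(entries):
        raise ValueError("k cannot exceed the number of entries")
    rng = random.Random(seed)
    return rng.sample(list(entries), k)


def has_mask(entry: dict) -> bool:
    """Return True if the entry carries a mask (hypothetical 'mask_image' field)."""
    return entry.get("mask_image") is not None


# Toy stand-in for the real data: every fourth entry has a mask.
entries = [
    {
        "id": i,
        "edit_prompt": f"edit {i}",
        "mask_image": f"mask_{i}.png" if i % 4 == 0 else None,
    }
    for i in range(20)
]

# Freeform-style subset: a reproducible random sample of fixed size.
freeform_sample = sample_subset(entries, k=5, seed=42)

# Region-based-style subset: only the entries that include a mask.
region_based = [e for e in entries if has_mask(e)]
```

With the real data you would replace the toy `entries` list with the loaded dataset rows and adjust the field names to match the actual schema.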

Regarding your questions about the specific models:

Please feel free to reach out if you have any further questions!