Hi, thanks for your excellent work. I am trying to use this pipeline for data generation, and I would like to know whether the JSONL file mentioned in run_sdxl_turbo_p2p_i2i_8gpu.sh will be released later. Thanks.
The caption_data.jsonl file contains the source image path, an edit instruction, the original caption, and the expected caption of the edited image. For region-based editing samples, an additional field "edit_object" specifies the object to be edited.
Here are examples of the JSONL file format:
Standard Editing Example:
{
    "edit_instruction": "Change the setting to a snowy winter scene",
    "input_text": "A guy and girl sitting outside, eating hotdogs.",
    "input_image": "ShareGPT4V/coco/train2017/000000314048.jpg",
    "output_text": "A guy and girl sitting outside, eating hotdogs in a snowy winter scene.",
    "global_idx": 2000078
}
Region-Based Editing Example:
{
    "edit_instruction": "Replace the train with a magical flying carpet",
    "input_text": "Two people are about to board an Amtrak.",
    "input_image": "coco/train2014/COCO_train2014_000000178793.jpg",
    "output_text": "Two people are about to board a magical flying carpet.",
    "global_idx": 1918137,
    "edit_object": "train"
}
Due to potential issues with data storage, upload time, and licensing, we are not releasing the publicly available image-caption dataset right now. However, you can create your own dataset by following the structure of the example JSONL files above: simply gather your data, format it accordingly, and run the data-generation scripts.
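If it helps, here is a minimal Python sketch of how such a file could be assembled. It is illustrative only, not code from the repo; the two records simply reuse the examples above, and the output filename caption_data.jsonl matches the one described in this thread:

import json

# Hypothetical records following the schema shown above;
# "edit_object" is present only for region-based editing samples.
records = [
    {
        "edit_instruction": "Change the setting to a snowy winter scene",
        "input_text": "A guy and girl sitting outside, eating hotdogs.",
        "input_image": "ShareGPT4V/coco/train2017/000000314048.jpg",
        "output_text": "A guy and girl sitting outside, eating hotdogs in a snowy winter scene.",
        "global_idx": 2000078,
    },
    {
        "edit_instruction": "Replace the train with a magical flying carpet",
        "input_text": "Two people are about to board an Amtrak.",
        "input_image": "coco/train2014/COCO_train2014_000000178793.jpg",
        "output_text": "Two people are about to board a magical flying carpet.",
        "global_idx": 1918137,
        "edit_object": "train",  # region-based editing only
    },
]

# JSONL: one JSON object per line.
with open("caption_data.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

The pretty-printed examples above are for readability only; in the actual JSONL file each record occupies a single line, as the sketch produces.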