HaozheZhao / UltraEdit


How to load the .jsonl file when using run_sft_512_sdxl_stage1.sh? #6

Closed luzhaoyan closed 3 months ago

luzhaoyan commented 4 months ago

Before training with stable-diffusion-xl, should I change “train_data_jsonl” in scripts/run_sft_512_sdxl_stage1.sh? When I load the UltraEdit dataset with load_dataset from the datasets library, is the .jsonl file loaded as well? If not, how can I configure a .jsonl file for it?

HaozheZhao commented 4 months ago

To train the model with the UltraEdit dataset, set dataset_name=BleachNick/UltraEdit and pass it as an argument to the Python training script. In that case there is no need to set train_data_jsonl.

To train the model with your own dataset, provide the path to the JSONL file in the train_data_jsonl argument. Each item in the JSONL should have the following keys:

{
  "source_image": path to the source image,
  "edited_image": path to the edited ground truth image,
  "edit_prompt": the edit instruction,
  "mask_image": path to the mask image used in stage 2 training. For free-form image editing, set "mask_image" to "NONE" and a blank mask will be generated by default.
}
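
For a custom dataset, a file in this format could be produced along the following lines. This is only a minimal sketch: the helper names (write_jsonl, read_jsonl), the output file name, and the image paths are hypothetical placeholders, and checking that all four keys are present is an assumption based on the key list above, not something taken from the training code.

```python
import json

# The four keys listed in the reply above.
REQUIRED_KEYS = ("source_image", "edited_image", "edit_prompt", "mask_image")


def write_jsonl(records, out_path):
    """Write one JSON object per line, rejecting records that lack a listed key."""
    with open(out_path, "w", encoding="utf-8") as f:
        for i, rec in enumerate(records):
            missing = [k for k in REQUIRED_KEYS if k not in rec]
            if missing:
                raise ValueError(f"record {i} is missing keys: {missing}")
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical example records; replace the paths with your own data.
records = [
    {
        # Free-form editing: no mask, so "mask_image" is the string "NONE".
        "source_image": "data/source/0001.png",
        "edited_image": "data/edited/0001.png",
        "edit_prompt": "make the sky look like a sunset",
        "mask_image": "NONE",
    },
    {
        # Region-based editing (stage 2): a path to a mask image.
        "source_image": "data/source/0002.png",
        "edited_image": "data/edited/0002.png",
        "edit_prompt": "replace the red car with a bicycle",
        "mask_image": "data/masks/0002.png",
    },
]

write_jsonl(records, "my_edit_data.jsonl")
```

The path of the resulting file (here my_edit_data.jsonl) is then what goes into the train_data_jsonl argument of the script.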