LambdaLabsML / examples

Deep Learning Examples
MIT License

What is a "correctly formatted local directory"? #16

Closed Ozzah closed 2 years ago

Ozzah commented 2 years ago

In the Stable Diffusion Finetuning README, it is mentioned that the dataset lambdalabs/pokemon-blip-captions is on the Hugging Face Hub, but "could also be a correctly formatted local directory."

What is the correct format?

I've changed data.params.train.params.name in the config from lambdalabs/pokemon-blip-captions to /home/ozzah/finetuning_dataset, which contains images and captions, but there is an error.

Ozzah commented 2 years ago

Ok so after a few hours spent going through the source code with no luck, and eventually just blindly trying random things, it looks like something like this works:

  1. In the config yaml, set the name to some path like /home/me/somepath
  2. Create some JSON file like /home/me/somepath/dataset.json
  3. The JSON file should be structured as below:
[
  { "image": "/home/me/images/01.jpeg", "text": "a photo of mountains with clouds in the sky" },
  { "image": "/home/me/images/02.jpeg", "text": "a photo of mountains on a clear day" },
  { "image": "/home/me/images/03.jpeg", "text": "a photo of mountains with snow on top" }
]

justinpinkney commented 2 years ago

Yeah, sorry, there's not much description here. That was meant to refer to the use of the load_dataset function from the Hugging Face datasets library, which needs a folder and a JSONL file of metadata.

There are a couple of other things in my repo which I'm actively changing and haven't documented so beware!
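
For reference, the "folder and a jsonl file of metadata" layout is the one read by the `imagefolder` loader in the Hugging Face datasets library: a `metadata.jsonl` file sits next to the images, with one JSON object per line containing a `file_name` key plus any extra columns. The sketch below writes such a file using only the standard library. The helper name `write_metadata`, the `text` column name, and the example paths are assumptions for illustration, and this thread does not confirm that the repo's config accepts the resulting folder as-is.

```python
import json
from pathlib import Path


def write_metadata(image_dir, captions):
    """Write metadata.jsonl next to the images in image_dir.

    captions maps an image file name (relative to image_dir) to its caption.
    Returns the path of the file that was written.
    """
    image_dir = Path(image_dir)
    metadata_path = image_dir / "metadata.jsonl"
    with open(metadata_path, "w", encoding="utf-8") as f:
        for file_name, text in sorted(captions.items()):
            # One JSON object per line: no enclosing brackets, no commas between records.
            f.write(json.dumps({"file_name": file_name, "text": text}) + "\n")
    return metadata_path


if __name__ == "__main__":
    # Hypothetical folder and captions, shown only to illustrate the call.
    write_metadata(
        "/home/me/somepath",
        {"01.jpeg": "a photo of mountains with clouds in the sky"},
    )
```

With the datasets library installed, a folder prepared this way can usually be loaded with `load_dataset("imagefolder", data_dir="/home/me/somepath")`, which decodes each image into an `image` column and exposes the caption in the `text` column.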

willjejones commented 1 year ago

Hello!

I've followed these exact instructions (from @Ozzah) but I still get errors.

Using a json file with an array of image/text objects e.g.

[
  {
    "image": "/home/ubuntu/images/man_1.png",
    "text": "a man walking with a cell in his hand, standing"
  },
  {
    "image": "/home/ubuntu/images/man_2.png",
    "text": "a man in a blue shirt and khaki pants, standing"
  }
]

produces a json library error:

File "/home/ubuntu/.local/lib/python3.8/site-packages/datasets/packaged_modules/json/json.py", line 135, in _generate_tables
    f"Not able to read records in the JSON file at {file}. "
AttributeError: 'list' object has no attribute 'keys'

But if my dataset.json file is just one object with an image key and a text key e.g.

{
  "image": "/home/ubuntu/images/man_1.png",
  "text": "a man walking with a cell in his hand, standing"
}

I get this error instead:

File "/usr/lib/python3/dist-packages/torchvision/transforms/functional_pil.py", line 249, in resize
    raise TypeError(f"img should be PIL Image. Got {type(img)}")
TypeError: img should be PIL Image. Got <class 'str'>

Could anyone possibly help with where I've gone wrong?

Thank you!

adur1990 commented 1 year ago

So I figured it out. First, you need a folder where all your images are. They can be encoded as you wish; I am using grayscale PNGs.

Second, you need a JSON or JSONL caption file in the following format:

[
  {"file_name": "/devel/data/images/image1.png", "text": "caption of image 1"},
  {"file_name": "/devel/data/images/image2.png", "text": "caption of image 2"}
]

For JSONL, omit the enclosing square brackets and the commas between records, so that each line is one standalone JSON object. The caption file matches each image to its caption.
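
If the captions already exist as a JSON array, converting them to JSONL is mechanical. Here is a minimal sketch using only the standard library; the function name `json_array_to_jsonl` and the file paths are hypothetical, and the input has to be valid JSON (commas between records, no comment lines) for `json.load` to accept it.

```python
import json


def json_array_to_jsonl(json_path, jsonl_path):
    """Rewrite a JSON array of caption records as JSONL.

    Returns the number of records written.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for record in records:
            # Each record becomes one line; no brackets, no separating commas.
            f.write(json.dumps(record) + "\n")
    return len(records)


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    json_array_to_jsonl("/devel/data/captions.json", "/devel/data/captions.jsonl")
```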

Then, in your latent diffusion YAML config, set the root_dir option to the path of the folder with your images and the caption_file option to the path of the JSON(L) caption file.

Here is what my YAML looks like:

# ...
    train:
      target: ldm.data.simple.FolderData
      params:
        root_dir: /devel/data/images/
        caption_file: "/devel/data/captions.jsonl"
# ...

dvschultz commented 1 year ago

Following up on @adur1990’s post. You should absolutely use JSONL if you are formatting your data like they mention. For some reason the basic JSON loader does not look for the exact same patterns (see: https://github.com/justinpinkney/stable-diffusion/blob/main/ldm/data/simple.py#L58). Lost some time to this one so hope others don’t if they come across this.
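
One way to picture the difference is a loader that branches on the file extension and expects a different shape for each. The sketch below is an assumption about that kind of branching, not a copy of the linked code: the function name `load_captions` is hypothetical, and it assumes that a `.json` file holds a single object mapping file names to captions while a `.jsonl` file holds one `{"file_name": ..., "text": ...}` record per line.

```python
import json
from pathlib import Path


def load_captions(caption_file):
    """Return a dict mapping image file name to caption.

    Assumed shapes (not confirmed against the repository):
      .json  -> one object: {"image1.png": "caption", ...}
      .jsonl -> one record per line: {"file_name": ..., "text": ...}
    """
    suffix = Path(caption_file).suffix.lower()
    with open(caption_file, "r", encoding="utf-8") as f:
        if suffix == ".json":
            captions = json.load(f)
            if not isinstance(captions, dict):
                raise ValueError("a .json caption file must be a single object")
            return captions
        if suffix == ".jsonl":
            # Skip blank lines so a trailing newline does not break parsing.
            records = [json.loads(line) for line in f if line.strip()]
            return {r["file_name"]: r["text"].strip() for r in records}
    raise ValueError(f"Unrecognised caption file extension: {suffix}")
```

Under these assumptions, a list of `file_name`/`text` records saved as `.json` is rejected, while the same records saved one per line as `.jsonl` load fine, which would be consistent with the advice to prefer JSONL.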