Closed. Ozzah closed this issue 2 years ago.
Ok, so after a few hours spent going through the source code with no luck, and eventually just blindly trying random things, it looks like something like this works: set the config's `name` to some path like `/home/me/somepath`, and create `/home/me/somepath/dataset.json` containing:
```json
[
  { "image": "/home/me/images/01.jpeg", "text": "a photo of mountains with clouds in the sky" },
  { "image": "/home/me/images/02.jpeg", "text": "a photo of mountains on a clear day" },
  { "image": "/home/me/images/03.jpeg", "text": "a photo of mountains with snow on top" }
]
```
Yeah, sorry, not much description here. That was meant to refer to the use of Hugging Face Datasets' `load_dataset` function, which needs a folder and a JSONL metadata file.
There are a couple of other things in my repo which I'm actively changing and haven't documented so beware!
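To make the setup above concrete, here is a minimal stdlib-only sketch of preparing such a folder, with the actual `load_dataset` call left as a comment since it needs the `datasets` package installed. The paths and captions are placeholders, and writing the metadata as JSON Lines (one object per line) is an assumption based on the discussion below:

```python
import json
import pathlib
import tempfile

# Placeholder dataset folder; in practice this would be the path you set
# as `name` in the config, e.g. /home/me/somepath.
root = pathlib.Path(tempfile.mkdtemp())

records = [
    {"image": "/home/me/images/01.jpeg", "text": "a photo of mountains with clouds in the sky"},
    {"image": "/home/me/images/02.jpeg", "text": "a photo of mountains on a clear day"},
]

# Write the metadata as JSON Lines: one JSON object per line, no
# surrounding array brackets.
metadata = root / "dataset.jsonl"
with open(metadata, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Then, to load it (requires the `datasets` package):
# from datasets import load_dataset
# ds = load_dataset("json", data_files=str(metadata))
print(metadata.read_text().count("\n"))  # → 2
```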
Hello!
I've followed these exact instructions (from @Ozzah) but I still get errors.
Using a JSON file with an array of image/text objects, e.g.
```json
[
  {
    "image": "/home/ubuntu/images/man_1.png",
    "text": "a man walking with a cell in his hand, standing"
  },
  {
    "image": "/home/ubuntu/images/man_2.png",
    "text": "a man in a blue shirt and khaki pants, standing"
  }
]
```
produces an error from the `datasets` JSON loader:
```
File "/home/ubuntu/.local/lib/python3.8/site-packages/datasets/packaged_modules/json/json.py", line 135, in _generate_tables
    f"Not able to read records in the JSON file at {file}. "
AttributeError: 'list' object has no attribute 'keys'
```
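For what it's worth, the traceback suggests the `datasets` JSON builder fell back to plain `json.load` and then called `.keys()` on the result: it can cope with a top-level *object* (or with JSON Lines), but not with a top-level array. A hedged, stdlib-only sketch of the column-mapped object shape that such a fallback would accept, reusing the paths from the post above:

```python
import json
import os
import tempfile

# One layout the fallback path can handle: a single object whose keys are
# column names and whose values are equal-length lists of cell values.
columns = {
    "image": [
        "/home/ubuntu/images/man_1.png",
        "/home/ubuntu/images/man_2.png",
    ],
    "text": [
        "a man walking with a cell in his hand, standing",
        "a man in a blue shirt and khaki pants, standing",
    ],
}

path = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(path, "w") as f:
    json.dump(columns, f)

# Reloading shows the shape json.load produces: a dict, which has .keys() --
# exactly what the traceback above was looking for and not finding on a list.
with open(path) as f:
    reloaded = json.load(f)
print(sorted(reloaded.keys()))  # → ['image', 'text']
```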
But if my `dataset.json` file is just one object with an `image` key and a `text` key, e.g.
```json
{
  "image": "/home/ubuntu/images/man_1.png",
  "text": "a man walking with a cell in his hand, standing"
}
```
I get this error instead:
```
File "/usr/lib/python3/dist-packages/torchvision/transforms/functional_pil.py", line 249, in resize
    raise TypeError(f"img should be PIL Image. Got {type(img)}")
TypeError: img should be PIL Image. Got <class 'str'>
```
Could anyone possibly help with where I've gone wrong?
Thank you!
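The second error means the transform pipeline received the path string itself rather than a decoded image: torchvision's `resize` expects a `PIL.Image`, but got a `str`. A minimal sketch of the fix, opening the file before any transforms run (assumes Pillow; `load_example` is a made-up helper name, not part of the repo):

```python
from PIL import Image

def load_example(record):
    """Turn one {"image": path, "text": caption} record into usable data.

    The key step: open the path with PIL so downstream transforms such as
    torchvision's resize receive an Image object, not the path string.
    """
    img = Image.open(record["image"]).convert("RGB")
    return {"image": img, "text": record["text"]}

# Usage sketch: out = load_example({"image": "/home/ubuntu/images/man_1.png",
#                                   "text": "a man walking, standing"})
```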
So I figured it out. First, you need a folder where all your images are. They can be encoded however you wish; I am using grayscale PNGs.
Second, you need a JSON or JSONL caption file in the following format:
```json
[
  {"file_name": "/devel/data/images/image1.png", "text": "caption of image 1"},
  {"file_name": "/devel/data/images/image2.png", "text": "caption of image 2"}
]
```
For JSONL, omit the `[]` brackets and the commas between objects, and put one object per line.
This caption file will match the image to the corresponding caption.
And then, in your latent diffusion YAML config, you have to provide the `root_dir` option, which is the path to the folder with your images, and the `caption_file` option, which is the path to the JSON(L) caption file.
Here is how my YAML looks:
```yaml
# ...
train:
  target: ldm.data.simple.FolderData
  params:
    root_dir: /devel/data/images/
    caption_file: "/devel/data/captions.jsonl"
# ...
```
Following up on @adur1990's post: you should absolutely use JSONL if you are formatting your data the way they describe. For some reason the plain JSON loader does not accept the same structure (see: https://github.com/justinpinkney/stable-diffusion/blob/main/ldm/data/simple.py#L58). I lost some time to this one, so I hope others who come across this don't.
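If your captions are already sitting in a JSON array like the earlier examples, converting them to JSONL is mechanical. A small stdlib-only sketch (the function name and file names are placeholders):

```python
import json

def json_array_to_jsonl(src_path, dst_path):
    """Rewrite a JSON array of caption records as JSON Lines."""
    with open(src_path) as f:
        records = json.load(f)  # expects a top-level list of objects
    with open(dst_path, "w") as f:
        for rec in records:
            # One compact JSON object per line, no surrounding brackets.
            f.write(json.dumps(rec) + "\n")
    return len(records)

# Usage sketch: json_array_to_jsonl("captions.json", "captions.jsonl")
```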
In the Stable Diffusion fine-tuning README, it is mentioned that the dataset `lambdalabs/pokemon-blip-captions` is on the Hugging Face Hub, but "could also be a correctly formatted local directory." What is the correct format?
I've replaced `data.params.train.params.name` in the config from `lambdalabs/pokemon-blip-captions` to `/home/ozzah/finetuning_dataset`, which contains images and captions, but there is an error.