AnonymousCervine / depth-image-io-for-SDWebui

An extension to allow managing custom depth inputs to Stable Diffusion depth2img models for the stable-diffusion-webui repo.

Batch img2img mode? #3

Open modcomper opened 1 year ago

modcomper commented 1 year ago

Hi there!

Want to start by echoing the praise already expressed by others, thanks so much for putting the time into getting this out there! It was a missing part of the puzzle and is really great to see.

As per the title, I was wondering if there is a way to use this in batch img2img mode by feeding a folder containing an img sequence of depth images as the custom depth input? Currently it only seems to accept a single image. Alternatively maybe it could pick up the custom depth channel from the input img2img directory, if that contains an rgbd sequence?

Apologies if this is already implemented somehow, may have missed it.

Thanks again!

AnonymousCervine commented 1 year ago

This feature is on my (very short) to-do list (I rely on functionality similar to this in my own workflow, in fact, but not in a way that would be clean to replicate here).

It's been made slightly awkward by the fact that the built-in img2img batch tab in webui doesn't seem to expose what it's doing to extensions (though I may be missing something). It appears to just call the extension separately once per image in the input directory—desirable behavior on some fronts, but inconvenient here.

The bargain-bin version would probably be to replicate (some of? all of?) the functionality of the batch img2img pane within the extension controls, and that may be what I end up doing.
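For the curious, a minimal sketch of what that bargain-bin approach might look like (purely illustrative: the function name and the convention of pairing color and depth files by filename stem are my assumptions, not the extension's actual behavior):

```python
# Hypothetical sketch: pair each color image with a depth image of the same
# filename stem, then loop the pairs through the usual single-image path.
from pathlib import Path
from PIL import Image

def iter_color_depth_pairs(color_dir, depth_dir, exts=(".png", ".jpg", ".jpeg")):
    depth_by_stem = {p.stem: p for p in Path(depth_dir).iterdir()
                     if p.suffix.lower() in exts}
    for color_path in sorted(Path(color_dir).iterdir()):
        if color_path.suffix.lower() not in exts:
            continue
        depth_path = depth_by_stem.get(color_path.stem)
        if depth_path is None:
            continue  # skip color images without a matching depth map
        yield Image.open(color_path), Image.open(depth_path)

# Each (color, depth) pair would then be handed to the existing single-image
# processing, with the depth image substituted as the custom depth input.
```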

Alternatively maybe it could pick up the custom depth channel from the input img2img directory, if that contains an rgbd sequence?

This would be trickier: the Python Imaging Library (PIL/Pillow) doesn't explicitly support RGBD, and I don't really want to play a format guessing-game from its limited worldview, so that would mean introducing something more sophisticated to handle images. Additionally, I'm somewhat constrained by not really knowing what people are using for combined color/depth formats these days (I think the last time I handled RGBD image data, it was from the original Microsoft Kinect; it's been a little while).
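(For illustration only: if "RGBD" in practice just means an RGBA PNG whose alpha channel carries depth, Pillow alone could split it, something like the sketch below. Anything fancier, e.g. EXR with a Z channel or raw Kinect dumps, would need a more capable library.)

```python
# Hedged sketch: treat the alpha channel of an RGBA image as the depth plane.
from PIL import Image

def split_rgbd(path):
    img = Image.open(path).convert("RGBA")
    r, g, b, d = img.split()              # assume alpha holds depth
    color = Image.merge("RGB", (r, g, b))
    depth = d                              # single-channel ("L") depth map
    return color, depth
```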

modcomper commented 1 year ago

Thanks for the update, and great to hear it's on the list!

All makes sense. That is annoying regarding the way batch img2img currently works with extensions, and it explains why many of them don't support batching. Replicating the whole tab may indeed be the easiest way forward...

On a side note, have you/anyone here had any luck finding a solid way of fine-tuning 512 depth models on custom datasets? (I saw an implementation for Dreambooth-like training, but ideally I'm looking for something that works with large datasets.)

AnonymousCervine commented 1 year ago

A prototype of batching functionality now exists. Let me know what you think!

On a side note, have you/anyone here had any luck finding a solid way of fine-tuning 512 depth models on custom datasets?

Apologies, I haven't the faintest! I've seen a couple sets of instructions around for more traditional fine-tuning (i.e. not Dreambooth), but I haven't personally come up with a good excuse to try any of them yet.

modcomper commented 1 year ago

Great stuff - thanks for putting this out there! I had a chance to test it over the weekend and it works! Gradio's naming convention is indeed interesting, but after some tweaking I managed to get it to process the correct pairs and output the sequence in the right order. I'll test further this week and report back anything noteworthy - but so far so good :)

Thanks again!

Ah, and for anyone who might end up here looking for a way to fine-tune depth models on a larger scale than Dreambooth: I ended up finding that StableTuner does a pretty decent job at it! Would recommend.

AnonymousCervine commented 1 year ago

I'm glad!

I might yet be able to correct the butchery it makes of the names after the fact, and then do proper alphanumeric sorting. The tricky bit, which I haven't done yet, is first confirming that it butchers the names the same way on all relevant platforms, at least one of which I don't personally have.
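In the meantime, the alphanumeric-sorting half is straightforward; here's a rough sketch of a natural-sort key (this is independent of whatever Gradio does to the names, which is the part that still needs platform-by-platform confirmation):

```python
# Natural-sort key so that frame2 sorts before frame10.
import re

def natural_key(name):
    # Split into digit and non-digit runs: "frame10.png" -> ["frame", 10, ".png"]
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]

files = ["frame10.png", "frame2.png", "frame1.png"]
print(sorted(files, key=natural_key))  # ['frame1.png', 'frame2.png', 'frame10.png']
```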