carson-katri / dream-textures

Stable Diffusion built-in to Blender
GNU General Public License v3.0

Improved Image Handling #719

Closed · NullSenseStudio closed this 7 months ago

NullSenseStudio commented 1 year ago

Summary

Image handling on the backend is currently quite a mess: images have to be flipped before returning, and are frequently converted between PIL and numpy (despite diffusers being able to input and output ndarrays), and depth maps need to be flipped before use and arrive in float32 rather than uint8, unlike any other image. This PR aims to simplify all of this with the new image_utils module.

Details

Images received by the backend will be in float32 RGBA format and won't require any flipping when received or returned (unless some library, like Blender, requires them flipped). Depth maps for use in depth to image (not the depth ControlNet) will be in float32 grayscale without a channel dimension. This keeps all images close enough to what diffusers can handle that only minimal preprocessing is needed on Dream Textures' side; usually this just involves removing the alpha channel, extracting alpha as an inpaint mask, or resizing to certain dimensions. For custom backends that require PIL images there's an extra image_utils.np_to_pil() function that handles the conversion without all the code clutter. The diffusers backend now primarily uses image_utils.image_to_np() for most of its needs. It acts as an all-in-one function that accepts various image types or file paths and calls the other image_utils functions as determined by its kwargs.
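A simplified sketch of the idea behind that helper (not the actual module code, which handles more cases):

```python
import numpy as np
import PIL.Image

def image_to_np(image, mode=None, size=None):
    """Sketch only: normalize various inputs to a float32 array in [0, 1]."""
    if image is None:
        return None
    if isinstance(image, str):
        # File paths are loaded through PIL, then normalized.
        image = np.asarray(PIL.Image.open(image), dtype=np.float32) / 255.0
    image = np.asarray(image, dtype=np.float32)  # assumes floats already in [0, 1]
    if mode == "L" and image.ndim == 3:
        # Collapse to grayscale with no channel dimension (Rec. 709 luma).
        image = image[..., :3] @ np.float32([0.2126, 0.7152, 0.0722])
    elif mode == "RGB" and image.ndim == 3 and image.shape[-1] == 4:
        image = image[..., :3]  # drop the alpha channel
    if size is not None:
        # Resize through PIL for the sketch; the real module may differ.
        pil = PIL.Image.fromarray(np.uint8(np.clip(image, 0.0, 1.0) * 255.0))
        image = np.asarray(pil.resize(size), dtype=np.float32) / 255.0
    return image
```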

Returned images won't have to follow as rigid a requirement. The dtype can be any floating point or integer type, as long as it uses that type's proper range (int 0 = float 0.0, int max = float 1.0). The number of channels doesn't matter either: images can be grayscale or RGB, with or without alpha.
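For integer inputs that convention amounts to a one-liner (sketch):

```python
import numpy as np

def to_float32(image: np.ndarray) -> np.ndarray:
    # Map any integer dtype onto [0, 1] using its full type range,
    # so uint8 255 and uint16 65535 both become 1.0.
    if np.issubdtype(image.dtype, np.integer):
        return image.astype(np.float32) / np.iinfo(image.dtype).max
    return image.astype(np.float32)
```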

The frontend's code is simplified with image_utils.bpy_to_np() and image_utils.np_to_bpy(). Both functions flip the image and handle color space conversion. image_utils.np_to_bpy(..., float_buffer=True), while currently unused, would allow for saving higher color precision and support potential future HDRI models.
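A minimal sketch of the flipping half of bpy_to_np (color space handling omitted; the real function does more):

```python
import bpy
import numpy as np

def bpy_to_np(image: bpy.types.Image) -> np.ndarray:
    # Blender stores pixels bottom-to-top as a flat float buffer, so read
    # it out, reshape to (height, width, channels), and flip vertically.
    pixels = np.empty(len(image.pixels), dtype=np.float32)
    image.pixels.foreach_get(pixels)
    return np.flipud(pixels.reshape(image.size[1], image.size[0], image.channels))
```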

Drawbacks

Using numpy directly instead of converting to PIL can cause issues whose causes aren't obvious.

I've noticed that having values even slightly below 0 or above 1 has a very bad effect on image to image saturation. Certain resizing methods and color transforms can shift values slightly outside of this range, which is not a problem for PIL due to its limited precision. Also, not removing the alpha channel before giving the image to diffusers will lead to it being used directly as latents instead of going through encoding first. That normally causes an out-of-memory error, though I'm sure if someone had enough memory it would lead to strange results.
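In practice that implies a guard along these lines before handing arrays to diffusers (sketch):

```python
import numpy as np

def prepare_for_diffusers(image: np.ndarray) -> np.ndarray:
    if image.ndim == 3 and image.shape[-1] == 4:
        image = image[..., :3]  # 4 channels would be mistaken for latents
    return np.clip(image, 0.0, 1.0)  # resizing/color transforms can overshoot
```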

NullSenseStudio commented 1 year ago

Here's an example of some of the clutter this is dealing with, from depth_to_image.py:

```python
# before
depth_image = PIL.ImageOps.flip(PIL.Image.fromarray(np.uint8(depth * 255)).convert('L')).resize(rounded_size) if depth is not None else None
init_image = None if image is None else (PIL.Image.open(image) if isinstance(image, str) else PIL.Image.fromarray(image.astype(np.uint8))).convert('RGB').resize(rounded_size)

# after
depth = image_to_np(depth, mode="L", size=rounded_size)
image = image_to_np(image, mode="RGB", size=rounded_size)
```

So much easier to read through.

NullSenseStudio commented 1 year ago

I was hoping that keeping depth images in 32-bit would help preserve finer details for depth to image or depth ControlNets, but unfortunately none of the models I've tried seem to benefit.

I did at least find that diffusers/controlnet-depth-sdxl-1.0 can run into a sort of artifact with 8-bit depth images, but is just fine in 32-bit.

[image: comparison render — left: 8-bit, right: 32-bit]

There are lines along the beam that persist no matter the seed or prompt with 8-bit depth. This artifact can affect much more of the image if you're unlucky enough.

[image: example of a more severe artifact]
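A quick way to see how much precision 8-bit quantization throws away on a subtle gradient — a plausible source of the banding, though not confirmed:

```python
import numpy as np

# A subtle depth ramp: ~1024 distinct float32 values collapse to ~4
# 8-bit levels, producing stair-steps a ControlNet can latch onto.
depth = np.linspace(0.40, 0.41, 1024, dtype=np.float32)
depth_8bit = np.round(depth * 255).astype(np.uint8) / np.float32(255)
print(np.unique(depth).size, np.unique(depth_8bit).size)  # 1024 4
```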

NullSenseStudio commented 11 months ago

@carson-katri I'd like to standardize how images are shared between the render engine nodes. I propose keeping the images flipped the way Blender naturally stores them, and switching the color space to linear so that image operations match how they occur in other node editors. Also, do you think the number of channels should be standardized to 4, or should anything between 1 and 4 be allowed?

carson-katri commented 10 months ago

@NullSenseStudio Sorry for the delayed response. Other node editors use Color sockets for 4-channel images and Float sockets for 1 channel, and I don't think there are any options for 2 or 3 channels. So I'd say our nodes should always have 4 channels for images, and 1 channel for other 2D arrays (like depth operations).
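A sketch of what padding everything out to 4 channels could look like (hypothetical helper, not code from this PR):

```python
import numpy as np

def ensure_rgba(image: np.ndarray) -> np.ndarray:
    if image.ndim == 2:
        image = image[..., np.newaxis]
    if image.shape[-1] == 1:
        image = np.repeat(image, 3, axis=-1)  # grayscale -> RGB
    elif image.shape[-1] == 2:
        # gray + alpha: expand gray to RGB, alpha is appended below
        gray, alpha = image[..., :1], image[..., 1:]
        image = np.concatenate([np.repeat(gray, 3, axis=-1), alpha], axis=-1)
    if image.shape[-1] == 3:
        alpha = np.ones_like(image[..., :1])  # fully opaque
        image = np.concatenate([image, alpha], axis=-1)
    return image
```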

NullSenseStudio commented 7 months ago

With the render engine now outputting images in linear color space, you won't have to change the color management display device to None for accurate viewing; that option has since been removed in Blender 4.0.
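For reference, the standard sRGB-to-linear transfer function that separates the two spaces:

```python
import numpy as np

def srgb_to_linear(c: np.ndarray) -> np.ndarray:
    # Piecewise sRGB EOTF: linear segment near black, power curve above.
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
```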

Resize and image file nodes can still be used in earlier Blender versions, though with lower-quality resize sampling and narrower file compatibility.

Dynamic sockets are fixed for Blender 4.0. Getting sockets by their string name is no longer supported when they are disabled: https://projects.blender.org/blender/blender/commit/e4ad58114b9d56fe838396a97fe09aff32c79c6a
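A sketch of the kind of lookup this change forces, assuming disabled sockets still appear when iterating the collection:

```python
import bpy

def find_socket(node: bpy.types.Node, name: str):
    # Subscripting node.inputs by name no longer works for disabled
    # sockets in Blender 4.0, so search the collection directly instead.
    for socket in node.inputs:
        if socket.name == name:
            return socket
    return None
```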