comfyanonymous / ComfyUI


[ Feature Request ] - Higher bit depth file export #572

Open rethink-studios opened 1 year ago

rethink-studios commented 1 year ago

Hey hey, is it possible to add export of PNG16, DNG, or EXR? Memory intensive, but it would be huge!

comfyanonymous commented 1 year ago

Yes, the image data being passed around is RGB values in float32. The only issue is that I'm not sure if PIL supports saving higher-precision PNG, and I don't want to pull in another dependency just for this.

Should be easy to make a custom node though.
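
For anyone who wants to try, here's a minimal sketch of what such a custom node could look like (untested; it leans on OpenCV for the 16-bit PNG write, which is exactly the kind of extra dependency mentioned above, and the class/filename choices are just placeholders):

import cv2  # pip install opencv-python; cv2.imwrite supports 16-bit PNG
import numpy as np

class SaveImage16Bit:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"images": ("IMAGE",),
                             "filename_prefix": ("STRING", {"default": "ComfyUI_16bit"})}}

    RETURN_TYPES = ()
    FUNCTION = "save"
    OUTPUT_NODE = True
    CATEGORY = "image"

    def save(self, images, filename_prefix):
        # ComfyUI passes images as float32 tensors in [0, 1], shape [batch, H, W, C]
        for i, image in enumerate(images):
            arr = np.clip(image.cpu().numpy(), 0.0, 1.0)
            arr16 = (arr * 65535.0 + 0.5).astype(np.uint16)
            bgr = cv2.cvtColor(arr16, cv2.COLOR_RGB2BGR)  # OpenCV expects BGR order
            cv2.imwrite(f"{filename_prefix}_{i:05}.png", bgr)
        return ()

NODE_CLASS_MAPPINGS = {"SaveImage16Bit": SaveImage16Bit}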

WASasquatch commented 1 year ago

The diffusion process may use float32, but all the source data is 8-bit images, so you get no color-depth benefit from fp32 (the training process never maps any 16-bit/32-bit color ranges). Just wasted time. The resulting images are within 8-bit range minus some slight discrepancy. There were some studies on diffusers showing that fp32 made no visible difference versus fp16 and was just a waste of time and resources. And even then, with only 8-bit source data, even fp16 is a waste.
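
To see why: 8-bit-sourced data carries at most 256 tonal levels per channel no matter how wide the container is. A quick synthetic check:

import numpy as np

# 8-bit source data promoted to float32, as in the diffusion pipeline
rng = np.random.default_rng(0)
src8 = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
as_float = src8.astype(np.float32) / 255.0

print(len(np.unique(as_float)))  # at most 256 distinct levels
# Rescaling into a 16-bit container doesn't create new levels either:
print(len(np.unique((as_float * 65535).astype(np.uint16))))  # still at most 256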

More to the point, here is some code to get you going on a node if ya want.

import numpy as np
import tifffile as tiff  # pip install tifffile

# 32-bit TIFF (image assumed to be a float array in [0, 1])
image_uint32 = (image * (2**32 - 1)).astype(np.uint32)
tiff.imwrite('image_32bit.tif', image_uint32)

# 16-bit TIFF
image_uint16 = (image * 65535).astype(np.uint16)
tiff.imwrite('image_16bit.tif', image_uint16)

FYI this is the lightest way to obtain a 16-bit or 32-bit image, at least that I found; other libraries are super massive for just this purpose, like @comfyanonymous stated.


Went ahead and added it to WAS Node Suite because I wanted to test the difference between files myself, though I don't expect any real difference.


WASasquatch commented 1 year ago

PS: a little more streamlined usage that writes the float data directly, so there's no manual rescaling to integer ranges.

tiff.imwrite("half-float.tiff", image.cpu().numpy(), dtype=tiff.float16)
tiff.imwrite("full-float.tiff", image.cpu().numpy(), dtype=tiff.float32)

Further digging found that Pillow itself has this support, @comfyanonymous, if you did want to add it:

from PIL import Image
import numpy as np

# Construct a 16-bit gradient greyscale image
im = np.arange(65536, dtype=np.uint16).reshape(256, 256)

# Save as TIFF with PIL/Pillow
Image.fromarray(im).save('result.tif')

According to the demonstration image here: https://itecnote.com/tecnote/python-how-to-convert-dtypeuint16-data-into-a-16bit-png-image/ it is in fact 16-bit.

Edit

So, both these methods suck. tifffile seems to require an older numpy, which interferes with other stuff.

And the Pillow method doesn't seem to actually work in the current version. I've tried a few ways to scale the data, and they all run; they just don't save the correct data.

For example, these functions:

    def saveu16(self, u16in, size, tiff_filename):
        # Scale a [0, 1] float tensor to 16-bit and pack the raw bytes into a
        # Pillow 'I;16' image. Likely culprit for the garbage output: the
        # tensor is H x W x 3 RGB, but 'I;16' is single-channel, so the bytes
        # get misinterpreted.
        if not isinstance(u16in, torch.Tensor):
            raise TypeError("Input must be a PyTorch tensor")
        u16in_np = (u16in.cpu().numpy() * (2**16-1)).astype(np.uint16)
        img_out = Image.new('I;16', size)
        outpil = u16in_np.astype(u16in_np.dtype.newbyteorder("<")).tobytes()
        img_out.frombytes(outpil)
        img_out.save(tiff_filename)

    def saveu32(self, u32in, size, tiff_filename):
        # Same idea at 32 bits, but 'I;32' is not a valid Pillow mode ('I' is
        # the 32-bit mode), so this path fails outright.
        if not isinstance(u32in, torch.Tensor):
            raise TypeError("Input must be a PyTorch tensor")
        u32in_np = (u32in.cpu().numpy() * (2**32-1)).astype(np.uint32)
        img_out = Image.new('I;32', size)
        outpil = u32in_np.astype(u32in_np.dtype.newbyteorder("<")).tobytes()
        img_out.frombytes(outpil)
        img_out.save(tiff_filename)

Which just produces garbage images like this:

[attached output image: ComfyUI_00008_]

rethink-studios commented 1 year ago

Last year, while playing with Deforum and wanting AUTO to have access to EXR export, I connected with a developer (Pabla) who, like @WASasquatch, had issues with how the file saved, and they were able to put together a solution. I'm wondering if this helps:

https://stackoverflow.com/questions/42406338/why-cv2-imwrite-changes-the-color-of-pics

Here's Pabla's git, and I know we're not supposed to copy/paste code, but my hope is to shine a light on the subject and help find solutions for us CGI/print folks.

https://github.com/pablx-ts/deforum-stable-diffusion

@WASasquatch, for my own education: even if the input data was 8-bit, once we're in latent space and we apply a blur or latent scale to the data, wouldn't we see SOME benefit at that point, especially if the range of the data has been expanded through the blur?

Back in the day, I would take multiple bracketed exposures to create an HDR image, and I'm wondering if we're getting to the point where we can synthesize f-stops in-app. Thoughts?

WASasquatch commented 1 year ago

> @WASasquatch, for my own education: even if the input data was 8-bit, once we're in latent space and we apply a blur or latent scale to the data, wouldn't we see SOME benefit at that point, especially if the range of the data has been expanded through the blur?

A blur won't extend the range of the image's actual color; it just blurs the image, which would indeed show less banding, but it wouldn't be truly 16-bit or 32-bit. If you just scaled 8-bit data into 16-bit/32-bit ranges, you'd end up with a black or white image, maybe some faint color burning, because the ranges are all scaled up and the blacks and whites will likely be way out of range. What happens is that the image data stays more or less in its original range of color, just inside a 16-bit or 32-bit file format.
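
A quick synthetic sketch of that: blurring creates in-between values (so smoother gradients and less banding), but the overall range never grows.

import numpy as np

# 8-bit-sourced step edge, promoted to float
row = np.repeat(np.array([64, 65], dtype=np.uint8), 8).astype(np.float32) / 255.0

kernel = np.ones(5) / 5.0                    # simple box blur
blurred = np.convolve(row, kernel, mode='valid')

print(len(np.unique(row)), len(np.unique(blurred)))  # more distinct levels after blurring
print(row.min(), row.max())                  # range before the blur
print(blurred.min(), blurred.max())          # min/max stay at the original extremes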

I also think the issue above is exactly that: it may be a float32 tensor, but the image data is still in the relative range of an 8-bit image, and these saving processes try to rescale it to 16-bit/32-bit, which destroys the image data.

I think in order to save the images you still have to clip them to the 8-bit range and then save as 16-bit/32-bit, which means you're definitely getting no benefit and just wasting space and resources.

rethink-studios commented 1 year ago

@WASasquatch, I'm wondering if color can be remapped while we're in float, based on a specified envelope or LUT that pushes color depth into high-dynamic-range space (with a remap node?), before saving the file?

WASasquatch commented 1 year ago

> @WASasquatch, I'm wondering if color can be remapped while we're in float, based on a specified envelope or LUT that pushes color depth into high-dynamic-range space (with a remap node?), before saving the file?

That sounds like probably the right direction. I'll be honest though, I'm no good with tensor/numpy manipulation.
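
For illustration only, a remap like that might look something like this rough sketch (the gamma and gain values are arbitrary placeholders, not a tested recipe):

import numpy as np
import tifffile as tiff

# Synthetic gamma-encoded [0, 1] gradient standing in for a decoded image
img = np.linspace(0.0, 1.0, 256, dtype=np.float32).reshape(1, 256).repeat(64, axis=0)

linear = np.power(img, 2.2)  # undo a display-style gamma (assumes an sRGB-ish source)
hdr = linear * 4.0           # push highlights past 1.0 for some synthetic headroom

tiff.imwrite('expanded.tiff', hdr.astype(np.float32))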

comfyanonymous commented 1 year ago

The problem with generating high dynamic range images with stable diffusion is that the VAE normalizes the image when it converts it to pixel space.
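
Roughly speaking, the decode step maps the decoder's output from about [-1, 1] into [0, 1] and clamps it, so any would-be extended range is gone before a save node ever sees the data. A sketch of that clipping (exact details vary by implementation):

import torch

# Map hypothetical decoder output from roughly [-1, 1] into [0, 1] and clamp
decoded = torch.tensor([-1.5, -1.0, 0.0, 1.0, 2.5])
pixels = torch.clamp((decoded + 1.0) / 2.0, 0.0, 1.0)
print(pixels)  # out-of-range highlight/shadow detail is clipped away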

WASasquatch commented 1 year ago

> The problem with generating high dynamic range images with stable diffusion is that the VAE normalizes the image when it converts it to pixel space.

Not to mention its variance from the normal color space as is: gradients getting blotchy, color loss or strange gain, etc.

aulerius commented 9 months ago

Hey everyone! Just wandering through Google search results and came to visit this.

It seems there haven't been any proper examples of high-precision methods with SD? I've been wondering: couldn't there be an option to fine-tune a model, or better, the VAE itself, to work with increased precision, and ESPECIALLY HDR (floating-point) content?

A bit like it's been done here (although it's still 8-bit).

The applications and use cases I'm building would benefit immensely from more widespread high-precision and even floating-point image input/output. Not just for images, but also depth models, etc. I think it's about time.