Teriks / dgenerate

dgenerate is a command line tool for generating images and animation sequences using Stable Diffusion and related techniques.
https://dgenerate.readthedocs.io

update dgenerate #1

Open rahmirtrt opened 3 months ago

rahmirtrt commented 3 months ago

Can you update dgenerate for new models?

Spandrel now supports many more models. https://github.com/chaiNNer-org/spandrel

dgenerate is a very nice program. I use it constantly. Please continue supporting it.

Teriks commented 3 months ago

I will take a look; this should be fairly simple to do.

I do recall that I may need to modify my usage of spandrel to not use tiling for certain models.

I have become quite busy.

rahmirtrt commented 3 months ago

Thank you. I'm waiting for dgenerate's new versions (for new models).

Whenever the contents of https://github.com/chaiNNer-org/spandrel/tree/main/libs/spandrel/spandrel are copied into the C:\Program Files\dgenerate\_internal\spandrel folder, new models work. Can you develop dgenerate this way? Then there would be no need to constantly update dgenerate for new models.

Teriks commented 3 months ago

> Thank you. I'm waiting for dgenerate's new versions (for new models).
>
> Whenever the contents of https://github.com/chaiNNer-org/spandrel/tree/main/libs/spandrel/spandrel are copied into the C:\Program Files\dgenerate\_internal\spandrel folder, new models work. Can you develop dgenerate this way? Then there would be no need to constantly update dgenerate for new models.

Due to the way PyInstaller creates the environment under _internal, and the way the generated executable interacts with that environment when loading modules, the launcher becomes unhappy (on my machine) about loading modules with version differences or renamed / now-missing sources. Unfortunately the Windows installer cannot be made to work that way, at least with PyInstaller.

It could be made to work that way if spandrel features could be loaded as an accessory module, with the user installing it manually into an environment of their own creation. In the simple case, if you install dgenerate into its own Python environment instead of using the Windows installer, and then forcefully upgrade the spandrel package to 0.3.1, that would probably work in the short term (from what I am observing now), as long as no major API changes actually occurred in spandrel. In the long term, the dependency would have to be checked manually on every version change to ensure dgenerate is still written against it correctly; spandrel seems to be in fairly active development, so API breakage could happen frequently.
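A minimal sketch of that manual-environment approach (untested; assumes the dgenerate and spandrel package names on PyPI, and that no breaking spandrel API changes occurred):

```shell
# Create an isolated environment instead of using the Windows installer
python -m venv dgenerate-env
dgenerate-env\Scripts\activate   # Windows; on Linux: source dgenerate-env/bin/activate
pip install dgenerate
# Forcefully upgrade spandrel past the version dgenerate pins (may break on API changes)
pip install --upgrade spandrel==0.3.1
```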

I will bump spandrel to latest and release a new installer once I ensure it is still working.

On Windows, the entire environment must be crafted carefully and frozen for this software to "just work" without any issues, and it just barely works as is. I would imagine any tampering with dependencies in the _internal environment created by PyInstaller would tip it into the "doesn't work" state very quickly. The Windows installer is a convenience that entirely trades away the user's ability to control the Python environment and dependencies in favor of something that just works when installed.

Teriks commented 3 months ago

I have updated spandrel to v0.3.1 in the installer environment, and added spandrel_extra_arches v0.1.1 for additional model support.

Since none of the model architectures listed in the https://openmodeldb.info search filters seemed to discourage external tiling or to tile internally, I have left tiling as is; those are the most likely model architectures to be used.

The patch release can be found here: https://github.com/Teriks/dgenerate/releases/tag/v3.1.1

If this is now working for you for the desired models, this can be closed.

rahmirtrt commented 3 months ago

Thank you. It's perfect; it works. I have tested it. I have some suggestions:

1- dgenerate is 1.5x-2x slower than chainner. Can it be accelerated?

2- I am using this command:

dgenerate --sub-command image-process tmp2.jpg --output tmp2u.png -al 1 --processors "upscaler;model=models/4x_Real_SSDIR_DAT_GAN.pth"

Could a dgenerate version be made just for upscaler models (.pth, .ckpt)? That would eliminate the other unused features, speed it up, and reduce its size.

3- Only -d cuda:0 works; -d cuda:1, 2, 3 give the error 'CUDA device ordinal 1 is invalid, no such device exists'.

4- Could there be GPU selection support? E.g. -g gpu-id, the GPU device to use (default=auto), which can be 0, 1, 2 for multi-GPU.

5- Gray models do not work, as in previous versions. For example, this model: 004_grayDN_DFWB_s128w8_SwinIR-M_noise25.pth

6- It would be better if a portable version were made. A .bat file could handle the settings.

Teriks commented 3 months ago
  1. I can think of a few reasons for it to be slower up front. I will need to investigate whether the tiling can be optimized any further; I am guessing chainner has a more efficient tiling implementation. Another reason could be that dgenerate has a very slow startup time due to imports of large dependencies that support its other features: torch and friends, numpy, and other large machine learning libraries. As you have identified, dgenerate does many other things and has a lot of large machine learning dependencies; on a fresh install it will take even longer to bytecode-compile all of this on the first execution. chainner does not have this problem thanks to its server/client architecture, which allows those libraries to be imported once when the server launches and then used repeatedly in the same process. This can be somewhat mitigated for batch operations by using config files, which dgenerate reads from STDIN and which are described in the readme and among the examples in the examples folder. Multiple operations can be performed within the same process this way, without having to spin up a new process and incur the import overhead over again. I had it in mind at some point to allow dgenerate to be used as a server that takes requests as a background process; theoretically this could already be done now by talking to it over STDIN with a pipe, but I have not written any other software to facilitate that.

  2. The upscaler portion of dgenerate was implemented primarily to allow the upscaling of AI-generated content produced using the various diffusion algorithms that dgenerate supports, as a post-process capability on top of its primary feature set (creating AI-generated images). I thought the image processing functionality internal to dgenerate would be useful as a standalone feature, enabling upscaling and other processing of arbitrary images, so I added a way to use it in this manner on any image. I am glad this has been useful! The image-process sub-command and \image_process config directive reuse a lot of functionality I wrote for dealing with multimedia content in a generic fashion and handling alignment and resizing of images/videos (mediainput.py and mediaoutput.py), which is in turn deeply intertwined with a couple of other modules within dgenerate. It could be extracted into a standalone program with quite a bit of effort; however, it would still depend on some very large dependencies such as torch and numpy, so I am not sure it would be worth it, as the dependencies are mostly what contribute to the bloat. upscaler.py is the image processor plugin that handles the upscaler functionality. It might be easier to lift this code from the upscaler plugin for standalone use than to refactor the image-process sub-command out of dgenerate; however, the plugin itself does not handle any of the alignment or naive resizing functionality.

  3. If I am understanding correctly, you are having trouble specifying a GPU during upscaling? This can be accomplished in one of two ways, demonstrated in the commands below. I am currently working on a machine with two 1080 Ti GPUs installed, and this works for me. You may need to inspect the value of your CUDA_VISIBLE_DEVICES environment variable if this is not working on a machine with multiple GPUs installed; see Nvidia's website for details. Only if you have multiple GPUs installed that are reported as independent devices by the nvidia-smi command can you assign jobs to a different GPU; otherwise, if the device is not visible for some reason, you will get an error stating that the device ordinal does not exist. Only one GPU can ever be specified, i.e. the job cannot be distributed across multiple GPUs; the job happens entirely on the specified GPU. Note that cuda:0 indicates the first GPU installed in the system. Distributing a single upscaling task across multiple GPUs is not something spandrel implements, and it would be difficult to implement, but I am not sure that is what you are asking about. If either of the following commands produces an error and you are sure that you have at least 2 independent GPUs installed in your system, could you post the dgenerate error output, the value of CUDA_VISIBLE_DEVICES in your environment, and the output of the nvidia-smi command?

# Run on the 2nd GPU on the system

dgenerate --sub-command image-process tmp2.jpg --output tmp2u.png -al 1 --processors "upscaler;model=4x_Real_SSDIR_DAT_GAN.pth" --device cuda:1

# Or use a plugin argument

dgenerate --sub-command image-process tmp2.jpg --output tmp2u.png -al 1 --processors "upscaler;device=cuda:1;model=4x_Real_SSDIR_DAT_GAN.pth"
  5. I see what is happening: the model is being fed an RGB image by dgenerate when it expects a single-channel grayscale image. I believe I noticed this problem with DDColor, which was one of the model architectures I wanted to test that tiles internally. The upscaling code is currently hard-coded for 3-channel images, and I will investigate generalizing it based on the instantiated upscaler model's desired number of color channels. It will require an in-memory conversion to the correct number of channels, i.e. grayscale, when using those models. It may take me some time unfortunately, even though it is a small fix, due to life :) but I will try to look at it, as I am interested in having every model architecture working.

  6. I am unsure what you mean by portable, but if you mean a single executable, I am not sure that is possible with the current state of PyInstaller as a project. It has a terrible time handling the machine learning dependencies even in the install-folder configuration. The executable would also contain all of torch, numpy, etc. and be about as enormous as the existing install folder. It is really hard to escape this bloat for a Python ML project; the dependencies required to make even just the upscaler work, such as torch, numpy, and possibly OpenCV, are ridiculously large and incur a huge startup cost for the application. I wish it were not like this :)
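Regarding the startup cost in point 1, the batch route via config files read from STDIN could look roughly like this (file name and extension are hypothetical; the \image_process directive is real, but check the README and the examples folder for exact syntax):

```
\image_process img_001.jpg --output img_001u.png -al 1 --processors "upscaler;model=models/4x_Real_SSDIR_DAT_GAN.pth"
\image_process img_002.jpg --output img_002u.png -al 1 --processors "upscaler;model=models/4x_Real_SSDIR_DAT_GAN.pth"
```

Running `dgenerate < batch_upscale.dgen` would then perform both operations in a single process, paying the import overhead only once.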

Teriks commented 3 months ago

Per 5:

Model architectures expecting single-channel images (i.e. grayscale), including DDColor image colorization models and SwinIR gray variants, should be fixed by this change in the upscaler plugin:

7f11dbb - upscaler.py

When a model has an input_channels value equal to 1, one of two things will occur:

  1. Incoming images that are detected to already be grayscale are simply reshaped to a single channel to conform to what the model expects as input. Image processor plugins always receive images as RGB, so this reshaping/conversion is necessary.

  2. Incoming images that are detected to be full-color are converted from color to grayscale in the process of being reshaped.
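The two cases could be sketched like this (an illustrative sketch only, not the actual upscaler.py code, which operates on torch tensors; the function name and the BT.601 luma weights are my choices here):

```python
import numpy as np

def to_model_input(rgb: np.ndarray, input_channels: int) -> np.ndarray:
    """Reshape an HxWx3 RGB array for a model whose input_channels is 1.

    Hypothetical sketch of the two cases described above.
    """
    if input_channels != 1:
        return rgb
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    if np.array_equal(r, g) and np.array_equal(g, b):
        # Case 1: image is already grayscale, just drop the duplicate channels.
        return r[..., np.newaxis]
    # Case 2: full color, convert using ITU-R BT.601 luma weights.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return gray[..., np.newaxis]
```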

In addition, a boolean upscaler plugin option, force-tiling, has been added. Model architectures such as SCUNet and MixDehazeNet are no longer tiled by default unless forced by the user, as the model implementations discourage it. All other model architectures, where tiling is fully supported, undergo external image tiling.

Architectures that tile internally, such as DDColor from spandrel extra architectures, no longer undergo any external tiling and cannot be forced to. DDColor is currently the only implementation with internal tiling.
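For reference, external tiling amounts to splitting the input into overlapping patches, running each through the model, and pasting the upscaled patches back. A simplified sketch (dgenerate's actual blending of overlap regions differs; here later tiles simply overwrite the overlap, and `upscale_fn` stands in for a model call that scales height and width by `scale`):

```python
import numpy as np

def upscale_tiled(img, upscale_fn, scale=2, tile=256, overlap=32):
    """Upscale `img` in overlapping tiles; illustrative sketch only."""
    h, w = img.shape[:2]
    out = np.zeros((h * scale, w * scale) + img.shape[2:], dtype=np.float64)
    step = max(tile - overlap, 1)  # guard against overlap >= tile
    for y in range(0, h, step):
        for x in range(0, w, step):
            # Clamp so tiles at the edges stay full-sized.
            y0 = min(y, max(h - tile, 0))
            x0 = min(x, max(w - tile, 0))
            patch = upscale_fn(img[y0:y0 + tile, x0:x0 + tile])
            out[y0 * scale:(y0 + tile) * scale,
                x0 * scale:(x0 + tile) * scale] = patch
    return out
```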

Version bumped to 3.2.0 for technically "new features"; will release after further testing.

Teriks commented 3 months ago

Per 1:

Attempted optimizations to tiled scaling: c170807

rahmirtrt commented 3 months ago

Can you tell me your e-mail? Let's discuss some things via e-mail.

rahmirtrt commented 3 months ago

(image: nygpu)

3- 'CUDA_VISIBLE_DEVICES in your environment and the nvidia-smi command': I don't know how to do this or which commands to use. The graphics card I use is shown in the picture.

For NCNN models I use Universal-NCNN-Upscaler: https://github.com/TNTwise/Universal-NCNN-Upscaler https://github.com/TNTwise/SPAN-ncnn-vulkan

Can dgenerate run NCNN files (.param, .bin)? What is the command?

6- I didn't mean a single executable. I meant doing without the installer file (dgenerate.msi), not a single exe; there would still be the dgenerate executable and the _internal folder, the same as now.

It could work without installing Python and the rest. All the necessary settings could be made in a .bat file (install.bat) for the portable version.

Teriks commented 3 months ago

It appears your system has a single GTX 1660 installed (or currently recognized by the system), in addition to Intel UHD integrated graphics.

The only CUDA-capable device on your system is that one GTX 1660 (device ordinal 0), which is why no other GPU is visible to torch.

Only devices listed by nvidia-smi are compatible with acceleration in dgenerate.

spandrel does not support models in NCNN format. spandrel is centered around running models with torch on Nvidia hardware / CUDA, whereas NCNN is built from the ground up to run on multiple types of hardware, particularly mobile platforms and embedded devices.

It seems there is a way to convert models compatible with spandrel directly into NCNN for optimized use on other devices. I am not sure there is a tool to convert NCNN models back into torch models, however. It is likely that the NCNN models you are using have a Torch-format counterpart from which they were converted, which would in turn work with dgenerate.

The NCNN models might be able to get acceleration through Intel UHD graphics; however, that is currently not something dgenerate can do.

chainner seems to support them in some other way besides spandrel; I am unsure how, but I can take a look.

A zip file of the PyInstaller environment itself with the generated executable would suffice for the described portable installation; it could be unzipped and run that way without the need for an installer. Is this what you mean? Or do you mean having all the CAB files inside the MSI? If it is the latter, a single MSI with the CAB files inside would be large, and since the MSI file itself remains in the installer cache on Windows, it would waste quite a bit of disk space. I believe I tried the single-MSI approach in the beginning and WiX refused to create it correctly, I am guessing because it was too large.

Teriks commented 3 months ago

As for emailing, I would like issue discussions to remain on the repository rather than move to email, so that they can be referenced by other users who might benefit from the information.

rahmirtrt commented 3 months ago

> A zip file of the PyInstaller environment itself with the generated executable would suffice for the described portable installation, it could be unzipped and run that way without the need for installer. Is this what you mean?

Yes, that's what I meant. There would only be dgenerate.exe, setting.bat, and the _internal folder in the zip. It would be even better if the setting.bat file were unnecessary, i.e. click-to-run (dgenerate.exe).

Teriks commented 2 months ago

I can create a two-part zip file in the installer build prior to packaging that can be distributed this way: unzip and run from the directory.

You will need to open a command prompt in the directory of the dgenerate executable in order to run it, unless it is added to your system PATH. Or, I suppose, refer to it by its full path in a BAT file.

Currently all the installer does is install it and add its location to your system PATH variable for convenience, so that it can be run from the command line anywhere without additional steps.

Teriks commented 2 months ago

The new release should fix the grayscale model issues; I have included zips of the raw executable/environment.

rahmirtrt commented 2 months ago

I have tested it.

1- I did not see any speedup in v3.2.0 compared to other versions.

dgenerate --sub-command image-process tmp2.jpg --output tmp2u.png -al 1 --processors "upscaler;model=models/4x_Real_SSDIR_DAT_GAN.pth" Also, after running this command, it waits for a while before starting the process.

Could there be multithreading support for speed? Like 2, 4, 6, 8 threads.

2- If it is not run as administrator, as in previous versions, it gives this error: image-process: error: [Errno 13] Permission denied: '.\.lock'

3- It gives the error in the picture for 'dn_grl_base_c3.ckpt'. (image: terxa)

Some models do not work in dgenerate: dn_grl_base_c3.ckpt, jpeg_grl_small_c3.ckpt. Others, like dn_grl_base_c3s50.ckpt, work. All of these models work in chainner.

You can download them from here: https://github.com/ofsoundof/GRL-Image-Restoration/releases/

rahmirtrt commented 2 months ago

I compared the PNG outputs of dgenerate and chainner. They are not 100% the same, even though the tile sizes are the same. Is it due to the overlap? I tried different overlap values but could not make them match.

Teriks commented 2 months ago

It seems that model expects 4-channel input, i.e. something other than RGB. I will need to dig through the source of spandrel or find documentation on what sort of color data it wants. I had really only considered the simple case of RGB images, since that is mostly what I work with; there is now a special case for grayscale models, but not currently for whatever image format the non-working model desires.

Multithreaded processing for tiles is potentially possible, but I am not sure if/when I can get around to testing something like that; since I generally run the upscaler in batch, unattended, I had not put an emphasis on speeding it up. Judging from the memory overhead on my machine's GPU, it could potentially process 3 tiles simultaneously. I am unsure what this would entail, because I have not investigated how thread-safe running the models is; I am guessing they would need their own instance for each thread, completely isolated.
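A rough sketch of that per-thread-isolation idea (hypothetical; `make_model` is a stand-in factory, not a real dgenerate/spandrel API, and whether the real models tolerate this at all is exactly the open question):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def process_tiles_parallel(tiles, make_model, workers=3):
    """Run tiles through the model in parallel, one model instance per thread.

    Thread safety of the real models is unverified, so each worker thread
    lazily builds its own isolated instance via make_model().
    """
    local = threading.local()

    def run(tile):
        if not hasattr(local, "model"):
            local.model = make_model()  # one instance per worker thread
        return local.model(tile)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the tiles
        return list(pool.map(run, tiles))
```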

I have not compared the output of chainner's upscaling with dgenerate's so far, so I will see if I can do that. If you could provide an example of the two outputs here, that would be great.

Teriks commented 2 months ago

The .lock file in dgenerate's output directory is used to prevent multiple dgenerate processes from interfering with each other's output when running simultaneously on different GPUs as separate processes.

If this file exists, or is stuck existing as owned by administrator, you can safely delete it manually when dgenerate is not running.

I would not run dgenerate as administrator, or with an output location where administrator privileges are required to write.

Teriks commented 2 months ago

Models such as dn_grl_base_c1.ckpt from https://github.com/ofsoundof/GRL-Image-Restoration/releases/ even seem to expect 2 channels; off the top of my head I am not familiar with what sort of data they could want as input :)

I will have to research

Teriks commented 2 months ago

chainner simply treats the 4th channel for dn_grl_base_c3.ckpt as the alpha channel of the image, filled with all 1s (complete opacity). From what I am seeing by testing all-0.0s vs. all-1.0s output in dgenerate and comparing the results, this may be incorrect use of the model by chainner. The result is very blurry at 1.0 opacity and not blurry at 0.0. This may be some additional image data meant to be consumed by the model that is not an alpha/opacity value and is not accounted for by chainner; it could be a depth value or some map that is supposed to be calculated from the input image. It is a bit beyond the scope of dgenerate to provide this data, so I may default to what chainner is doing and just fill it with 1.0s.

Since you are using this model, do you have information about it? It is hard to come by; I will probably have to read a paper or something :)

rahmirtrt commented 2 months ago

I tried these in dgenerate against chainner tile size 512:

4x_Real_SSDIR_DAT_GAN.pth;tile=512;overlap=8
4x_Real_SSDIR_DAT_GAN.pth;tile=512;overlap=16
4x_Real_SSDIR_DAT_GAN.pth;tile=512;overlap=32

These did not match exactly.

4x_Real_SSDIR_DAT_GAN.pth;tile=512;overlap=64

99% match.

I tried these in dgenerate against chainner tile size 256:

4x_Real_SSDIR_DAT_GAN.pth;tile=256;overlap=1
4x_Real_SSDIR_DAT_GAN.pth;tile=256;overlap=2
4x_Real_SSDIR_DAT_GAN.pth;tile=256;overlap=8
4x_Real_SSDIR_DAT_GAN.pth;tile=256;overlap=12
4x_Real_SSDIR_DAT_GAN.pth;tile=256;overlap=16
4x_Real_SSDIR_DAT_GAN.pth;tile=256;overlap=32
4x_Real_SSDIR_DAT_GAN.pth;tile=256;overlap=64

I couldn't match it in any way: <95%.

Teriks commented 2 months ago

For 2-channel models such as dn_grl_base_c1.ckpt, I receive the error below in chainner; I am guessing it does not attempt to down-convert. If I provided the correct image format to begin with, it would probably work, but I am not sure what that format is.

(image: 2channel)

I will try to investigate the output differences later tonight. It is difficult to tell with the naked eye what might be going on; it could be a combination of tiling algorithm differences and differences in image loading and saving.

Teriks commented 2 months ago

Currently I do not have the bandwidth to refactor part of this project to isolate just this feature into one tool and then maintain it; however, I am not opposed to the code being forked or borrowed in order to do this!

Though I am interested in supporting more models, if I can find information on how they are meant to be used.

After testing a smaller image (256x256) in which no tiling occurred, and testing the same image in chainner under the same condition (no tiling), the image outputs still differed in file size and similarity between dgenerate and chainner.

This leads me to believe that at least part of the difference is caused by something occurring during the loading of the image into memory, and/or its conversion into a tensor before processing, that is minutely different from what happens in the chainner code base.

These differences would compound during tiling (especially at lower tile sizes) and processing, as well as during encoding while saving to disk, especially for JPG and other lossy formats. The method of blending tiles performed by dgenerate is also different from chainner's, from looking through their backend code.

I am not sure it is worth pursuing completely identical outputs unless there is serious artifacting, since the outputs are very comparable visually.
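Rather than eyeballing file sizes, the gap between two outputs can be put into numbers; a minimal sketch (function name is my own, not a dgenerate utility):

```python
import numpy as np

def compare_outputs(a: np.ndarray, b: np.ndarray):
    """Return (max absolute pixel difference, fraction of identical pixels).

    A quick way to quantify how close two upscaler outputs really are.
    """
    diff = np.abs(a.astype(np.int64) - b.astype(np.int64))
    return int(diff.max()), float((diff == 0).mean())
```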

rahmirtrt commented 2 months ago

You can check out https://github.com/sekiju/sr-core. It gives output identical to chainner's, but it works very, very slowly.

You can also check out https://github.com/muslll/neosr.

rahmirtrt commented 2 months ago

https://github.com/chaiNNer-org/spandrel/tree/main/libs/spandrel was updated to add support for SAFMN_BCIE. You can add it to dgenerate.

Teriks commented 2 months ago

I must wait for their release and for the package to be published to PyPI.

rahmirtrt commented 2 months ago

I follow you, you are making big changes.

rahmirtrt commented 2 months ago

If tile=auto, what will be the overlap value?

Teriks commented 2 months ago

It will be the default value mentioned by dgenerate --image-processor-help upscaler, which is 32. I neglected to check for the condition where the overlap is larger than the auto-estimated optimal tile size, but that should be unlikely to happen with the amount of memory on most modern cards :)
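That missing check could be as small as a one-line clamp (a hypothetical sketch, not the actual dgenerate code):

```python
def effective_overlap(tile: int, overlap: int = 32) -> int:
    """Clamp the default overlap of 32 so it always stays below the tile size,
    guarding the unlikely case where an auto-estimated tile is very small."""
    return min(overlap, max(tile - 1, 0))
```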