chaiNNer-org / chaiNNer

A node-based image processing GUI aimed at making chaining image processing tasks easy and customizable. Born as an AI upscaling application, chaiNNer has grown into an extremely flexible and powerful programmatic image processing application.
https://chaiNNer.app
GNU General Public License v3.0

Upscale performance feels really slow? #3024

Open some9000 opened 1 week ago

some9000 commented 1 week ago

Hello!

This app is amazing, but I just want to make sure something isn't being done incorrectly on my part.

Basically, using it with the upscale functions (mostly with 1x models to improve image quality) seems to take quite a while, several minutes most of the time. Meanwhile, something like Upscayl appears to blast through similar tasks within seconds. Obviously models can differ, but such a massive difference still makes one wonder.

Here's system information from the app:

{
  "app": {
    "version": "0.24.1",
    "packaged": true,
    "path": "C:\\Users\\XXX\\AppData\\Local\\chaiNNer\\app-0.24.1\\resources\\app"
  },

  "os": {
    "version": "Windows 10 Pro",
    "release": "10.0.22631",
    "arch": "x64",
    "endianness": "LE"
  },
  "cpu": {
    "manufacturer": "AMD",
    "brand": "Ryzen Threadripper 2950X 16-Core Processor",
    "vendor": "AuthenticAMD",
    "family": "23",
    "model": "8",
    "stepping": "2",
    "revision": "2050",
    "voltage": "",
    "speed": 3.5,
    "speedMin": 3.5,
    "speedMax": 3.5,
    "governor": "",
    "cores": 32,
    "physicalCores": 16,
    "performanceCores": 32,
    "efficiencyCores": 0,
    "processors": 1,
    "socket": "SP3r2",
    "flags": "de pse tsc msr sep mtrr mca cmov psn clfsh ds mmx fxsr sse sse2 ss htt tm ia64 pbe",
    "virtualization": false,
    "cache": {
      "l1d": 768,
      "l1i": 768,
      "l2": 8388608,
      "l3": 33554432
    }
  },
  "gpus": [
    {
      "vendor": "NVIDIA",
      "model": "NVIDIA GeForce RTX 2070",
      "bus": "PCI",
      "vram": 8192,
      "vramDynamic": false,
      "subDeviceId": "0x37AD1458",
      "driverVersion": "561.09",
      "name": "NVIDIA GeForce RTX 2070",
      "pciBus": "00000000:41:00.0",
      "fanSpeed": 81,
      "memoryTotal": 8192,
      "memoryUsed": 5146,
      "memoryFree": 2861,
      "utilizationGpu": 100,
      "utilizationMemory": 23,
      "temperatureGpu": 80,
      "powerDraw": 192.07,
      "powerLimit": 215,
      "clockCore": 1860,
      "clockMemory": 6801
    }
  ],
  "settings": {
    "useSystemPython": false,
    "systemPythonLocation": "",
    "theme": "default-dark",
    "checkForUpdatesOnStartup": true,
    "startupTemplate": "",
    "animateChain": true,
    "snapToGrid": false,
    "snapToGridAmount": 16,
    "viewportExportPadding": 20,
    "showMinimap": false,
    "experimentalFeatures": false,
    "hardwareAcceleration": false,
    "allowMultipleInstances": false,
    "lastWindowSize": {
      "maximized": true,
      "width": 1278,
      "height": 680
    },
    "favoriteNodes": [],
    "packageSettings": {
      "chaiNNer_pytorch": {
        "gpu_index": "0",
        "use_cpu": false,
        "use_fp16": true,
        "budget_limit": 0,
        "force_cache_wipe": false
      },
      "chaiNNer_ncnn": {
        "gpu_index": "0",
        "budget_limit": 0
      },
      "chaiNNer_onnx": {
        "gpu_index": "0",
        "execution_provider": "CUDAExecutionProvider",
        "onnx_tensorrt_cache": "",
        "tensorrt_fp16_mode": true
      }
    },
    "storage": {
      "lastDirectories": {
      },
      "nodeSelectorCollapsed": true,
      "recent": [
      ]
    }
  }
}
joeyballentine commented 1 week ago

try using a small-ish custom tile size when upscaling and see if that helps. the auto mode might be estimating a tile size that is slightly too large

some9000 commented 1 week ago

Thanks for the reply. I tested different tile sizes, and it seems like Auto is estimating just fine; it did not jump into "not enough VRAM" territory. Here are the values just in case:

Size / Time
256  / 79s
384  / 72s
512  / 69s
768  / 67s     <== Seems like the Auto estimate took this; same value
1024 / 66s
2048 / 2m 13s  <== Ran out of VRAM
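The diminishing returns above make sense: a larger tile size means fewer tiles to process, so the fixed per-tile overhead shrinks, until the tile no longer fits in VRAM. A minimal sketch of the tile-count arithmetic (the 4000x3000 image dimensions are illustrative, not taken from the actual chain):

```python
import math

def tile_count(width: int, height: int, tile: int) -> int:
    """Number of tiles needed to cover a width x height image."""
    return math.ceil(width / tile) * math.ceil(height / tile)

# Illustrative image size; each halving of tile count trims per-tile overhead.
for tile in (256, 384, 512, 768, 1024):
    print(f"tile {tile:4d}: {tile_count(4000, 3000, tile)} tiles")
```

Going from 256 to 1024 cuts the tile count by roughly 16x, which lines up with the gradually improving times in the table, while the model's raw per-pixel cost stays the same.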

Guess everything works the way it is supposed to, and it's just the user being impatient, heh. Well, I just wanted to make sure everything is being done correctly on this end, and it looks like it is.

joeyballentine commented 1 week ago

You're doing this via the pytorch nodes, right?

some9000 commented 1 week ago

Yes, here is an example. A theoretically simple upscale took 4m 52s:

(screenshot of the chain)

(These screen capture options are amazing, btw)

joeyballentine commented 1 week ago

That's a really large model you're running on a very large image. It's going to be slow no matter what

some9000 commented 6 days ago

> That's a really large model you're running on a very large image. It's going to be slow no matter what

I see. Guess Upscayl is just doing something differently (sneakily using smaller models or such) and has spoiled my expectations. Thank you for your patience.

Btw, would it be hard to add an option for something like a little beep (or just regular OS notification noise, whatever it may be) when a chain finishes?
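Until something like that is built in, a rough workaround is to run a tiny script after the chain finishes. A minimal sketch (the helper name is hypothetical, not a chaiNNer API): it plays the default Windows notification sound via the standard-library winsound module, with a terminal bell as a crude fallback elsewhere.

```python
import sys

def chain_finished_beep() -> None:
    """Play a simple completion sound (hypothetical helper, not a chaiNNer API)."""
    if sys.platform == "win32":
        import winsound  # Windows-only standard library module
        winsound.MessageBeep()  # default OS notification sound
    else:
        # Terminal bell as a crude cross-platform fallback
        print("\a", end="", flush=True)

chain_finished_beep()
```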

pokepress commented 5 days ago

Since you seem to be on Windows, you should be able to use Task Manager's Performance tab to verify your GPU is being used as expected. It probably is, but it can't hurt to check. You should see the usage show up on the CUDA graph.
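The same check can be scripted with nvidia-smi's CSV query mode. A sketch that parses that output, demonstrated on a sample line since running the query needs an NVIDIA driver installed (the sample numbers are illustrative, loosely echoing the system info above):

```python
import subprocess

# Real nvidia-smi flags; emits one "util, mem_used, mem_total" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line from the nvidia-smi query above."""
    util, used, total = (int(v) for v in line.split(","))
    return {"util_pct": util, "vram_used_mib": used, "vram_total_mib": total}

# Example line in the format nvidia-smi emits (values are illustrative):
sample = "100, 5146, 8192"
print(parse_gpu_line(sample))

# With a driver installed, you could run the query for real:
# out = subprocess.run(QUERY, capture_output=True, text=True).stdout
# for line in out.strip().splitlines():
#     print(parse_gpu_line(line))
```

High utilization with VRAM mostly full, as in the system info above, suggests the GPU is already the bottleneck rather than sitting idle.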

JeremyRand commented 1 day ago

I haven't used Upscayl before, but from glancing at its documentation, it looks like it's using ncnn rather than PyTorch. Can you compare the performance of chaiNNer's ncnn nodes with Upscayl, for a more apples-to-apples comparison?
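For a fair comparison like that, it helps to time the same input through each backend with a small harness. A sketch with stand-in workloads (the `fake_*` functions are placeholders for the actual PyTorch and ncnn upscale calls, which are not shown here):

```python
import time

def benchmark(label: str, fn, *args, repeats: int = 3) -> float:
    """Time fn(*args), returning the best wall-clock seconds over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best:.2f}s")
    return best

# Placeholder workloads; swap in the real upscale calls on the same image.
def fake_pytorch_upscale(image):
    return image  # stand-in

def fake_ncnn_upscale(image):
    return image  # stand-in

benchmark("pytorch", fake_pytorch_upscale, "image.png")
benchmark("ncnn", fake_ncnn_upscale, "image.png")
```

Taking the best of a few repeats reduces noise from one-time costs like model loading, which would otherwise skew a single-run comparison.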