"Error: CUDA error: an illegal memory access was encountered" causes the server to stop working properly until a manual restart

iG8R commented 3 months ago

Sometimes the error mentioned in the title occurs, after which the server stops processing images and keeps giving this error. If the server is manually restarted, it starts working correctly, continuing to process images. Is it possible to make it restart automatically when such an error occurs?

127.0.0.1 - - [29/Jul/2024 13:14:13] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 13:14:55] "OPTIONS /colorize-image-data HTTP/1.1" 200 -
[+] [D2FZ7T3G] Requested image: 684f4f37-e95d-44e2-88c1-730889db4114, Width: 1125, Height: 1600
[+] [D2FZ7T3G] Colorize: True, Upscale: True(x4), Denoise: True
[*] [D2FZ7T3G] Denoising image...
[!] [D2FZ7T3G] Error: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

127.0.0.1 - - [29/Jul/2024 13:14:56] "POST /colorize-image-data HTTP/1.1" 200 -
[+] [4NF2I4X9] Requested image: c2dce404-1793-4c99-8dd4-e1cb2adb02cd, Width: 1125, Height: 1600
[+] [4NF2I4X9] Colorize: True, Upscale: True(x4), Denoise: True
[*] [4NF2I4X9] Denoising image...
[!] [4NF2I4X9] Error: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

127.0.0.1 - - [29/Jul/2024 13:14:56] "POST /colorize-image-data HTTP/1.1" 200 -

(venv) h:\Manga-Colorizer-revamp>app-stream.py
H:\Manga-Colorizer-revamp\colorizator.py:22: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(config.colorizer_path, map_location=self.device)
H:\Manga-Colorizer-revamp\denoising\denoiser.py:42: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(weights_path, map_location=torch.device('cpu'))
H:\Manga-Colorizer-revamp\upscalator.py:19: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  model = torch.load(config.upscaler_path, map_location=self.device)
C:\Python312\Lib\site-packages\torch\serialization.py:1189: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
C:\Python312\Lib\site-packages\torch\serialization.py:1189: SourceChangeWarning: source code of class 'torch.nn.modules.container.Sequential' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
C:\Python312\Lib\site-packages\torch\serialization.py:1189: SourceChangeWarning: source code of class 'torch.nn.modules.activation.LeakyReLU' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
 * Serving Flask app 'app-stream'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on https://127.0.0.1:5000
 * Running on https://192.168.0.3:5000
Press CTRL+C to quit
127.0.0.1 - - [29/Jul/2024 13:15:53] "OPTIONS /colorize-image-data HTTP/1.1" 200 -
[+] [7YLY6313] Requested image: 4c1ed87e-1587-43cf-8e83-45794973117c, Width: 1013, Height: 1440
[+] [7YLY6313] Colorize: True, Upscale: True(x4), Denoise: True
[*] [7YLY6313] Denoising image...
H:\Manga-Colorizer-revamp\denoising\functions.py:40: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\tensor\python_tensor.cpp:80.)
  downsampledfeatures = torch.cuda.FloatTensor(N, Cout, Hout, Wout).fill_(0)
[+] [7YLY6313] Denoised image [1440, 1013, 4]->[1200, 844, 3] in 0.52 seconds.
[*] [7YLY6313] Colorizing image...
[+] [7YLY6313] Colorized image [1200, 844, 3]->[819, 576, 3] in 0.37 seconds.
[*] [7YLY6313] Upscaling image...
[+] [7YLY6313] Upscaled image (x4) [819, 576, 3]->[3276, 2304, 3] in 0.54 seconds.
127.0.0.1 - - [29/Jul/2024 13:15:57] "POST /colorize-image-data HTTP/1.1" 200 -
[+] [CY1F6KS6] Requested image: 5556783d-16ab-4d7f-b9eb-dcc15427a8ee, Width: 1125, Height: 1600
[+] [CY1F6KS6] Colorize: True, Upscale: True(x4), Denoise: True
[*] [CY1F6KS6] Denoising image...
[+] [CY1F6KS6] Denoised image [1600, 1125, 4]->[1200, 843, 3] in 0.23 seconds.
[*] [CY1F6KS6] Colorizing image...
[+] [CY1F6KS6] Colorized image [1200, 843, 3]->[820, 576, 3] in 0.29 seconds.
[*] [CY1F6KS6] Upscaling image...
[+] [CY1F6KS6] Upscaled image (x4) [820, 576, 3]->[3280, 2304, 3] in 0.55 seconds.
127.0.0.1 - - [29/Jul/2024 13:16:01] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 13:16:20] "OPTIONS /colorize-image-data HTTP/1.1" 200 -
[+] [3RV2N4KH] Requested image: f5bde9bf-315d-45ff-8ecd-f069b24be781, Width: 1125, Height: 1600
[+] [3RV2N4KH] Colorize: True, Upscale: True(x4), Denoise: True
[*] [3RV2N4KH] Denoising image...
[+] [3RV2N4KH] Denoised image [1600, 1125, 4]->[1200, 843, 3] in 0.39 seconds.
[*] [3RV2N4KH] Colorizing image...
[+] [3RV2N4KH] Colorized image [1200, 843, 3]->[820, 576, 3] in 0.19 seconds.
[*] [3RV2N4KH] Upscaling image...
[+] [3RV2N4KH] Upscaled image (x4) [820, 576, 3]->[3280, 2304, 3] in 0.59 seconds.
127.0.0.1 - - [29/Jul/2024 13:16:23] "POST /colorize-image-data HTTP/1.1" 200 -

BinitDOX commented 3 months ago

Yes, I think I've already fixed that; I'll push it soon.

BinitDOX commented 3 months ago

The fix was to disable asynchronous kernel launches by adding this:

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
os.environ["TORCH_USE_CUDA_DSA"] = "1"

But it slows down the process a lot, so I removed it. The error is essentially a GPU out-of-memory error.
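A hedged note on the two variables above, not the project's code. `CUDA_LAUNCH_BLOCKING=1` makes every kernel launch synchronous. That is why errors then point at the call that actually failed, and also why throughput drops. It only takes effect if it is set before CUDA is initialized, so the safe place is before `import torch`. `TORCH_USE_CUDA_DSA` is, as the error text itself says, a compile-time option, so setting it as an environment variable on a prebuilt PyTorch install most likely has no effect. The helper name below is hypothetical:

```python
import os
import sys
import warnings


def enable_blocking_cuda_launches() -> None:
    """Hypothetical helper: make CUDA kernel launches synchronous for debugging.

    Call it at the very top of app-stream.py, before `import torch`, because
    the variable only takes effect if it is set before CUDA is initialized.
    """
    if "torch" in sys.modules:
        warnings.warn(
            "torch is already imported; CUDA_LAUNCH_BLOCKING is ignored if "
            "CUDA has already been initialized in this process"
        )
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


if __name__ == "__main__":
    enable_blocking_cuda_launches()
    print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Because the slowdown comes from synchronizing on every launch, this is best kept as a debugging switch rather than left on in normal use.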

iG8R commented 3 months ago

Speed is what matters most, so an occasional error is acceptable as long as the server restarts by itself afterwards.

BinitDOX commented 3 months ago

Fixed by reinitializing the models, instead of a script restart. Let me know if it works.
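One caveat with reinitializing inside the same process: NVIDIA's CUDA runtime documentation describes an illegal-address error as leaving the process in an inconsistent state, where any further CUDA work returns the same error, and says the process must be terminated and relaunched to keep using CUDA. A PyTorch "CUDA out of memory" error, by contrast, is raised by PyTorch's allocator before anything is corrupted, so freeing memory and retrying can work. Below is a minimal sketch of telling the two apart, assuming matching on the error text as the existing handler does. The names are hypothetical:

```python
# Hypothetical helper: decide how to recover from a CUDA-related exception.
# The substrings follow the messages that appear in this thread's logs.

# Errors that leave the CUDA context unusable for the rest of the process.
STICKY_CUDA_ERRORS = (
    "an illegal memory access was encountered",
    "misaligned address",
)
# Errors that PyTorch raises before the CUDA context is damaged.
RECOVERABLE_CUDA_ERRORS = (
    "CUDA out of memory",
)


def cuda_recovery_action(exc: BaseException) -> str:
    """Return 'restart_process', 'free_memory_and_retry', or 'none'."""
    message = str(exc)
    if any(s in message for s in STICKY_CUDA_ERRORS):
        return "restart_process"
    if any(s in message for s in RECOVERABLE_CUDA_ERRORS):
        return "free_memory_and_retry"
    return "none"


if __name__ == "__main__":
    err = RuntimeError("CUDA error: an illegal memory access was encountered")
    print(cuda_recovery_action(err))  # restart_process
```

This is consistent with the later log in this thread, where `torch.cuda.empty_cache()` inside `handle_cuda_error` itself fails with the same illegal-memory-access error.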

iG8R commented 3 months ago

Thanks!!! I'll give it a try, but this error doesn't occur that often, so it takes a while to check if everything is ok.

iG8R commented 3 months ago

It didn't take long for the error to appear. I noticed that it occurs when VRAM is almost completely full.

[+] [OAWRBOQ1] Imaged cached
[+] [I79FLOQT] Imaged cached
[+] [622FKDPL] Imaged cached
127.0.0.1 - - [29/Jul/2024 22:26:26] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 22:26:26] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 22:26:26] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 22:26:28] "OPTIONS /colorize-image-data HTTP/1.1" 200 -
[+] [TLVVL2TY] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [TLVVL2TY] Requested image: d3a90234-cb1a-4a5f-913e-9d5b62aa27f8, Width: 1013, Height: 1440
[+] [TLVVL2TY] Colorize: True, Upscale: True(x4), Denoise: True
[*] [TLVVL2TY] Denoising image...
[+] [4DNKHQ4T] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [4DNKHQ4T] Requested image: 5188e1ee-d2f4-48fd-b9f7-e194e2ed227b, Width: 1013, Height: 1440
[+] [4DNKHQ4T] Colorize: True, Upscale: True(x4), Denoise: True
[*] [4DNKHQ4T] Denoising image...
[+] [TLVVL2TY] Denoised image [1440, 1013, 4]->[1200, 844, 3] in 0.24 seconds.
[*] [TLVVL2TY] Colorizing image...
[+] [4DNKHQ4T] Denoised image [1440, 1013, 4]->[1200, 844, 3] in 0.18 seconds.
[*] [4DNKHQ4T] Colorizing image...
[+] [C948CGBT] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [C948CGBT] Requested image: b8da78be-519a-4605-996c-aa4a0888183c, Width: 1013, Height: 1440
[+] [C948CGBT] Colorize: True, Upscale: True(x4), Denoise: True
[*] [C948CGBT] Denoising image...
[+] [C948CGBT] Denoised image [1440, 1013, 4]->[1200, 844, 3] in 0.14 seconds.
[*] [C948CGBT] Colorizing image...
[+] [TLVVL2TY] Colorized image [1200, 844, 3]->[819, 576, 3] in 0.54 seconds.
[*] [TLVVL2TY] Upscaling image...
[+] [4DNKHQ4T] Colorized image [1200, 844, 3]->[819, 576, 3] in 0.52 seconds.
[*] [4DNKHQ4T] Upscaling image...
[+] [C948CGBT] Colorized image [1200, 844, 3]->[819, 576, 3] in 0.61 seconds.
[*] [C948CGBT] Upscaling image...
[!] Error:  CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[!] Error:  cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

[!] Error:  CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

[!] [TLVVL2TY] Error: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

[!] [C948CGBT] Error: The expanded size of the tensor (1024) must match the existing size (288) at non-singleton dimension 3.  Target sizes: [1, 3, 1024, 1024].  Tensor sizes: [3, 1024, 288]
[!] [4DNKHQ4T] Error: The expanded size of the tensor (1024) must match the existing size (288) at non-singleton dimension 3.  Target sizes: [1, 3, 1024, 1024].  Tensor sizes: [3, 1024, 288]
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 200 -
[2024-07-29 22:26:32,479] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 106, in colorize_image_data
    image = upscale_image(rid, image, upscaler, upscale_factor)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 214, in upscale_image
    upscaled_image = upscaler.upscale((image.astype('float32') / 255), factor)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\upscalator.py", line 33, in upscale
    result = tile_process(self.model, result.detach(), scale, self.tile_size, self.tile_pad)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\utils\utils.py", line 161, in tile_process
    output[:, :, output_start_y:output_end_y,
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
                                                ^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 123, in colorize_image_data
    handle_cuda_error(e)
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 137, in handle_cuda_error
    clear_torch_cache()
  File "H:\Manga-Colorizer-revamp\utils\utils.py", line 96, in clear_torch_cache
    torch.cuda.empty_cache()
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\torch\cuda\memory.py", line 170, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 500 -
[+] [1EN8CXLT] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [1EN8CXLT] Requested image: dca7812f-9b87-4e92-8d36-cbd88ac33fdc, Width: 1013, Height: 1440
[+] [1EN8CXLT] Colorize: True, Upscale: True(x4), Denoise: True
[*] [1EN8CXLT] Denoising image...
[2024-07-29 22:26:32,805] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
                                                ^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 98, in colorize_image_data
    image = denoise_image(rid, image, denoiser, denoise_sigma)
                                      ^^^^^^^^
NameError: name 'denoiser' is not defined. Did you mean: 'denoise'?
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 500 -
[+] [4EVT2S6G] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [4EVT2S6G] Requested image: 81987cc0-19f1-4a92-b4b3-59825cd9def6, Width: 1013, Height: 1440
[+] [4EVT2S6G] Colorize: True, Upscale: True(x4), Denoise: True
[*] [4EVT2S6G] Denoising image...
[2024-07-29 22:26:32,951] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
                                                ^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 98, in colorize_image_data
    image = denoise_image(rid, image, denoiser, denoise_sigma)
                                      ^^^^^^^^
NameError: name 'denoiser' is not defined. Did you mean: 'denoise'?
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 500 -
[+] [5S0M09DV] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [5S0M09DV] Requested image: 110fc530-9512-43f4-9807-dd19b35858bc, Width: 1013, Height: 1440
[+] [5S0M09DV] Colorize: True, Upscale: True(x4), Denoise: True
[*] [5S0M09DV] Denoising image...
[2024-07-29 22:26:33,044] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
                                                ^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 98, in colorize_image_data
    image = denoise_image(rid, image, denoiser, denoise_sigma)
                                      ^^^^^^^^
NameError: name 'denoiser' is not defined. Did you mean: 'denoise'?
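The failures line up with VRAM being nearly full. One way to confirm that from the server itself would be to log device-wide memory at the start of each request. Below is a minimal sketch, assuming GPU 0. `torch.cuda.mem_get_info()` returns free and total bytes for the whole device, and memory held by PyTorch's caching allocator counts as used. The helper names are hypothetical:

```python
def vram_used_fraction(free_bytes: int, total_bytes: int) -> float:
    """Fraction of device memory in use, given free/total byte counts."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 1.0 - free_bytes / total_bytes


def log_vram(rid: str, device: int = 0) -> float:
    """Hypothetical per-request logger in the style of the server's [*] lines."""
    import torch  # imported here so vram_used_fraction works without torch

    free, total = torch.cuda.mem_get_info(device)
    used = vram_used_fraction(free, total)
    print(f"[*] [{rid}] VRAM used: {used:.0%} of {total / 2**30:.1f} GiB")
    return used
```

Calling `log_vram(rid)` next to the existing "Denoising image..." line would show how close each request gets to the limit before an error.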

iG8R commented 3 months ago

And until the server is restarted, the images will not be processed anymore.

iG8R commented 3 months ago

BinitDOX commented 3 months ago

Woah, how did you manage to run out of 12GB of VRAM? I'm using a 6GB GPU, and I have encountered this error only once. The parallel processing parameter is currently inefficient (at least for me) when set to a value greater than 2, but you should never encounter this error if it is set to 1. I've made a small change; can you please test it once again?

iG8R commented 3 months ago

The Parallel Processing parameter was set to 3...

iG8R commented 3 months ago

There's a problem with the cache: on mangadex.org it stores images, but after a while the image names change and they get downloaded and saved again, thus cluttering up the cache.

iG8R commented 3 months ago

With the new fix, despite the situation you can see in the Task Manager screenshot, the server keeps working without any errors. As for how 12GB of VRAM could run out: I used the "Long Strip" option to view a chapter on mangadex.org.

BinitDOX commented 3 months ago

> There's a problem with the cache: on mangadex.org it stores images, but after a while the image names change and they get downloaded and saved again, thus cluttering up the cache.

I'll fix this by taking the alt text from the img tag instead of the image name, queried by the selector set in the site configuration file (todo), if specified. This will also sort the images properly, since MangaDex stores alt text in a pattern like C1-xxxx, C2-xxxx, etc. Even Senkuro sets it as Страница 9, Страница 10 (i.e. Page 9, Page 10), etc.
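The plan above can be sketched as follows. This assumes the alt text has already been read from the `<img>` element via the site's configured selector. The function names and the fallback to the URL's file name are hypothetical. Splitting out runs of digits gives a natural sort, so C2 comes before C10 and Страница 9 before Страница 10:

```python
import re
from pathlib import PurePosixPath
from typing import List, Optional, Union
from urllib.parse import urlparse


def cache_key(alt_text: Optional[str], image_url: str) -> str:
    """Prefer the stable alt text; fall back to the (changing) file name."""
    if alt_text and alt_text.strip():
        return alt_text.strip()
    return PurePosixPath(urlparse(image_url).path).name


def natural_sort_key(text: str) -> List[Union[int, str]]:
    """Split 'C10-xxxx' into ['c', 10, '-xxxx'] so numbers compare numerically."""
    # With a capturing group, re.split puts digit runs at odd indices,
    # so text and numbers always line up position by position.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]
```

Plain string sorting would put "C10" before "C2"; the natural key avoids that without depending on any particular site's alt-text format.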

> Regarding how could 12GB VRAM may run out

Utilising the entire VRAM is not a bad thing at all; it means the work is being done efficiently. But I don't think the parallel feature is working the way I intend it to; I'll look into it.
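One common pattern is a bounded semaphore around the GPU work. It lets several requests arrive in parallel while capping how many use the GPU at once. This is a sketch, not how the project's parallel-processing setting is implemented. `MAX_GPU_JOBS` and `run_on_gpu` are hypothetical names, and a value of 1 matches the setting reported above as avoiding the error:

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical limit: how many requests may use the GPU at the same time.
MAX_GPU_JOBS = 1
_gpu_slots = threading.BoundedSemaphore(MAX_GPU_JOBS)


def run_on_gpu(job: Callable[[], T]) -> T:
    """Run job() once a GPU slot is free; other request threads wait here."""
    with _gpu_slots:
        return job()
```

In the Flask handler, each stage could then be wrapped, for example `run_on_gpu(lambda: upscale_image(rid, image, upscaler, upscale_factor))`. Requests still download and decode in parallel, but only `MAX_GPU_JOBS` of them hold VRAM-heavy work at any moment.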

vatavian commented 3 months ago

I have not tried the latest changes, but it would often run out of GPU memory for me when trying to process multiple images at once, so I had it process only one at a time. Even then, a large image might fail, so I added code to back off the requested image size when it caught an out-of-memory error.
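vatavian's code isn't shown here, but the back-off idea can be sketched as a retry loop that shrinks the target size whenever the error says the GPU ran out of memory. Everything below is an assumption. `process` stands in for the denoise/colorize/upscale pipeline, and the 0.75 shrink factor and 256 px minimum are arbitrary. Other errors, including the illegal-memory-access one, are re-raised rather than retried:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def is_out_of_memory(exc: BaseException) -> bool:
    # torch.cuda.OutOfMemoryError subclasses RuntimeError and its message
    # contains "out of memory", so a text check also covers older versions.
    return "out of memory" in str(exc).lower()


def run_with_size_backoff(process: Callable[[int], T], max_side: int,
                          shrink: float = 0.75, min_side: int = 256) -> T:
    """Call process(max_side); on OOM, retry with a smaller target size."""
    side = max_side
    while True:
        try:
            return process(side)
        except RuntimeError as exc:
            next_side = int(side * shrink)
            if not is_out_of_memory(exc) or next_side < min_side:
                raise
            print(f"[!] Out of memory at {side}px, retrying at {next_side}px")
            side = next_side
```

For example, if `process` fails above 600 px, a call with `max_side=1024` tries 1024, then 768, and succeeds at 576.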

BinitDOX commented 3 months ago

For me it just slows down the process a lot, but it never runs out of memory. Even when I turn off the denoiser and colorizer and use only the upscaler (making an already large image even larger), it still doesn't run out of memory. @vatavian I'm committing the latest changes to another branch (revamp). Once it's stable, it will replace main, and the current main will be renamed to legacy.

iG8R commented 3 months ago

Sometimes I have the same issue as @vatavian: some large images fail to process.

iG8R commented 3 months ago

The issue has started occurring again :( Is it still possible to force the server to restart automatically when this error occurs?

[+] [49V3W6A9] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 51.1
[+] [49V3W6A9] Requested image: 16-9d5f2f7f92b35c676bb4cdea44a53483dae743e2d8c14389d0b480cbc588d6a0.jpg, Width: 1013, Height: 1440
[+] [49V3W6A9] Colorize: True, Upscale: True(x4), Denoise: True
[*] [49V3W6A9] Denoising image...
[2024-07-31 21:17:57,091] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
                                                ^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 98, in colorize_image_data
    image = denoise_image(rid, image, denoiser, denoise_sigma)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Manga-Colorizer-revamp\app-stream.py", line 199, in denoise_image
    denoised_image = denoiser.denoise(image, sigma)
                     ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'denoise'
127.0.0.1 - - [31/Jul/2024 21:17:57] "POST /colorize-image-data HTTP/1.1" 500 -

BinitDOX commented 3 months ago

It can be done. Make a new Python script in the same folder as app-stream.py:

restarter.py

import subprocess
import sys
import time
from pathlib import Path

# app-stream.py is expected in the same folder as this script.
APP = Path(__file__).with_name('app-stream.py')

def run_script():
    while True:
        try:
            # sys.executable reuses the interpreter running this script (e.g. the venv's python).
            # The child inherits this console, so its log lines appear here directly.
            python_process = subprocess.Popen([sys.executable, str(APP)])
            python_process.wait()
            print(f"app-stream.py exited with code {python_process.returncode}")
        except Exception as e:
            print(f"An error occurred: {e}")
        finally:
            print("Script stopped. Restarting in 5 seconds...")
            time.sleep(5)

if __name__ == "__main__":
    run_script()

Then, in app-stream.py, replace this function:

def handle_cuda_error(e):
    global colorizer, upscaler, denoiser

    if 'CUDA error: an illegal memory access was encountered' \
        in str(e) or 'CUDA out of memory' in str(e) or \
        'CUDA error: misaligned address' in str(e):
        print(f'[-] CUDA Error encountered, reinitializing...')
        colorizer = None
        upscaler = None
        denoiser = None
        clear_torch_cache()
        gc.collect()
        initialize_components()

With this function:

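import os
import sys

def handle_cuda_error(e):
    # After an illegal memory access the CUDA context is unusable, so end the
    # whole process and let restarter.py start a fresh one.
    if 'CUDA error: an illegal memory access was encountered' \
        in str(e) or 'CUDA out of memory' in str(e) or \
        'CUDA error: misaligned address' in str(e):
        print('[-] CUDA Error encountered, terminating...')
        sys.stdout.flush()
        # os._exit ends the whole process; sys.exit() raised inside a Flask
        # request thread would only end that one thread.
        os._exit(1)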

Run restarter.py instead of app-stream.py. I haven't tested it, but this or something similar should work.
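One thing to watch with a `sys.exit()`-based handler. Flask's development server handles each request in a separate thread by default (the interleaved request logs above show requests running concurrently). Python's threading module silently ignores a `SystemExit` raised in a non-main thread. So `sys.exit()` inside `handle_cuda_error` would end only that request's thread, while the server, with its broken CUDA context, keeps running. Ending the process with `os._exit(1)` after flushing output avoids this. The small self-contained check below demonstrates the threading behaviour; the function name is hypothetical:

```python
import sys
import threading


def sys_exit_survives_in_thread() -> bool:
    """Return True if the program keeps running after a worker thread calls sys.exit()."""

    def request_handler():
        # Stand-in for handle_cuda_error() running in a Flask request thread.
        sys.exit("Terminated")

    worker = threading.Thread(target=request_handler)
    worker.start()
    worker.join()
    # Only the worker thread ended; the main thread (the "server") continues.
    return not worker.is_alive()


if __name__ == "__main__":
    print(sys_exit_survives_in_thread())  # True
```

Since `os._exit` skips normal interpreter cleanup, flushing stdout first keeps the final log line from being lost before restarter.py relaunches the app.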

iG8R commented 3 months ago

Thanks a lot! I'll try, but there have been no errors since the last time.

BinitDOX / Manga-Colorizer, issue #14