Open iG8R opened 3 months ago
Yes, I think I already fixed that; I will push the fix soon.
The fix was to disable async calls by adding this:
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
os.environ["TORCH_USE_CUDA_DSA"] = "1"
But it slows down the process a lot, so I removed it. The error is basically an out-of-memory error on the GPU.
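A minimal sketch of how the two variables could stay available for debugging without paying the slowdown in normal runs. The `COLORIZER_CUDA_DEBUG` switch and the `cuda_debug_env` helper are hypothetical names, not part of the project. The variables need to be set before the first CUDA call, so this would run before `import torch`. As the error message itself says, `TORCH_USE_CUDA_DSA` is a compile option, so setting it at runtime may have no effect on a prebuilt PyTorch.

```python
import os

def cuda_debug_env(environ):
    """Return the extra variables to set when the hypothetical
    COLORIZER_CUDA_DEBUG switch is "1", otherwise an empty dict."""
    if environ.get("COLORIZER_CUDA_DEBUG") != "1":
        return {}
    return {
        # Makes kernel launches synchronous so the traceback points at the failing call.
        "CUDA_LAUNCH_BLOCKING": "1",
        # Kept from the original fix; probably only effective on builds compiled with it.
        "TORCH_USE_CUDA_DSA": "1",
    }

# Apply before importing torch so the CUDA runtime sees the settings.
os.environ.update(cuda_debug_env(os.environ))
```

With the switch unset nothing changes, so the server keeps its normal speed.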
Speed is the most important thing, so it doesn't matter that an error may occur if the server restarts by itself afterwards.
Fixed by reinitializing the models instead of restarting the script. Let me know if it works.
Thanks!!! I'll give it a try, but this error doesn't occur that often, so it takes a while to check if everything is ok.
It didn't take long for the error to appear. I noticed that it occurs when VRAM is almost completely full.
[+] [OAWRBOQ1] Imaged cached
[+] [I79FLOQT] Imaged cached
[+] [622FKDPL] Imaged cached
127.0.0.1 - - [29/Jul/2024 22:26:26] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 22:26:26] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 22:26:26] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 22:26:28] "OPTIONS /colorize-image-data HTTP/1.1" 200 -
[+] [TLVVL2TY] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [TLVVL2TY] Requested image: d3a90234-cb1a-4a5f-913e-9d5b62aa27f8, Width: 1013, Height: 1440
[+] [TLVVL2TY] Colorize: True, Upscale: True(x4), Denoise: True
[*] [TLVVL2TY] Denoising image...
[+] [4DNKHQ4T] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [4DNKHQ4T] Requested image: 5188e1ee-d2f4-48fd-b9f7-e194e2ed227b, Width: 1013, Height: 1440
[+] [4DNKHQ4T] Colorize: True, Upscale: True(x4), Denoise: True
[*] [4DNKHQ4T] Denoising image...
[+] [TLVVL2TY] Denoised image [1440, 1013, 4]->[1200, 844, 3] in 0.24 seconds.
[*] [TLVVL2TY] Colorizing image...
[+] [4DNKHQ4T] Denoised image [1440, 1013, 4]->[1200, 844, 3] in 0.18 seconds.
[*] [4DNKHQ4T] Colorizing image...
[+] [C948CGBT] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [C948CGBT] Requested image: b8da78be-519a-4605-996c-aa4a0888183c, Width: 1013, Height: 1440
[+] [C948CGBT] Colorize: True, Upscale: True(x4), Denoise: True
[*] [C948CGBT] Denoising image...
[+] [C948CGBT] Denoised image [1440, 1013, 4]->[1200, 844, 3] in 0.14 seconds.
[*] [C948CGBT] Colorizing image...
[+] [TLVVL2TY] Colorized image [1200, 844, 3]->[819, 576, 3] in 0.54 seconds.
[*] [TLVVL2TY] Upscaling image...
[+] [4DNKHQ4T] Colorized image [1200, 844, 3]->[819, 576, 3] in 0.52 seconds.
[*] [4DNKHQ4T] Upscaling image...
[+] [C948CGBT] Colorized image [1200, 844, 3]->[819, 576, 3] in 0.61 seconds.
[*] [C948CGBT] Upscaling image...
[!] Error: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[!] Error: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
[!] Error: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[!] [TLVVL2TY] Error: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[!] [C948CGBT] Error: The expanded size of the tensor (1024) must match the existing size (288) at non-singleton dimension 3. Target sizes: [1, 3, 1024, 1024]. Tensor sizes: [3, 1024, 288]
[!] [4DNKHQ4T] Error: The expanded size of the tensor (1024) must match the existing size (288) at non-singleton dimension 3. Target sizes: [1, 3, 1024, 1024]. Tensor sizes: [3, 1024, 288]
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 200 -
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 200 -
[2024-07-29 22:26:32,479] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
File "H:\Manga-Colorizer-revamp\app-stream.py", line 106, in colorize_image_data
image = upscale_image(rid, image, upscaler, upscale_factor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\app-stream.py", line 214, in upscale_image
upscaled_image = upscaler.upscale((image.astype('float32') / 255), factor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\upscalator.py", line 33, in upscale
result = tile_process(self.model, result.detach(), scale, self.tile_size, self.tile_pad)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\utils\utils.py", line 161, in tile_process
output[:, :, output_start_y:output_end_y,
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\app-stream.py", line 123, in colorize_image_data
handle_cuda_error(e)
File "H:\Manga-Colorizer-revamp\app-stream.py", line 137, in handle_cuda_error
clear_torch_cache()
File "H:\Manga-Colorizer-revamp\utils\utils.py", line 96, in clear_torch_cache
torch.cuda.empty_cache()
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\torch\cuda\memory.py", line 170, in empty_cache
torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 500 -
[+] [1EN8CXLT] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [1EN8CXLT] Requested image: dca7812f-9b87-4e92-8d36-cbd88ac33fdc, Width: 1013, Height: 1440
[+] [1EN8CXLT] Colorize: True, Upscale: True(x4), Denoise: True
[*] [1EN8CXLT] Denoising image...
[2024-07-29 22:26:32,805] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\app-stream.py", line 98, in colorize_image_data
image = denoise_image(rid, image, denoiser, denoise_sigma)
^^^^^^^^
NameError: name 'denoiser' is not defined. Did you mean: 'denoise'?
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 500 -
[+] [4EVT2S6G] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [4EVT2S6G] Requested image: 81987cc0-19f1-4a92-b4b3-59825cd9def6, Width: 1013, Height: 1440
[+] [4EVT2S6G] Colorize: True, Upscale: True(x4), Denoise: True
[*] [4EVT2S6G] Denoising image...
[2024-07-29 22:26:32,951] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\app-stream.py", line 98, in colorize_image_data
image = denoise_image(rid, image, denoiser, denoise_sigma)
^^^^^^^^
NameError: name 'denoiser' is not defined. Did you mean: 'denoise'?
127.0.0.1 - - [29/Jul/2024 22:26:32] "POST /colorize-image-data HTTP/1.1" 500 -
[+] [5S0M09DV] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 16
[+] [5S0M09DV] Requested image: 110fc530-9512-43f4-9807-dd19b35858bc, Width: 1013, Height: 1440
[+] [5S0M09DV] Colorize: True, Upscale: True(x4), Denoise: True
[*] [5S0M09DV] Denoising image...
[2024-07-29 22:26:33,044] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\app-stream.py", line 98, in colorize_image_data
image = denoise_image(rid, image, denoiser, denoise_sigma)
^^^^^^^^
NameError: name 'denoiser' is not defined. Did you mean: 'denoise'?
Until the server is restarted, no more images are processed.
Woah, how did you manage to run out of 12GB of VRAM? I'm using a 6GB GPU, and I have encountered this error only once. The parallel processing param is currently inefficient (at least for me) if it is set to a value greater than 2. But you should never encounter this error if you have set it to 1. I've made a small change; can you please test it once again?
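For illustration, a sketch of one way a parallel-processing limit can be enforced so that at most N requests run GPU work at the same time. This is not the project's implementation; `GpuSlots` and its `peak` counter are hypothetical names.

```python
import threading

class GpuSlots:
    """Caps how many requests run GPU work at once (hypothetical helper)."""

    def __init__(self, max_parallel):
        self._slots = threading.BoundedSemaphore(max_parallel)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of simultaneous jobs observed

    def run(self, job, *args):
        with self._slots:  # blocks while max_parallel jobs are already running
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return job(*args)
            finally:
                with self._lock:
                    self._active -= 1

# Inside a request handler this might look like:
#   image = slots.run(upscale_image, rid, image, upscaler, upscale_factor)
```

With `GpuSlots(1)` every image is processed strictly one after another, which trades throughput for a lower peak VRAM use.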
The Parallel Processing parameter was set to 3...
There's a problem with the cache: on mangadex.org it stores images, but after a while the image names change and they get downloaded and saved again, thus cluttering up the cache.
With the new fix, despite the situation that you can see in the screenshot of Task Manager, the server keeps working without any errors.
Regarding how 12GB of VRAM could run out: I used the "Long Strip" option to view a chapter on mangadex.org.
There's a problem with the cache: on mangadex.org it stores images, but after a while the image names change and they get downloaded and saved again, thus cluttering up the cache.
I will fix this by using the alt text of the img tag instead of the image name, queried by the selector set in the site configuration file (todo), if one is specified. This will also sort the images properly, since mangadex stores the alt text in a pattern like C1-xxxx, C2-xxxx, etc. Even senkuro sets it as Страница 9, Страница 10 (that is, Page 9, Page 10), etc.
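A rough sketch of the idea, assuming the alt text stays the same when the image file name changes (that stability is an assumption). `page_number` and `cache_key` are hypothetical helpers, not functions from the repository.

```python
import hashlib
import re

def page_number(alt_text):
    """First integer found in the alt text, e.g. 9 for 'Page 9' or 1 for 'C1-xxxx'."""
    match = re.search(r"\d+", alt_text)
    return int(match.group()) if match else 0

def cache_key(manga, chapter, alt_text, image_name):
    """Stable cache file stem: prefer the alt text, fall back to the image name."""
    label = alt_text.strip() or image_name
    digest = hashlib.sha1(f"{manga}|{chapter}|{label}".encode("utf-8")).hexdigest()
    return digest[:16]
```

Sorting with `sorted(alts, key=page_number)` puts "Page 9" before "Page 10", which a plain string sort would not do.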
Regarding how could 12GB VRAM may run out
Utilising the entire VRAM is not a bad thing at all; it means the work is being done efficiently. But I don't think the parallel feature is working the way I intend it to. I'll look into it.
I have not tried the latest changes, but it would often run out of GPU memory for me when trying to process multiple images at once, so I had it doing only one at a time. Even then, a large image might fail, so I added code to back off the requested image size when it catches an out-of-memory error.
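The back-off described above might look roughly like this. It is a sketch, not the code from that fork: `process_with_backoff`, `min_side` and `shrink` are made-up names and values, and `process` stands for whatever function runs the model at a given size.

```python
def process_with_backoff(process, width, height, min_side=256, shrink=0.75):
    """Call process(width, height); on an out-of-memory error retry at a smaller size."""
    while True:
        try:
            return process(width, height)
        except RuntimeError as e:
            # PyTorch raises CUDA OOM as a RuntimeError whose message contains this text.
            if "out of memory" not in str(e):
                raise
            width, height = int(width * shrink), int(height * shrink)
            # Give up (re-raise the OOM error) once the image would become too small.
            if min(width, height) < min_side:
                raise
```

Errors other than out-of-memory are re-raised immediately, so an illegal memory access would still need the separate handling discussed in this thread.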
For me it just slows down the process a lot, but it never runs out of memory. Even when I turn off the denoiser and the colorizer and use only the upscaler (making an already large image even larger), it still doesn't run out of memory. @vatavian I'm committing the latest changes to another branch (revamp). Once it's stable, I'll replace main with it and rename the current main to legacy.
Sometimes, I have the same issue with failing to process some large images as @vatavian has.
The issue has started occurring again :( So, is it possible to force the server to restart when this error occurs?
[+] [49V3W6A9] Detected manga: Hazure Waku no Joutai Ijou Skill de Saikyou ni Natta Ore ga Subete wo Juurin Suru made >> Chapter 51.1
[+] [49V3W6A9] Requested image: 16-9d5f2f7f92b35c676bb4cdea44a53483dae743e2d8c14389d0b480cbc588d6a0.jpg, Width: 1013, Height: 1440
[+] [49V3W6A9] Colorize: True, Upscale: True(x4), Denoise: True
[*] [49V3W6A9] Denoising image...
[2024-07-31 21:17:57,091] ERROR in app: Exception on /colorize-image-data [POST]
Traceback (most recent call last):
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 1473, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 882, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask_cors\extension.py", line 178, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "h:\Manga-Colorizer-revamp\venv\Lib\site-packages\flask\app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\app-stream.py", line 98, in colorize_image_data
image = denoise_image(rid, image, denoiser, denoise_sigma)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\Manga-Colorizer-revamp\app-stream.py", line 199, in denoise_image
denoised_image = denoiser.denoise(image, sigma)
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'denoise'
127.0.0.1 - - [31/Jul/2024 21:17:57] "POST /colorize-image-data HTTP/1.1" 500 -
It can be done. Make a new Python script named restarter.py in the same folder as app-stream.py:
import subprocess
import sys
import time

def run_script():
    # Launch app-stream.py with the same interpreter that runs this script.
    command = [sys.executable, 'app-stream.py']
    while True:
        try:
            # Merge stderr into stdout so that a full stderr pipe cannot block the server.
            python_process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                              universal_newlines=True)
            # Relay the server's output until it exits.
            for line in python_process.stdout:
                print(line.rstrip())
            python_process.wait()
        except Exception as e:
            print(f"An error occurred: {e}")
        finally:
            print("Script stopped. Restarting in 5 seconds...")
            time.sleep(5)

if __name__ == "__main__":
    run_script()
Then, in app-stream.py, replace this function:
def handle_cuda_error(e):
    global colorizer, upscaler, denoiser
    if 'CUDA error: an illegal memory access was encountered' \
            in str(e) or 'CUDA out of memory' in str(e) or \
            'CUDA error: misaligned address' in str(e):
        print(f'[-] CUDA Error encountered, reinitializing...')
        colorizer = None
        upscaler = None
        denoiser = None
        clear_torch_cache()
        gc.collect()
        initialize_components()
With this function:
import sys

def handle_cuda_error(e):
    if 'CUDA error: an illegal memory access was encountered' \
            in str(e) or 'CUDA out of memory' in str(e) or \
            'CUDA error: misaligned address' in str(e):
        print(f'[-] CUDA Error encountered, terminating...')
        sys.exit("Terminated")
Run the restarter instead of the app-stream. I haven't tested, but this or something like this should work.
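One caveat worth checking, assuming the server handles each request in a worker thread (the interleaved log lines suggest it, but it is not confirmed here): `sys.exit()` only raises `SystemExit` in the calling thread, so it ends that thread rather than the whole process, whereas `os._exit()` ends the process immediately without cleanup. The sketch below demonstrates the difference with two throwaway child scripts; `exit_code_when_thread_calls` is a hypothetical helper.

```python
import subprocess
import sys

def exit_code_when_thread_calls(exit_call):
    """Run a tiny child script whose worker thread executes exit_call
    and return the child's exit code."""
    script = (
        "import os, sys, threading\n"
        f"t = threading.Thread(target=lambda: {exit_call})\n"
        "t.start()\n"
        "t.join()\n"
    )
    return subprocess.run([sys.executable, "-c", script]).returncode

if __name__ == "__main__":
    # The thread's SystemExit is swallowed and the child ends normally.
    print(exit_code_when_thread_calls("sys.exit(3)"))
    # os._exit terminates the whole child process with the given code.
    print(exit_code_when_thread_calls("os._exit(3)"))
```

If the restarter never sees the server stop after a CUDA error, swapping `sys.exit("Terminated")` for `os._exit(1)` in the handler is one thing to try.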
Thanks a lot! I'll try, but there have been no errors since the last time.
Sometimes the error mentioned in the title occurs, after which the server stops processing images and keeps giving this error. If the server is manually restarted, it starts working correctly, continuing to process images. Is it possible to make it restart automatically when such an error occurs?