cubicibo / SUPer

HDMV PGS (BD SUP) subtitle encoder compatible with typesetting effects.
GNU General Public License v3.0

DLL load failed while importing _cl: The specified module could not be found. #21

Closed Alllen closed 8 months ago

Alllen commented 8 months ago

Hi, I found this error on one of my computers (Windows 10 IoT Enterprise LTSC 21H2) but not on the other. What does it mean, please? Despite the error, processing seems to complete and the .pes file is generated successfully. 2.pes.txt

cubicibo commented 8 months ago

This is an OpenCL error. OpenCL is used by SUPer to communicate with your GPU for faster processing. When you use SUPer gui EXE, you get an OpenCL version that may not be compatible with your GPU.

Possible solutions:

  1. Try to update your GPU drivers.
  2. Try to use the Python package directly rather than the EXE: python3 -m pip install C:\...\...\SUPer . Then python3 supergui.py.

And yes, the conversion will still succeed: it falls back to CPU code.

Alllen commented 8 months ago

The error above occurred when calling supercli.py from the command line, not when using the standalone executable GUI.exe.
I tried GUI.exe and got the same error. After updating my GTX 1080 to the latest graphics driver from the Nvidia website, as you suggested, I no longer get the error (however, the perceived speed increase of the GPU processing run was not obvious, lol). Thank you for your help!

By the way, could the issue be related to the driver that ships with Win10? My graphics card worked fine before, even for AAA games. To verify, I installed a virtual machine (Win10 Pro) on the other computer, the one that processed normally, and it also reported "fails to load the _cl dll". Could you please check how the program makes this call, or how it decides to report the error?

cubicibo commented 8 months ago

however, the perceived speed increase of the GPU processing run was not obvious, lol

You are absolutely right, it is used only to compute one metric in the conversion process, it is mostly pointless. If you check taskmgr, you will see a GPU usage of 0.1~2% :')

Unfortunately, pyopencl's approach is to unconditionally show each and every warning to the user; see https://github.com/inducer/pyopencl/issues/511. I cannot replicate this issue in my environment. Please test this fix:

from warnings import filterwarnings
filterwarnings("once", message=r"DLL load", module="pyopencl")
filterwarnings("ignore", message=r"Non-empty compiler", module="pyopencl")

Copy paste this chunk of code at the top of supercli.py or supergui.py, before the from SUPer import ... lines.

Alllen commented 8 months ago

I pasted the code at the top of supercli.py, before the "from SUPer ..." lines, but it doesn't work: the "DLL load" error message is still repeated many times.

PS: This is how I call the CLI:

if __name__ == '__main__':
    ...
    # args = parser.parse_args()
    fake_args = ["-i", f"{xml_path}", "-c", "85", "-a", "100", "-q", "3", "-n", "-b", "709", "-p", "-y", "-w", "-e", "2", "-m", "16000", "-l", "20", f"{pes_path}"]
    args = parser.parse_args(fake_args)
    ...

cubicibo commented 8 months ago

Google tells me this OpenCL message is an ImportError, not a Warning. The warnings filter only acts on warnings, so it cannot silence an exception, sorry.

In the future, I may add a GPU=False option in the INI file, so OpenCL is not used, but this is a low priority task.
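The difference between a warning and an exception can be illustrated with a minimal sketch. This is not SUPer's or pyopencl's code: noisy_import and broken_import are hypothetical stand-ins for a library import. The warnings filter drops the matching warning, while the ImportError passes straight through it and can only be handled with try/except around the failing call.

```python
import warnings

def noisy_import():
    """Hypothetical stand-in: a library that emits a warning while loading."""
    warnings.warn("Non-empty compiler output encountered", UserWarning)
    return "loaded"

def broken_import():
    """Hypothetical stand-in: a library whose native module fails to load."""
    raise ImportError("DLL load failed while importing _cl")

def demo():
    results = {}
    # record=True collects every warning that survives the filters.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted in front of the "always" rule, so it takes precedence.
        warnings.filterwarnings("ignore", message=r"Non-empty compiler")
        results["noisy"] = noisy_import()
        results["warnings_seen"] = len(caught)
        try:
            broken_import()
        except ImportError as exc:
            # The filter above has no effect here: this is an exception.
            results["error"] = str(exc)
    return results
```

Whether the message can be hidden in practice therefore depends on where the import is attempted and how the caller reports the failure, not on the warnings module.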

Alllen commented 8 months ago

Ok, this made me learn something new as well, thanks!

Alllen commented 8 months ago

Over the past few days I tested some subtitle samples and found that processing a relatively complex XML subtitle is very time-consuming. For instance, an XML file with more than 5000 events takes 20 to 30 minutes or more to process, during which CPU usage is high and the fan spins rapidly. Would it be possible to optimize the code further by using multithreading to process multiple events simultaneously and then combine them before writing the PGS file? Or to let the program recognize events with no special effects and skip re-rendering them? This might significantly reduce the overall processing time.

cubicibo commented 8 months ago

Yes, I want to do this too but there are limits:

I will still try to add a second thread in v0.2.4. Maybe it will still be a welcome improvement.

PGS was designed to be easy to decode: Blu-ray players had to be as cheap as possible. All of the complexity is on the encoders. I wish I could write it in C/C++ so it could be fast... but that would be a full-time job. Python was the only choice to make it doable as a hobby project.

cubicibo commented 8 months ago

^I would appreciate it if you could test this merge request. The CLI supports up to 8 threads. Unfortunately, the GUI will require much more work to support multithreading.

Furthermore, you can force CPU-only mode by setting use_gpu=0 in config.ini.
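For illustration only, here is a minimal sketch of how such a flag could be read with Python's standard configparser. The section name "SUPer" and the helper read_use_gpu are assumptions made for this example; only the key use_gpu comes from the comment above, and the actual layout of config.ini may differ.

```python
import configparser

def read_use_gpu(ini_text, section="SUPer", default=True):
    """Return the use_gpu flag from INI text.

    The section name is hypothetical; a missing section or key
    falls back to `default`.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # getboolean accepts 0/1, yes/no, true/false and on/off.
    return parser.getboolean(section, "use_gpu", fallback=default)

# Example: use_gpu=0 disables the GPU path.
cpu_only = not read_use_gpu("[SUPer]\nuse_gpu=0\n")
```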

Alllen commented 8 months ago

No problem, I can help test it.

cubicibo commented 8 months ago

Near final v0.2.4 release: [See release page] Both GUI and CLI support threading, enjoy.

Alllen commented 8 months ago

I tested a long and complex XML example with 8005 Events (in 1596 Epochs) to measure the speed improvement in v0.2.4. With the same input, parameter settings and PC conditions, I started from the GUI, clicked "Make it SUPer!" and timed the run until the CLI printed "Finished, exiting...Closed gracefully SUPer process." It took 23 minutes on v0.2.3 and less than 6 minutes on v0.2.4 with 8 threads. This is undoubtedly an amazing update!

In the other two XML examples I tested, I found small problems. One is a 205 Events (in 1 Epoch) example: it is one continuous stretch of special effects, so there is only one epoch. With 8 threads selected, the program appeared to stall for a long time after starting. It did eventually finish, but it took more than 2 minutes, which is impractical, and I noticed that only one thread (W0) was working. If one epoch only corresponds to one thread, is it possible to limit the selection of the number of threads, so that when the number of epochs is less than the number of threads selected, only the number of threads equal to the number of epochs will be enabled by default? Or is it possible to assign multiple events in one thread to several threads? Just my thinking...

Another is: during processing, several threads gave the following error like:

Process EpochRenderer-1:6:
Traceback (most recent call last):
  File "multiprocessing\process.py", line 314, in _bootstrap
  File "SUPer\interface.py", line 478, in run
  File "SUPer\interface.py", line 441, in convert
  File "SUPer\render2.py", line 314, in analyze
IndexError: list index out of range

But the program kept running and never finished: both the CLI and the GUI hung. I've copied the CLI output and uploaded it as an attachment, please check it. comsub.txt

Also, hopefully a timer will be added in future versions so I don't have to get stuck with my stopwatch to keep track of how much time I've spent on one-off processing, lol...

cubicibo commented 8 months ago

Thanks!

If one epoch only corresponds to one thread, is it possible to limit the selection of the number of threads, so that when the number of epochs is less than the number of threads selected, only the number of threads equal to the number of epochs will be enabled by default?

Yes, that is easy: 0446811268b26c24e5c367086b42396331e9de68
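The idea of capping the worker count can be sketched in a few lines. This is not the code of the commit referenced above: effective_workers is a hypothetical helper, and the limit of 8 is taken from the earlier remark that the CLI supports up to 8 threads.

```python
def effective_workers(requested, n_epochs, max_workers=8):
    """Never start more workers than there are epochs to convert.

    Hypothetical helper: always returns at least 1 so that a file
    with a single epoch is still processed.
    """
    if n_epochs < 1:
        return 1
    return max(1, min(requested, n_epochs, max_workers))
```

With the 205-event, single-epoch file described above, a request for 8 workers would be reduced to 1, while the 1596-epoch file would still use all 8.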

Or is it possible to assign multiple events in one thread to several threads? Just my thinking...

Performing multithreading within an epoch is difficult as the epoch conversion process is sequential, not parallelizable.
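A toy sketch of why epochs can be distributed while the events inside one epoch cannot: in the model below, each event's result depends on state left by the previous event, so the inner loop must run in order, whereas separate epochs share nothing. Everything here is an assumption for illustration, including convert_epoch, the integer "state", and the use of a thread pool (the tracebacks in this thread suggest SUPer itself uses multiprocessing).

```python
from concurrent.futures import ThreadPoolExecutor

def convert_epoch(events):
    """Hypothetical per-epoch conversion with state carried between events."""
    state = 0
    out = []
    for ev in events:
        # Stand-in for decoder state (buffers, palette) carried forward:
        # event N cannot be computed before event N-1.
        state = state + ev
        out.append(state)
    return out

def convert_all(epochs, workers=8):
    """Convert independent epochs concurrently, one epoch per task."""
    workers = max(1, min(workers, len(epochs)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so output order is preserved.
        return list(pool.map(convert_epoch, epochs))
```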

Another is: during processing, several threads gave the following error like [...]

How did I miss this 🤯 https://github.com/cubicibo/SUPer/pull/23/commits/ef7d57cbcb90ad7f5760661d6af96e60389975ee

Also, hopefully a timer will be added in future versions so I don't have to get stuck with my stopwatch to keep track of how much time I've spent on one-off processing, lol...

It is a gadget but it is effortless to add: 0286657.
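A timer of this kind needs very little code. The sketch below is not the referenced commit: Stopwatch and format_elapsed are hypothetical names, built on the standard time.perf_counter.

```python
import time

def format_elapsed(seconds):
    """Format a duration in seconds as HH:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

class Stopwatch:
    """Context manager that records wall-clock time spent in its block."""
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        self.elapsed = time.perf_counter() - self.start
        return False  # never swallow exceptions from the timed block

# Usage: wrap the conversion call and print the duration afterwards.
with Stopwatch() as sw:
    sum(range(1000))  # placeholder for the real work
print("Elapsed:", format_elapsed(sw.elapsed))
```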

Uploaded a new v0.2.4 on release page.

Alllen commented 8 months ago

Hi, with the new v0.2.4 I get this error, which was not present in the previous version:

Loading...
 SUPui: INFO : SUPer v0.2.4, (c) cubicibo
 SUPui: INFO : Advanced image quantizer armed: libimagequant
 SUPui: INFO : No extension given, assuming SUP.
 SUPui: INFO : Starting optimiser process.
 SUPer: INFO : Loading input BDN: H:/1/1.xml
 SUPer: IINF : Parameters: quality_factor=0.85:refresh_rate=1.0:scale_fps=False:quantize_lib=3:bt_colorspace=bt709:no_overlap=True:full_palette=True:output_all_formats=True:normal_case_ok=True:insert_acquisitions=2:max_kbps=16000:log_to_file=0:ssim_tol=0.0:threads=8:daemonize=False
 SUPer: IINF : BDN metadata: 1920x1080, FPS=23.976, DF=False, 205 valid events.
 SUPer: INFO : NDF NTSC detected: scaling all timestamps by 1.001.
Process SUPinternal:
Traceback (most recent call last):
  File "multiprocessing\process.py", line 314, in _bootstrap
  File "multiprocessing\process.py", line 108, in run
  File "supergui.py", line 51, in from_bdnxml
  File "SUPer\interface.py", line 275, in optimise
  File "SUPer\interface.py", line 84, in epoch_events
  File "SUPer\filestreams.py", line 411, in groups
Exception: Events are not ordered in time: 02:35:21:02, 00223706_0.png predates previous event.
 SUPui: INFO : Closed gracefully SUPer process.

The corresponding part of the XML file looks like this:

<Event Forced="False" InTC="02:35:21:00" OutTC="02:35:21:01">
<Graphic Width="1920" Height="1080" X="0" Y="0">00223704_0.png</Graphic>
</Event>
<Event Forced="False" InTC="02:35:21:01" OutTC="02:35:21:03">
<Graphic Width="1920" Height="1080" X="0" Y="0">00223705_0.png</Graphic>
</Event>
<Event Forced="False" InTC="02:35:21:02" OutTC="02:35:21:03">
<Graphic Width="1920" Height="1080" X="0" Y="0">00223706_0.png</Graphic>
</Event>

Thanks!

cubicibo commented 8 months ago

00223705_0: InTC="02:35:21:01" OutTC="02:35:21:03"
00223706_0: InTC="02:35:21:02" OutTC="02:35:21:03"

Which software did you use to generate that BDN? As you can see, both 00223705_0 and 00223706_0 use the entire screen (1920x1080) and should appear at the same time at 02:35:21:02: this is impossible. I know avs2bdnxml and derivatives have an issue with the last event, where its start/end time is not computed correctly. This is why many ASS files have an empty event at the end, to correct this. Is this the last event of your file?

While events that overlap in time are possible with BDN XML, they must not overlap in region. Such BDN XML files are not supported by SUPer anyway, because neither avs2bdnxml nor ass2bdnxml can generate them.
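A file can be checked for this kind of overlap before conversion. The sketch below is not SUPer's implementation: it assumes the rule that an event may not start before the previous one has ended, counts timecodes at 24 frames per second, and wraps the three events quoted earlier in a bare Events root element so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

def tc_to_frames(tc, fps=24):
    """Convert an HH:MM:SS:FF timecode to a frame count (non-drop-frame)."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

def find_overlaps(xml_text):
    """Return (InTC, image name) for events starting before the previous end."""
    root = ET.fromstring(xml_text)
    problems = []
    prev_out = None
    for event in root.iter("Event"):
        t_in = tc_to_frames(event.get("InTC"))
        t_out = tc_to_frames(event.get("OutTC"))
        if prev_out is not None and t_in < prev_out:
            problems.append((event.get("InTC"), event.findtext("Graphic")))
        prev_out = t_out if prev_out is None else max(prev_out, t_out)
    return problems

# The three events from the report, wrapped in a root element for parsing.
SAMPLE = """<Events>
<Event Forced="False" InTC="02:35:21:00" OutTC="02:35:21:01">
<Graphic Width="1920" Height="1080" X="0" Y="0">00223704_0.png</Graphic>
</Event>
<Event Forced="False" InTC="02:35:21:01" OutTC="02:35:21:03">
<Graphic Width="1920" Height="1080" X="0" Y="0">00223705_0.png</Graphic>
</Event>
<Event Forced="False" InTC="02:35:21:02" OutTC="02:35:21:03">
<Graphic Width="1920" Height="1080" X="0" Y="0">00223706_0.png</Graphic>
</Event>
</Events>"""

print(find_overlaps(SAMPLE))
```

On this sample the only event flagged is 00223706_0.png at 02:35:21:02, the same one named in the exception message.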

Alllen commented 8 months ago

I know avs2bdnxml and derivatives have an issue with the last event, where its start/end time is not computed correctly. This is why many ASS files have an empty event at the end, to correct this. Is this the last event of your file?

It was indeed avs2bdnxml. After I edited the line in the ASS subtitle, the error was resolved, thanks!