dictation-toolbox / natlink

Natlink provides the interface between Dragon and python
Other
25 stars 17 forks source link

Natlink process does not close and is being reused by natlink on dns 13 #141

Closed LexiconCode closed 1 year ago

LexiconCode commented 1 year ago

The Natlink process does not close when Dragon is closed. Therefore when Dragon starts again the Natlink process is reused. Therefore the Python environment is already initialized and put in a bad state because it was not cleanly exited with the closing of Natlink process.

See traceback without dragonfly imports!

I've also tested this with an empty file _test.py. In this case the natlink loader is still initialized and _test_hud.py` was not loaded. See https://github.com/dictation-toolbox/natlink/issues/141#issuecomment-1272667327

Minimal example of the issue test.zip To reproduce, I've included files that could be loaded by dragon.

  1. Edit get context to your wpython exe in _test_hud.py

    • This Instruct the multi processing library to use pythonw for spawning tasks. Otherwise it tries to boot new copy of DNS.
  2. Launch dragon with the files in the user dir for dragonfly

  3. Close dragon through the tray icon

  4. Relaunch dragon and see error

Note: If the process is killed taskkill /IM natspeak.exe /f /s localhost and DNS is restarted everything behaves as expected.

I'm using DNS 13 on windows 10.

Natlink Version: 5.3.4
Natlink pyd path: C:\Program Files (x86)\Natlink\site-packages\natlink\_natlink_core13.pyd
Use DebugView to debug natlink problems.
    https://docs.microsoft.com/en-us/sysinternals/downloads/debugview
Python Version: 3.10.3 (tags/v3.10.3:a342a49, Mar 16 2022, 12:53:06) [MSC v.1929 32 bit (Intel)]
Natlink is loaded...

Starting natlink loader from config file:
    "C:\Users\MainLaptop\.natlink\natlink.ini"
loading module: _test_hud
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python310-32\lib\site-packages\natlinkcore\loader.py", line 305, in load_or_reload_module
    module = self._import_module_from_path(mod_path)
  File "C:\Program Files (x86)\Python310-32\lib\site-packages\natlinkcore\loader.py", line 261, in _import_module_from_path
    loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\MainLaptop\Documents\Caster\_test_hud.py", line 5, in <module>
    from hud import test_manager
  File "C:\Users\MainLaptop\Documents\Caster\hud.py", line 1, in <module>
    from dragonfly import monitors # Not Used but needed for test
  File "C:\Users\MainLaptop\AppData\Roaming\Python\Python310-32\site-packages\dragonfly\__init__.py", line 33, in <module>
    from .grammar.grammar_connection import ConnectionGrammar
  File "C:\Users\MainLaptop\AppData\Roaming\Python\Python310-32\site-packages\dragonfly\grammar\grammar_connection.py", line 28, in <module>
    from win32com.client  import Dispatch
  File "C:\Program Files (x86)\Python310-32\lib\site-packages\win32com\__init__.py", line 87, in <module>
    SetupEnvironment()
  File "<string>", line 3, in __init__
NameError: name 'len' is not defined
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python310-32\lib\site-packages\natlinkcore\loader.py", line 305, in load_or_reload_module
    module = self._import_module_from_path(mod_path)
  File "C:\Program Files (x86)\Python310-32\lib\site-packages\natlinkcore\loader.py", line 261, in _import_module_from_path
    loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\MainLaptop\Documents\Caster\_test_hud.py", line 5, in <module>
    from hud import test_manager
  File "C:\Users\MainLaptop\Documents\Caster\hud.py", line 1, in <module>
    from dragonfly import monitors # Not Used but needed for test
  File "C:\Users\MainLaptop\AppData\Roaming\Python\Python310-32\site-packages\dragonfly\__init__.py", line 33, in <module>
    from .grammar.grammar_connection import ConnectionGrammar
  File "C:\Users\MainLaptop\AppData\Roaming\Python\Python310-32\site-packages\dragonfly\grammar\grammar_connection.py", line 28, in <module>
    from win32com.client  import Dispatch
  File "C:\Program Files (x86)\Python310-32\lib\site-packages\win32com\__init__.py", line 87, in <module>
    SetupEnvironment()
  File "<string>", line 3, in __init__
NameError: name 'len' is not defined
Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python310-32\Lib\multiprocessing\util.py", line 215, in __call__
    except KeyError:
NameError: name 'KeyError' is not defined

Without dragonfly imports in test code.

Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python310-32\Lib\multiprocessing\util.py", line 215, in __call__
    except KeyError:
NameError: name 'KeyError' is not defined

Danesprite "It looks like these errors can occur because of very late finalization, after things like KeyError and len have been deallocated. This doesn't explain why they occur after start-up.

Another explanation is that some C/C++ code is calling a Python API function and ignoring a return value of NULL, which indicates an exception. Later on, some other C/C++ code calls PyErr_Occurred and produces a nonsensical traceback."

LexiconCode commented 1 year ago

@Danesprite

It works as expected if natspeak.exe is killed first. The error happens if there two "natspeak.exe" running!

PS C:\Windows\system32> taskkill /IM natspeak.exe /f /s localhost
SUCCESS: The process "natspeak.exe" with PID 16108 has been terminated.
SUCCESS: The process "natspeak.exe" with PID 8744 has been terminated.
drmfinlay commented 1 year ago

Thanks for opening this issue.

I don't understand why there would be two natspeak.exe processes running. Perhaps this type of error is why a call to Py_Finalize was omitted from CDgnAppSupport::UnRegister originally. Have you tried reverting #129, rebuilding natlink and testing again?

LexiconCode commented 1 year ago

So testing with out https://github.com/dictation-toolbox/natlink/pull/129

On second run the loader never gets past Starting natlink loader from config file: "C:\Users\MainLaptop\.natlink\natlink.ini" to load _test_hud

How ever there is only one natspeak.exe process.

On second run Natlink in debug mode skipping unchanged loaded module: _test_hud

The process "natspeak.exe" does not end after closing dns even waiting 3 minutes. Even though there's no error I think this is somehow the root cause. Somehow Python is left in a partial initialized state between DNS restarts. Possibly the loader is still initialized from the last start and therefore skipping unchanged loaded module: _test_hud

Another observation is on second load natlink loads way too fast compared to if the process is killed first.

Natlink Version: 5.3.4
Natlink pyd path: C:\Program Files (x86)\Natlink\site-packages\natlink\_natlink_core13.pyd
Use DebugView to debug natlink problems.
    https://docs.microsoft.com/en-us/sysinternals/downloads/debugview
Python Version: 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 13:51:36) [MSC v.1933 32 bit (Intel)]
Natlink is loaded...

set log level to: DEBUG
Starting natlink loader from config file:
    "C:\Users\MainLaptop\.natlink\natlink.ini"
get_user_language, return userLanguage: "enx", (long language: "US English")
set_user_language, user: "MainLaptop", profile: "C:\ProgramData\Nuance\NaturallySpeaking13\Users\MainLaptop\current", language: "enx"
triggering load/reload process
skipping unchanged loaded module: _test_hud
drmfinlay commented 1 year ago

Well that is strange. Yes, it appears that the process is being reused and the interpreter ends up in an odd state the second time round.

Does this behaviour only occur when multiprocessing is used? And does your spawned process exit when DNS is shutdown? If it is specific to multiprocessing, then I think you should probably use threads and subprocess instead.

LexiconCode commented 1 year ago

Yes, the sub process does disappear when DNS closes. @Danesprite

I can replicate the issue by loading an empty file so this is unrelated to multiprocessing.

drmfinlay commented 1 year ago

All right then. I am not able to replicate these errors since I can't install natlinkcore on Python 3.8. If they are a major problem for you, the call to Py_Finalize could be removed; the behaviour that occurs without it seems more acceptable.

LexiconCode commented 1 year ago

@Danesprite I don't have time to compile something for 3.8 at the moment. However, finalize just made the issue stand out, not including it will only ask for future issues.

The root cause really needs to be solved as this is going to continue showing up in very weird issues and they're always going to be hard to diagnose when natlink is released abroad. Which DNS version are you using?

LexiconCode commented 1 year ago

@Danesprite It happens on python 3.8 as well for me.

py -3.8 -m pip install dtactions-1.5.6-py3-none-any.whl dtactions-1.5.6-py3-none-any.zip as admin py -3.8 -m pip install natlinkcore-5.3.5-py3-none-any.whl natlinkcore-5.3.5-py3-none-any.zip install natlink5.3.5-py3.8-32-setup.zip natlink5.3.5-py3.8-32-setup.zip

drmfinlay commented 1 year ago

Yes, I suppose this could cause further problems. I assume that this behaviour does not occur on Natlink version 4.2, before the recent changes to the C++ code. If that is so, and if I am able to reproduce the above errors, git-bisect may be used to track down the root cause. I could give this try.

To answer your question, I am using DNS version 15.61.200.010 (DPI).

Thank you for the zip files. I had compiled Natlink for 3.8, but got an error when the installer tried to install natlinkcore via pip.

drmfinlay commented 1 year ago

I am unable to reproduce the above errors. Natlink behaves normally using the files provided. For the test I used a Natlink user directory containing only the modules in test.zip. As you mention above, the executable path in the _test_hud.py file had to be changed.

LexiconCode commented 1 year ago

I am unable to reproduce the above errors. Natlink behaves normally using the files provided. For the test I used a Natlink user directory containing only the modules in test.zip. As you mention above, the executable path in the _test_hud.py file had to be changed.

I'm using DNS 13 on windows 10.

When I get a chance I will test with Windows 11 and DNS 15

drmfinlay commented 1 year ago

Okay then. Unfortunately, I can't test with DNS 13. I don't recall encountering this behaviour when I used to use that version. That was many years ago, however.

LexiconCode commented 1 year ago

When I get a chance I will test with Windows 11 and DNS 15

I cannot replicate on dns 15

drmfinlay commented 1 year ago

I cannot replicate on dns 15

Hmm. What about with Natlink 4.2?

LexiconCode commented 1 year ago

@Danesprite no, I can't reproduce on Python 2.7

drmfinlay commented 1 year ago

Is that with DNS 13 or 15?

LexiconCode commented 1 year ago

@Danesprite DNS 13

drmfinlay commented 1 year ago

Okay. It seems, then, that this is a regression in Natlink, or in some other part of the system. You could try using git-bisect on the natlink/natlinkcore repository to track it down.

tripfish commented 1 year ago

Hey, since the update to natlink 5.4.4 I have the same issue.

DNS 15.3 Python 3.10.0 Windows 10

I can no longer say exactly which version I had previously installed. But it was one with Python 3.8 support. With this older version, DNS could be closed normally.

LexiconCode commented 1 year ago

@tripfish

Can you replicate this with just a dragonfly script? If you're using caster, you can modify the bat file to dragonfly sample script instead of caster.py or *.py

tripfish commented 1 year ago

I don't use caster. Only Dragonfly with Breathe. However, I noticed that I am loading an XMLRPCServer (SimpleThreadXMLRPCServer) with a file. As soon as I stop loading this file with NatLink (_ removed), Dragon can exit normally.

LexiconCode commented 1 year ago

Do you think you could upload the relevant files to reproduce?

Does https://dragonfly2.readthedocs.io/en/stable/natlink_engine.html#dragonfly.engines.backend_natlink.engine.NatlinkEngine.apply_threading_fix

Have any impact on the issue?

tripfish commented 1 year ago

Here is the file.

I haven't tried the fix yet. I can try in the next few days. _xmlrpc-test-server.zip

tripfish commented 1 year ago

apply_threading_fix() doesn't change anything. In this file, dragonfly is not loaded at all. But even if I load it and use apply_threading_fix(), it has no effect. However, in this file I use threading to create a background thread. Maybe that's the cause?

LexiconCode commented 1 year ago

I believe he needs to be a Damon thread

tripfish commented 1 year ago

Yes that's it. If I set daemon = True, I can exit DNS normally again.

But why did this work in an older natlink version without daemon?

LexiconCode commented 1 year ago

Using Natlink I've had to use Damon threads regardless of python. 2/3. Otherwise, it's not always obvious when the process freezes because the main process is in use.