Closed mdbridge closed 2 years ago
I had indeed quite some trouble tackling non ascii characters. I think the readwritefile.py module in the natlink repository solves reading strings from a file, both for utf-8 and latin-1 or cp-1252 encondings. Passing this on to playString and execScript should be done in the C++ code. For the moment the main focus is to get the installer procedure and repository structure as good as possible.
Was this introduced in natlight-light? or did it exist prior to to switch to natlink-light?
I was thinking about investigating this too,
I will try to find out! Still have a python 2 version running, on my "normal" computer...
well we had a python3 natlink working last fall, then we switched to natlink light. I have no idea what needs to be fixed in the C++ — i don't really understand what happens in natlink.
Does something simple need to happen in these functions?
BOOL CDragonCode::playString( const char * pszKeys, DWORD dwFlags )
or
BOOL CDragonCode::execScript( const char * pszScript, PCCHAR * ppWords, const char * pszComment )
?
@mdbridge can you help with the C++?
Also, there are some conditional compiles.
My suspicion is that the problem will manifest only on dragon15 but not on dragon13. Do either of you have a dragon13 to test this?
`
/*int size_needed = ::MultiByteToWideChar( CP_UTF8, 0, pszKeys, -1, NULL, 0 );
CPointerChar pszKeysW = new TCHAR[ size_needed ];
::MultiByteToWideChar( CP_UTF8, 0, pszKeys, -1, pszKeysW, size_needed );*/
CComBSTR bstrKeys( pszKeys );
rc = m_pIDgnSSvcOutputEvent->PlayString(
bstrKeys, // string to send
dwFlags, // flags
0xFFFFFFFF, // delay (-1 for app specific delay)
dwClientCode, // to identify which WM_PLAYBACK is ours
&dwNumUndo ); // not used (number of backspaces needed to undo)
#else
rc = m_pIDgnSSvcOutputEvent->PlayString(
pszKeys, // string to send
dwFlags, // flags
0xFFFFFFFF, // delay (-1 for app specific delay)
dwClientCode, // to identify which WM_PLAYBACK is ours
&dwNumUndo ); // not used (number of backspaces needed to undo)
#endif
`
::MultiByteToWideChar( CP_UTF8, 0, pszKeys, -1, pszKeysW, size_needed );/ CComBSTR bstrKeys( pszKeys );
Is the culprit CP_UTF8 in the first argument (code page)?
There are several options here https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte but which would be correct?
oh it is confusing, somebody commented out code and left it in years ago.
MultiByteToWideChar is not actually called. Unicode is relying on CComBstr default behavior which probably results in ansi code page.
I think this problem has always been in dragon15 versions of natlink.
So maybe something like this in playString? https://github.com/dictation-toolbox/natlink/blob/master/NatlinkSource/DragonCode.cpp
if this is the correct approach what is the choice for code_page?
` int code_page=CP_ACP (or CP_THREAD_ACP, or 1252)?
int size_needed = ::MultiByteToWideChar( CP_UTF8, 0, pszKeys, -1, NULL, 0 );
BSTR pszKeysW = new TCHAR[ size_needed ];
::MultiByteToWideChar( CP_UTF8, 0, pszKeys, -1, pszKeysW, size_needed );
CComBSTR bstrKeys();
bstrKeys.attach(pszKeysW);
rc = m_pIDgnSSvcOutputEvent->PlayString(
bstrKeys, // string to send
dwFlags, // flags
0xFFFFFFFF, // delay (-1 for app specific delay)
dwClientCode, // to identify which WM_PLAYBACK is ours
&dwNumUndo ); // not used (number of backspaces needed to undo)
`
If this looks like the right approach, execScript, should be changed.
There are a bunch of places where calls to ::MultiByteToWideChar( are commented out. Possible it was just thought CComBstr would take care of it.
Is it possible to write a test in natlinkcore/tests that would reproduce the problem?
Do either of you have a dragon13 to test this?
I do and could test.
I have created the following branches in my forks to work on this.
https://github.com/dougransom/natlink/tree/natlink_issue_113
https://github.com/dougransom/natlinkcore/tree/natlink_issue_113
unit test for failing playString added in https://github.com/dougransom/natlinkcore/commit/18c65590c589fb8413e61cde51cf31a85eca149d
added some simple unittests for execscript as well.
Now in file
https://github.com/dougransom/natlinkcore/blob/natlink_issue_113/tests/manual_test_dragoncode.py
I can confirm, that in the python2.7 version of Natlink, with Dragon 15, the error is also present.
natlink.playString("na\xc3ve") (which is i with dots) produces
naA~eve (tilde on top of A)
Great, solves my problem with env variable NATLINK_USERDIR too. Thanks Doug. We should incorporate the unittests of natlink itself in this too. Do we do this in natlinkcore repo or in the natlink repo? And if in the natlink repo, which branch are we now on?
Quintijn Hoogenboom @.***> writes:
I can confirm, that in the python2.7 version of Natlink, with Dragon 15, the error is also present.
natlink.playString("na\xc3ve") (which is i with dots) produces
naA~eve (tilde on top of A)
No, that's that's the correct behavior. \xc3 in the windows-1252
encoding is an A with a ~ on top -- see https://en.wikipedia.org/wiki/Windows-1252.
Naive w/ 2 dots on the i is na\xefve in Windows-1252.
Remember that playString like many of the other Dragon routines
takes Windows-1252 encoded data.
Note for now code for tests mentioned above (and natinkcore) is in a new repo:
Ok, i have figured this out for displayText and the Script for execScript. I am not going to fix for the arguments of execScript unless someone actually uses it, as it is a fair bit more investigation, though the approach is the same: _natlinkcore.pyd must accept windows-1252 encoded data for stuff being send on to natlink. easy to to the encoding in init.py.
I don't see teh issue with the callback.
i think it is fixed in https://github.com/dictation-toolbox/natlink/commit/0aa24b8934176b32d1a287f750c91a9752e39a74.
I have only tested through running code in a python shell — not in vocola or unimaro etc. I'll wait till someone tries this before i push to DT:master.
Doug Ransom @.***> writes:
I don't see teh issue with the callback.
If you dictate say "naïve" using a dictation grammar like vortex
does, the wrong characters are sent to the dictation_change_callback. That callback is set by
~/voice/Vocola_development/Vocola_2/build/Vocola/to_MacroSystem/_vortex.py:113: def init(self, handler): self.my_handle = None self.handler = handler self.underlying_DictObj = natlink.DictObj() self.underlying_DictObj.setBeginCallback (handler.dictation_begin_callback) self.underlying_DictObj.setChangeCallback(handler.dictation_change_callback)
~/voice/Vocola_development/Vocola_2/build/Vocola/to_MacroSystem/_vortex.py:387: def dictation_change_callback(self, deletion_start, deletion_end, new_text, selection_start, selection_end):
In particular, if I remember correctly, new_text has the incorrect characters.
I think i know what is going on. I am fixing stuff without really understanding the big picture I am afraid.
I think: CDictationObject::TextChanged is called when text is dicated. The new text is grabbed. the text probably is in windws-12whatever encoding, but treated as unicode without conversion and sent to Python.
This is fixed, by Doug, in the upcoming release of Natlink. Also the recognitionMimic call seems to work now.
Several natlink functions are tested (succesfully) in unittestNatlink...
For example, natlink.playString("naïve") produces naïve. A similar issue exists with natlink.execScript.
discovered that dictation_begin_callback is also broken in a similar way -- naïve turning into naïve
" the non ascii issue is tedious and not very easy to solve. ": this shouldn't be too hard; just translate the Unicode into Windows-1252, changing unencodable characters to .something like '?'. until this is fixed Vocola/vortex are essentially unusable, especially for people using foreign languages