Closed dmacoder closed 8 years ago
ANSI version cannot display UNICODE characters!
;test.ahk
tip=
(
이 번역기는 완벽한 번역기가 아닙니다.
Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
)
gosub,Tray_Init
msgbox,% tip
exitapp
Tray_Init:
Menu, Tray, NoStandard
Menu, Tray, DeleteAll
Menu, tray, add, 사이트 방문 , 사이트방문
return
사이트방문:
run, IEXPLORE.EXE "http://translate.google.com"
return
This Korean-language script worked in the ANSI version before the Jun 13, 2016 commit "Improve compiled code protection". After I modified the GetLine function in the existing C++ code (script.cpp), it works again. However, a compiled script of more than 10,000 lines is now very slow to launch. Can you fix it?
//aBuf_length = UTF8ToASCII((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
// Replaced by the following 4 lines:
CStringA sChar;
StringUTF8ToChar((LPCSTR)aDataBuf, sChar, -1, NULL, CP_ACP);
_tcscpy(aBuf, sChar);
aBuf_length = _tcslen(aBuf);
size_t Script::GetLine(LPTSTR aBuf, int aMaxCharsToRead, int aInContinuationSection, TextStream *ts)
{
size_t aBuf_length = 0;
if (!aBuf || !ts) return -1;
if (aMaxCharsToRead < 1) return 0;
if ( !(aBuf_length = ts->ReadLine(aBuf, aMaxCharsToRead)) ) // end-of-file or error
{
*aBuf = '\0'; // Reset since on error, contents added by fgets() are indeterminate.
return -1;
}
if (aBuf[aBuf_length-1] == '\n')
--aBuf_length;
aBuf[aBuf_length] = '\0';
if (g_hResource)
{
DWORD aSizeEncrypted = LINE_SIZE * sizeof(TCHAR);
BYTE *data = (BYTE*)malloc(LINE_SIZE * sizeof(TCHAR));
g_CryptStringToBinary(aBuf, NULL, CRYPT_STRING_BASE64, data, &aSizeEncrypted, NULL, NULL);
LPVOID aDataBuf;
if (*(unsigned int*)data == 0x04034b50)
{
if (aSizeEncrypted = DecompressBuffer(data, aDataBuf, aSizeEncrypted, g_default_pwd))
{
#ifdef _UNICODE
aBuf_length = UTF8ToUTF16((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
#else
//aBuf_length = UTF8ToASCII((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
// Replaced by the following 4 lines:
CStringA sChar;
StringUTF8ToChar((LPCSTR)aDataBuf, sChar, -1, NULL, CP_ACP);
_tcscpy(aBuf, sChar);
aBuf_length = _tcslen(aBuf);
#endif
SecureZeroMemory(aDataBuf, aSizeEncrypted);
g_VirtualFree(aDataBuf, 0, MEM_RELEASE);
}
else
{
free(data); // Release the decode buffer on the failure path to avoid a leak.
return -1;
}
}
free(data);
}
if (aInContinuationSection)
{
LPTSTR cp = omit_leading_whitespace(aBuf);
if (aInContinuationSection == CONTINUATION_SECTION_WITHOUT_COMMENTS) // By default, continuation sections don't allow comments (lines beginning with a semicolon are treated as literal text).
{
// Caller relies on us to detect the end of the continuation section so that trimming
// will be done on the final line of the section and so that a comment can immediately
// follow the closing parenthesis (on the same line). Example:
// (
// Text
// ) ; Same line comment.
if (*cp != ')') // This isn't the last line of the continuation section, so leave the line untrimmed (caller will apply the ltrim setting on its own).
return aBuf_length; // Earlier sections are responsible for keeping aBufLength up-to-date with any changes to aBuf.
//else this line starts with ')', so continue on to later section that checks for a same-line comment on its right side.
}
else // aInContinuationSection == CONTINUATION_SECTION_WITH_COMMENTS (i.e. comments are allowed in this continuation section).
{
// Fix for v1.0.46.09+: The "com" option shouldn't put "ltrim" into effect.
if (!_tcsncmp(cp, g_CommentFlag, g_CommentFlagLength)) // Case sensitive.
{
*aBuf = '\0'; // Since this line is a comment, have the caller ignore it.
return -2; // Callers tolerate -2 only when in a continuation section. -2 indicates, "don't include this line at all, not even as a blank line to which the JOIN string (default "\n") will apply.
}
if (*cp == ')') // This is the last line of the continuation section, so trim it here.
{
ltrim(aBuf); // Ltrim this line unconditionally so that caller will see that it starts with ')' without having to do extra steps.
aBuf_length = _tcslen(aBuf); // ltrim() doesn't always return an accurate length, so do it this way.
}
}
}
// Since above didn't return, either:
// 1) We're not in a continuation section at all, so apply ltrim() to support semicolons after tabs or
// other whitespace. Seems best to rtrim also.
// 2) CONTINUATION_SECTION_WITHOUT_COMMENTS but this line is the final line of the section. Apply
// trim() and other logic further below because caller might rely on it.
// 3) CONTINUATION_SECTION_WITH_COMMENTS (i.e. comments allowed), but this line isn't a comment (though
// it may start with ')' and thus be the final line of this section). In either case, need to check
// for same-line comments further below.
if (aInContinuationSection != CONTINUATION_SECTION_WITH_COMMENTS) // Case #1 & #2 above.
{
aBuf_length = trim(aBuf);
if (!_tcsncmp(aBuf, g_CommentFlag, g_CommentFlagLength)) // Case sensitive.
{
// Due to other checks, aInContinuationSection==false whenever the above condition is true.
*aBuf = '\0';
return 0;
}
}
//else CONTINUATION_SECTION_WITH_COMMENTS (case #3 above), which due to other checking also means that
// this line isn't a comment (though it might have a comment on its right side, which is checked below).
// CONTINUATION_SECTION_WITHOUT_COMMENTS would already have returned higher above if this line isn't
// the last line of the continuation section.
// Handle comment-flags that appear to the right of a valid line.
LPTSTR cp, prevp;
for (cp = _tcsstr(aBuf, g_CommentFlag); cp; cp = _tcsstr(cp + g_CommentFlagLength, g_CommentFlag))
{
// If no whitespace to its left, it's not a valid comment.
// We insist on this so that a semi-colon (for example) immediately after
// a word (as semi-colons are often used) will not be considered a comment.
prevp = cp - 1;
if (prevp < aBuf) // should never happen because we already checked above.
{
*aBuf = '\0';
return 0;
}
if (IS_SPACE_OR_TAB_OR_NBSP(*prevp)) // consider it to be a valid comment flag
{
*prevp = '\0';
aBuf_length = rtrim_with_nbsp(aBuf, prevp - aBuf); // Since it's our responsibility to return a fully trimmed string.
break; // Once the first valid comment-flag is found, nothing after it can matter.
}
else // No whitespace to the left.
if (*prevp == g_EscapeChar) // Remove the escape char.
{
// The following isn't exactly correct because it prevents an include filename from ever
// containing the literal string "`;". This is because attempts to escape the accent via
// "``;" are not supported. This is documented here as a known limitation because fixing
// it would probably break existing scripts that rely on the fact that accents do not need
// to be escaped inside #Include. Also, the likelihood of "`;" appearing literally in a
// legitimate #Include file seems vanishingly small.
tmemmove(prevp, prevp + 1, _tcslen(prevp + 1) + 1); // +1 for the terminator.
--aBuf_length;
// Then continue looking for others.
}
// else there wasn't any whitespace to its left, so keep looking in case there's
// another further on in the line.
} // for()
return aBuf_length; // The above is responsible for keeping aBufLength up-to-date with any changes to aBuf.
}
If it worked before, then that was a bug. Try running it with the ANSI version without compiling, and also with the original AutoHotkey.
With the latest ANSI version of AutoHotkey_L (AutoHotkey.exe), both running test.ahk without compiling and running test.exe (the compiled script) work fine.
link: https://s3.postimg.org/wbcsmgsub/Latest_Autohotkey_L.jpg
With the latest ANSI version of AutoHotkey_H (AutoHotkey.exe), running test.ahk without compiling works fine, but running test_H.exe (the compiled script) does not.
link:
이 번역기는 완벽한 번역기가 아닙니다.
Hello? 안녕하세요
? ???? ??? ???? ????.
Hello? ?????
Check out the following links https://s3.postimg.org/wbcsmgsub/Latest_Autohotkey_L.jpg https://s3.postimg.org/swpq5bn37/Latest_Autohotkey_H.jpg
Download the ANSI version (https://autohotkey.com/download/ahk-a32.zip) and execute your script with it. You get:
? ???? ??? ???? ????.
Hello? ????? (TODO: ?? ?? ????)
Why do you think it is called ANSI?
I downloaded the ANSI version (https://autohotkey.com/download/ahk-a32.zip) and ran the script with it. It works well. You can check the results at the following link:
I get: 이 번역기는 완벽한 번역기가 아닙니다.
Hello? 안녕하세요
I know that the ANSI version of AutoHotkey_L supports Korean. Doesn't the ANSI version of AutoHotkey support Korean?
I don't know exactly, but I think ANSI is a form of multibyte character set (MBCS) known as a double-byte character set (DBCS).
Under MBCS, characters are encoded in either 1 or 2 bytes. In 2-byte characters, the first, or lead byte, signals that both it and the following byte are to be interpreted as one character. The first byte comes from a range of codes reserved for use as lead bytes. Which ranges of bytes can be lead bytes depends on the code page in use. For example, Japanese code page 932 uses the range 0x81 through 0x9F as lead bytes, but Korean code page 949 uses a different range.
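The lead-byte rule described above can be sketched in portable C++. This is only an illustration, not AutoHotkey code: `isLeadByteCp949` and `countDbcsChars` are hypothetical names, and the 0x81 to 0xFE lead-byte range for code page 949 is an assumption that should be checked against the code page documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: code page 949 lead bytes span 0x81..0xFE.
// Verify this range against the code page documentation before relying on it.
inline bool isLeadByteCp949(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Hypothetical helper: counts characters (not bytes) in a DBCS byte string.
// A lead byte and the trail byte that follows it form one character.
std::size_t countDbcsChars(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (isLeadByteCp949(b) && i + 1 < bytes.size())
            ++i; // skip the trail byte of this double-byte character
        ++count;
    }
    return count;
}
```

For example, the 11-byte string made of `Hello? ` followed by the bytes BE C8 B3 E7 (assumed here to be two double-byte Hangul syllables) counts as 9 characters: seven single-byte characters and two double-byte ones.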
By definition, the ASCII character set is a subset of all multibyte-character sets. In many multibyte character sets, each character in the range 0x00 – 0x7F is identical to the character that has the same value in the ASCII character set. For example, in both ASCII and MBCS character strings, the 1-byte NULL character ('\0') has value 0x00 and indicates the terminating null character.
ASCII character set: ASCII defines only 128 characters (codes 0-127), and a single-byte extended code page has at most 256 (codes 0-255). These cover upper- and lowercase letters, digits, and special characters such as punctuation and currency symbols. For most Latin-script languages that is enough, but many Asian languages use far more than 256 characters.
ANSI encoding and MBCS (multibyte encoding): countries whose languages do not use the Latin script developed their own text encoding rules, and on Windows each of these national encodings is referred to as an ANSI code page. For example, the Simplified Chinese encoding GB2312 corresponds to ANSI code page 936.
For platforms used in markets whose languages use large character sets, the best alternative to Unicode is MBCS.
I found this at the following link:
Anyway, I have to use the ANSI version of AutoHotkey, because some scripts written for AutoHotkey_B function correctly on the ANSI version of AutoHotkey_L but fail on the Unicode versions.
[DB - Ansi ver] Ansi \u00BD\u00C3\u00B0\u00A3 \u00BF\u00AA\u00BC\u00F8\u00C0\u00D4\u00C0\u00E5 [DB Example - Unicode ver] Unicode \uC2DC\uAC04 Unicode
I don't have a Unicode version of the database.
My Korean-language script works in the ANSI version of AutoHotkey_L, but not in the latest AutoHotkey_H. After I modified the GetLine function in the existing C++ code (script.cpp), it works. However, a compiled script of more than 10,000 lines is very slow to launch. Can you fix either the performance problem or the encoding error?
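The escaped DB sample above looks like double-byte data being read one byte at a time. A minimal C++ sketch of that effect follows. It assumes that BD C3 B0 A3 are the CP949 bytes of the two-syllable word whose Unicode form is \uC2DC\uAC04, and `bytesAsUnicodeEscapes` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: renders every byte as its own \u00XX escape, which is
// how a reader that treats each byte as a separate character would show
// double-byte data.
std::string bytesAsUnicodeEscapes(const std::string& bytes) {
    std::string out;
    char buf[8]; // "\uXXXX" is 6 characters plus the terminator
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```

Passing the four bytes BD C3 B0 A3 yields four escapes in the \u00XX range, whereas the Unicode form of the same word needs only two code units. This is one reason an ANSI database and a Unicode database are not interchangeable without conversion.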
//aBuf_length = UTF8ToASCII((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
// Replaced by the following 4 lines:
CStringA sChar;
StringUTF8ToChar((LPCSTR)aDataBuf, sChar, -1, NULL, CP_ACP);
_tcscpy(aBuf, sChar);
aBuf_length = _tcslen(aBuf);
Execute the following script and upload a screenshot:
tip=
(
이 번역기는 완벽한 번역기가 아닙니다.
Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
)
gosub,Tray_Init
msgbox,% "IsUnicode:`t" (A_IsUnicode?1:0) "`nVersion:`t`t" A_AhkVersion "`n" tip
exitapp
Tray_Init:
Menu, Tray, NoStandard
Menu, Tray, DeleteAll
Menu, tray, add, 사이트 방문 , 사이트방문
return
사이트방문:
run, IEXPLORE.EXE "http://translate.google.com"
return
Here are screenshots of running test.ahk with the latest ANSI version of AutoHotkey (AutoHotkey.exe) without compiling, and of running test.exe (the compiled script).
Environment: OS: Windows 7 Professional K, 64-bit. Language: Korean.
I see, but unfortunately I can't test it because it does not display properly on my system. Can you see if you can get this function to work for you: https://github.com/HotKeyIt/ahkdll/blob/master/source/script.cpp#L5186
I think I've got it now. Can you try again and confirm whether it is working? ;)
The latest AutoHotkey_H is still not working. The result is the same as in the following screenshot: https://s17.postimg.org/tgs71s2xb/latest_ansi_autohotkey_H_test.jpg
Do you get the correct result here?
tip=
(
이 번역기는 완벽한 번역기가 아닙니다.
Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
)
VarSetCapacity(var,1024)
DllCall("WideCharToMultiByte","UInt",949,"Uint",0,"wStr",tip,"UInt",StrLen(tip)*2,"PTR",&var,"UInt",1024,"UInt",0,"UInt",0)
MsgBox % StrGet(&var,"CP0")
Here is a screenshot of the test. The result is the same: https://s18.postimg.org/77jdaa1hl/ahk_h_test.jpg
Run this script in the Unicode version (not compiled) and confirm whether you get the desired result:
tip=
(
이 번역기는 완벽한 번역기가 아닙니다.
Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
)
VarSetCapacity(var,1024)
DllCall("WideCharToMultiByte","UInt",949,"Uint",0,"wStr",tip,"UInt",StrLen(tip)*2,"PTR",&var,"UInt",1024,"UInt",0,"UInt",0)
MsgBox % StrGet(&var,"CP0")
Running the script uncompiled gives the desired result in both the ANSI and Unicode versions of ahk_h. For the compiled script (exe), only the Unicode version of ahk_h gives the desired result.
Both compiled and uncompiled scripts give the desired result in all versions (ANSI and Unicode) of ahk_L.
With the latest ANSI version of AutoHotkey_H (AutoHotkey.exe), running test.ahk without compiling works fine, but running test_H.exe (the compiled script) does not: Korean text is shown as '?'.
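The outputs posted in this thread show one '?' per Korean character, which is what a UTF-8 to 7-bit ASCII conversion would produce. The following C++ sketch reproduces that symptom under that assumption. `utf8ToAsciiLossy` is a hypothetical stand-in: the body of the project's real UTF8ToASCII is not shown in this thread, so this is not a description of its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a UTF-8 -> 7-bit ASCII conversion.
// Every non-ASCII character collapses to a single '?'.
std::string utf8ToAsciiLossy(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        ++i;
        if (b < 0x80) {
            out.push_back(static_cast<char>(b)); // plain ASCII: copy through
            continue;
        }
        out.push_back('?'); // one placeholder per non-ASCII character
        // Skip the continuation bytes (10xxxxxx) of this character.
        while (i < utf8.size() &&
               (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80)
            ++i;
    }
    return out;
}
```

A conversion that targets the active code page instead (CP_ACP, which is 949 on a Korean system) can represent these characters as double-byte sequences, which would explain why the reporter's replacement lines display Korean correctly.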
Check whether it is working now.
;test.ahk
tip=
(
이 번역기는 완벽한 번역기가 아닙니다.
Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
)
VarSetCapacity(var,1024)
DllCall("WideCharToMultiByte","UInt",949,"Uint",0,"wStr",tip,"UInt",StrLen(tip)*2,"PTR",&var,"UInt",1024,"UInt",0,"UInt",0)
MsgBox % "IsUnicode: `t" (A_IsUnicode?1:0) " `nVersion: `t `t" A_AhkVersion "`n" StrGet(&var,"CP0")
Here is a screenshot of the test from 2016-09-09: https://s10.postimg.org/9228fehzd/Test160909.jpg
The message changed (the number and order of the question marks are different), but Korean text is still shown as question marks ('?').
I modified the Script::GetLine function in the existing C++ code (script.cpp): https://github.com/HotKeyIt/ahkdll/blob/master/source/script.cpp#L5320
//aBuf_length = UTF8ToASCII((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
// Replaced by the following 4 lines:
CStringA sChar;
StringUTF8ToChar((LPCSTR)aDataBuf, sChar, -1, NULL, CP_ACP);
_tcscpy(aBuf, sChar);
aBuf_length = _tcslen(aBuf);
After that it works well (Korean is displayed correctly in the ANSI version of AutoHotkey_H). However, a compiled script of more than 10,000 lines is very slow to launch. Can you review this?
We cannot use StringUTF8ToChar for security reasons. What is your code page?
MsgBox % DllCall("GetACP")
;test.ahk
tip=
(
이 번역기는 완벽한 번역기가 아닙니다.
Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
)
VarSetCapacity(var,1024)
DllCall("WideCharToMultiByte","UInt",949,"Uint",0,"wStr",tip,"UInt",StrLen(tip)*2,"PTR",&var,"UInt",1024,"UInt",0,"UInt",0)
MsgBox % "GetACP : " DllCall("GetACP") "`nIsUnicode: `t" (A_IsUnicode?1:0) " `nVersion: `t `t" A_AhkVersion "`n" StrGet(&var,"CP0")
ACP: 949
Here is a screenshot of the test: https://s12.postimg.org/hsqii0pzx/acpconfirm.jpg
I have finally got my head around that and it should be working properly now ;)
Thank you for your hard work! Keep up the good work :D
Compiling and running now work, but encoding errors still occur, as shown here:
https://s8.postimg.org/ae109ghlh/encoding_error.jpg
Could you compile this AHK source code (test.ahk)? This script produces an encoding error when it is compiled with the ANSI version.
test.ahk