HotKeyIt / ahkdll

AutoHotkey_H
Mozilla Public License 2.0

The encoding errors still occur in Ansi Version #8

Closed: dmacoder closed this issue 8 years ago

dmacoder commented 8 years ago

Compiling and running now work. But the encoding errors still occur, as follows:

https://s8.postimg.org/ae109ghlh/encoding_error.jpg


test.exe

? ???? ??? ???? ????. Hello? ?????

(TODO: ?? ?? ????)

확인

It should be as follows

test.ahk

이 번역기는 완벽한 번역기가 아닙니다.

Hello? 안녕하세요

(TODO: 다른 팁도 추가하기)

확인

Could you compile this AHK source code (test.ahk)? The script shows the encoding error when it is compiled with the ANSI version.


test.ahk

tip=
 (
이 번역기는 완벽한 번역기가 아닙니다.

Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
 )
gosub,Tray_Init
 msgbox,% tip
 exitapp
 Tray_Init:
 Menu, Tray, NoStandard
 Menu, Tray, DeleteAll
 Menu, tray, add, 사이트 방문 , 사이트방문
return
사이트방문:
run, IEXPLORE.EXE "http://translate.google.com"
return

HotKeyIt commented 8 years ago

ANSI version cannot display UNICODE characters!

dmacoder commented 8 years ago
;test.ahk
tip=
 (
이 번역기는 완벽한 번역기가 아닙니다.

Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
 )
gosub,Tray_Init
 msgbox,% tip
 exitapp
 Tray_Init:
 Menu, Tray, NoStandard
 Menu, Tray, DeleteAll
 Menu, tray, add, 사이트 방문 , 사이트방문
return
사이트방문:
run, IEXPLORE.EXE "http://translate.google.com"
return

But this script (Korean language) worked fine in the ANSI version before the Jun 13, 2016 commit "Improve compiled code protection". I modified the GetLine function in the existing C++ code (in script.cpp), and then it works. However, a large compiled script (over 10000 lines) is then very slow to launch. Can you fix it?


//aBuf_length = UTF8ToASCII((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
                //replace below 4 lines
                CStringA sChar;
                StringUTF8ToChar((LPCSTR)aDataBuf, sChar, -1, NULL, CP_ACP);
                _tcscpy(aBuf, sChar);               
                aBuf_length = _tcslen(aBuf);

size_t Script::GetLine(LPTSTR aBuf, int aMaxCharsToRead, int aInContinuationSection, TextStream *ts)
{
    size_t aBuf_length = 0;

    if (!aBuf || !ts) return -1;
    if (aMaxCharsToRead < 1) return 0;
    if (  !(aBuf_length = ts->ReadLine(aBuf, aMaxCharsToRead))  ) // end-of-file or error
    {
        *aBuf = '\0';  // Reset since on error, contents added by fgets() are indeterminate.
        return -1;
    }
    if (aBuf[aBuf_length-1] == '\n')
        --aBuf_length;
    aBuf[aBuf_length] = '\0';
    if (g_hResource)
    {
        DWORD aSizeEncrypted = LINE_SIZE * sizeof(TCHAR);
        BYTE *data = (BYTE*)malloc(LINE_SIZE * sizeof(TCHAR));
        g_CryptStringToBinary(aBuf, NULL, CRYPT_STRING_BASE64, data, &aSizeEncrypted, NULL, NULL);
        LPVOID aDataBuf;
        if (*(unsigned int*)data == 0x04034b50)
        {
            if (aSizeEncrypted = DecompressBuffer(data, aDataBuf, aSizeEncrypted, g_default_pwd))
            {
#ifdef _UNICODE
                aBuf_length = UTF8ToUTF16((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
#else
                //aBuf_length = UTF8ToASCII((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
                //replace below 4 lines
                CStringA sChar;
                StringUTF8ToChar((LPCSTR)aDataBuf, sChar, -1, NULL, CP_ACP);
                _tcscpy(aBuf, sChar);               
                aBuf_length = _tcslen(aBuf);
#endif
                SecureZeroMemory(aDataBuf, aSizeEncrypted);
                g_VirtualFree(aDataBuf, 0, MEM_RELEASE);
            }
            else
                return -1;
        }
        free(data);
    }

    if (aInContinuationSection)
    {
        LPTSTR cp = omit_leading_whitespace(aBuf);
        if (aInContinuationSection == CONTINUATION_SECTION_WITHOUT_COMMENTS) // By default, continuation sections don't allow comments (lines beginning with a semicolon are treated as literal text).
        {
            // Caller relies on us to detect the end of the continuation section so that trimming
            // will be done on the final line of the section and so that a comment can immediately
            // follow the closing parenthesis (on the same line).  Example:
            // (
            //  Text
            // ) ; Same line comment.
            if (*cp != ')') // This isn't the last line of the continuation section, so leave the line untrimmed (caller will apply the ltrim setting on its own).
                return aBuf_length; // Earlier sections are responsible for keeping aBufLength up-to-date with any changes to aBuf.
            //else this line starts with ')', so continue on to later section that checks for a same-line comment on its right side.
        }
        else // aInContinuationSection == CONTINUATION_SECTION_WITH_COMMENTS (i.e. comments are allowed in this continuation section).
        {
            // Fix for v1.0.46.09+: The "com" option shouldn't put "ltrim" into effect.
            if (!_tcsncmp(cp, g_CommentFlag, g_CommentFlagLength)) // Case sensitive.
            {
                *aBuf = '\0'; // Since this line is a comment, have the caller ignore it.
                return -2; // Callers tolerate -2 only when in a continuation section.  -2 indicates, "don't include this line at all, not even as a blank line to which the JOIN string (default "\n") will apply.
            }
            if (*cp == ')') // This isn't the last line of the continuation section, so leave the line untrimmed (caller will apply the ltrim setting on its own).
            {
                ltrim(aBuf); // Ltrim this line unconditionally so that caller will see that it starts with ')' without having to do extra steps.
                aBuf_length = _tcslen(aBuf); // ltrim() doesn't always return an accurate length, so do it this way.
            }
        }
    }
    // Since above didn't return, either:
    // 1) We're not in a continuation section at all, so apply ltrim() to support semicolons after tabs or
    //    other whitespace.  Seems best to rtrim also.
    // 2) CONTINUATION_SECTION_WITHOUT_COMMENTS but this line is the final line of the section.  Apply
    //    trim() and other logic further below because caller might rely on it.
    // 3) CONTINUATION_SECTION_WITH_COMMENTS (i.e. comments allowed), but this line isn't a comment (though
    //    it may start with ')' and thus be the final line of this section). In either case, need to check
    //    for same-line comments further below.
    if (aInContinuationSection != CONTINUATION_SECTION_WITH_COMMENTS) // Case #1 & #2 above.
    {
        aBuf_length = trim(aBuf);
        if (!_tcsncmp(aBuf, g_CommentFlag, g_CommentFlagLength)) // Case sensitive.
        {
            // Due to other checks, aInContinuationSection==false whenever the above condition is true.
            *aBuf = '\0';
            return 0;
        }
    }
    //else CONTINUATION_SECTION_WITH_COMMENTS (case #3 above), which due to other checking also means that
    // this line isn't a comment (though it might have a comment on its right side, which is checked below).
    // CONTINUATION_SECTION_WITHOUT_COMMENTS would already have returned higher above if this line isn't
    // the last line of the continuation section.

    // Handle comment-flags that appear to the right of a valid line.
    LPTSTR cp, prevp;
    for (cp = _tcsstr(aBuf, g_CommentFlag); cp; cp = _tcsstr(cp + g_CommentFlagLength, g_CommentFlag))
    {
        // If no whitespace to its left, it's not a valid comment.
        // We insist on this so that a semi-colon (for example) immediately after
        // a word (as semi-colons are often used) will not be considered a comment.
        prevp = cp - 1;
        if (prevp < aBuf) // should never happen because we already checked above.
        {
            *aBuf = '\0';
            return 0;
        }
        if (IS_SPACE_OR_TAB_OR_NBSP(*prevp)) // consider it to be a valid comment flag
        {
            *prevp = '\0';
            aBuf_length = rtrim_with_nbsp(aBuf, prevp - aBuf); // Since it's our responsibility to return a fully trimmed string.
            break; // Once the first valid comment-flag is found, nothing after it can matter.
        }
        else // No whitespace to the left.
            if (*prevp == g_EscapeChar) // Remove the escape char.
            {
                // The following isn't exactly correct because it prevents an include filename from ever
                // containing the literal string "`;".  This is because attempts to escape the accent via
                // "``;" are not supported.  This is documented here as a known limitation because fixing
                // it would probably break existing scripts that rely on the fact that accents do not need
                // to be escaped inside #Include.  Also, the likelihood of "`;" appearing literally in a
                // legitimate #Include file seems vanishingly small.
                tmemmove(prevp, prevp + 1, _tcslen(prevp + 1) + 1);  // +1 for the terminator.
                --aBuf_length;
                // Then continue looking for others.
            }
            // else there wasn't any whitespace to its left, so keep looking in case there's
            // another further on in the line.
    } // for()

    return aBuf_length; // The above is responsible for keeping aBufLength up-to-date with any changes to aBuf.
}
HotKeyIt commented 8 years ago

If it worked before, then that was a bug. Try running it with the ANSI version without compiling, and also with the original AutoHotkey.

dmacoder commented 8 years ago

Both running test.ahk with the latest ANSI version of AutoHotkey_L (AutoHotkey.exe) without compiling and running test.exe (the compiled script) work fine.

link: https://s3.postimg.org/wbcsmgsub/Latest_Autohotkey_L.jpg


Running test.ahk with the latest ANSI version of AutoHotkey_H (AutoHotkey.exe) without compiling works fine. But running test_H.exe (the compiled script) does not work.

link:

https://s3.postimg.org/swpq5bn37/Latest_Autohotkey_H.jpg

test.exe

이 번역기는 완벽한 번역기가 아닙니다.

Hello? 안녕하세요

(TODO: 다른 팁도 추가하기)

확인


test_H.exe

? ???? ??? ???? ????.

Hello? ?????

(TODO: ?? ?? ????)

확인


Check out the following links https://s3.postimg.org/wbcsmgsub/Latest_Autohotkey_L.jpg https://s3.postimg.org/swpq5bn37/Latest_Autohotkey_H.jpg

HotKeyIt commented 8 years ago

Download ANSI version (https://autohotkey.com/download/ahk-a32.zip) and execute your script with it, you get:

? ???? ??? ???? ????.

Hello? ????? (TODO: ?? ?? ????)

Why do you think it is called ANSI?

dmacoder commented 8 years ago

I downloaded the ANSI version (https://autohotkey.com/download/ahk-a32.zip) and executed the script with it. It works well. You can check the result at the following link:

https://s14.postimg.org/5ra2bn5xt/Autohotkey112401_ansi_test.jpg

I get: 이 번역기는 완벽한 번역기가 아닙니다.

Hello? 안녕하세요

(TODO: 다른 팁도 추가하기)

I know that the ANSI version of AutoHotkey_L supports the Korean language. Does the ANSI version of AutoHotkey not support Korean?

I don't know exactly, but I think ANSI here is a form of multibyte character set (MBCS) called a double-byte character set (DBCS).

Under MBCS, characters are encoded in either 1 or 2 bytes. In 2-byte characters, the first, or lead byte, signals that both it and the following byte are to be interpreted as one character. The first byte comes from a range of codes reserved for use as lead bytes. Which ranges of bytes can be lead bytes depends on the code page in use. For example, Japanese code page 932 uses the range 0x81 through 0x9F as lead bytes, but Korean code page 949 uses a different range.

By definition, the ASCII character set is a subset of all multibyte-character sets. In many multibyte character sets, each character in the range 0x00 – 0x7F is identical to the character that has the same value in the ASCII character set. For example, in both ASCII and MBCS character strings, the 1-byte NULL character ('\0') has value 0x00 and indicates the terminating null character.

Single-byte character sets: ASCII itself defines only 128 characters (values 0-127), and an 8-bit single-byte code page extends that to at most 256 (values 0-255). This covers upper- and lower-case letters, digits, and special characters such as punctuation and currency symbols. For most Latin-script languages that is enough, but many Asian languages use far more than 256 characters.

ANSI encoding and MBCS (multi-byte encoding): "ANSI" is named after the American National Standards Institute. Countries with non-Latin scripts developed their own encoding rules for their text, and on Windows the encoding that matches the system's language is called the ANSI encoding. For example, the Simplified Chinese encoding GB2312 is ANSI code page 936.

For platforms used in markets whose languages use large character sets, the best alternative to Unicode is MBCS.

I found this at the following links:

  1. https://msdn.microsoft.com/en-us/library/ey142t48.aspx
  2. https://msdn.microsoft.com/en-us/library/5z097dxa.aspx
  3. https://msdn.microsoft.com/en-us/library/cwe8bzh0.aspx
  4. http://prog3.com/sbdm/blog/luoyouren/article/details/46389411

Anyway, I have to use the ANSI version of AutoHotkey, because some scripts written for AutoHotkey_B will function correctly on the ANSI version of AutoHotkey_L but fail on Unicode versions.

I also have an ANSI-encoded database, as follows.

[DB - Ansi ver]
Ansi \u00BD\u00C3\u00B0\u00A3 \u00BF\u00AA\u00BC\u00F8\u00C0\u00D4\u00C0\u00E5

[DB Example - Unicode ver]
Unicode \uC2DC\uAC04
Unicode \uC5ED\uC21C\uC785\uC7A5

I don't have a Unicode version of the database.

My script (Korean language) works fine in the ANSI version of AutoHotkey_L, but not in the latest AutoHotkey_H. I modified the GetLine function in the existing C++ code (in script.cpp), and then it works. However, a large compiled script (over 10000 lines) is then very slow to launch. Can you fix either the performance or the encoding error?


//aBuf_length = UTF8ToASCII((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
                //replace below 4 lines
                CStringA sChar;
                StringUTF8ToChar((LPCSTR)aDataBuf, sChar, -1, NULL, CP_ACP);
                _tcscpy(aBuf, sChar);               
                aBuf_length = _tcslen(aBuf);
HotKeyIt commented 8 years ago

Execute following script and upload a screenshot:

tip=
 (
이 번역기는 완벽한 번역기가 아닙니다.

Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
 )
gosub,Tray_Init
 msgbox,% "IsUnicode:`t" (A_IsUnicode?1:0) "`nVersion:`t`t" A_AhkVersion "`n" tip
 exitapp
 Tray_Init:
 Menu, Tray, NoStandard
 Menu, Tray, DeleteAll
 Menu, tray, add, 사이트 방문 , 사이트방문
return
사이트방문:
run, IEXPLORE.EXE "http://translate.google.com"
return

dmacoder commented 8 years ago

Here are the screenshots (running test.ahk with the latest ANSI version of AutoHotkey (AutoHotkey.exe) without compiling, and running test.exe, the compiled script):

  1. Latest Autohotkey_H https://s17.postimg.org/tgs71s2xb/latest_ansi_autohotkey_H_test.jpg
  2. Latest Autohotkey_L https://s21.postimg.org/f24gh66nb/latest_ansi_autohotkey_L_test.jpg

Environment
OS: Windows 7 Professional K, 64-bit
Language: Korean

HotKeyIt commented 8 years ago

I see but unfortunately I can't test it since on my system it does not display properly. Can you see if you can get this function to work for you: https://github.com/HotKeyIt/ahkdll/blob/master/source/script.cpp#L5186

HotKeyIt commented 8 years ago

I think I got it now, can you try again and confirm if it is working now ;)

dmacoder commented 8 years ago

The latest AutoHotkey_H is still not working. The result is the same as in this screenshot: https://s17.postimg.org/tgs71s2xb/latest_ansi_autohotkey_H_test.jpg

HotKeyIt commented 8 years ago

Do you get the correct result here?

tip=
 (
이 번역기는 완벽한 번역기가 아닙니다.

Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
 )
VarSetCapacity(var,1024)
 DllCall("WideCharToMultiByte","UInt",949,"Uint",0,"wStr",tip,"UInt",StrLen(tip)*2,"PTR",&var,"UInt",1024,"UInt",0,"UInt",0)

MsgBox % StrGet(&var,"CP0")

dmacoder commented 8 years ago

Here is a screenshot of the test. It's the same result. https://s18.postimg.org/77jdaa1hl/ahk_h_test.jpg

HotKeyIt commented 8 years ago

Run this script in the Unicode version (not compiled) and confirm whether you get the desired result:

tip=
 (
이 번역기는 완벽한 번역기가 아닙니다.

Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
 )
VarSetCapacity(var,1024)
 DllCall("WideCharToMultiByte","UInt",949,"Uint",0,"wStr",tip,"UInt",StrLen(tip)*2,"PTR",&var,"UInt",1024,"UInt",0,"UInt",0)

MsgBox % StrGet(&var,"CP0")

dmacoder commented 8 years ago

Running the script uncompiled in both the ANSI and Unicode versions of ahk_h gives the desired result. But for compiled scripts, only the exe built with the Unicode version of ahk_h gives the desired result.

Compiled and uncompiled scripts give the desired result in all versions (ANSI and Unicode) of ahk_L.


Running test.ahk with the latest ANSI version of AutoHotkey_H (AutoHotkey.exe) without compiling works fine. But running test_H.exe (the compiled script) does not work (Korean text is shown as '?').

HotKeyIt commented 8 years ago

Check if it is working now.

dmacoder commented 8 years ago
;test.ahk
tip=
 (
이 번역기는 완벽한 번역기가 아닙니다.

Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
 )
VarSetCapacity(var,1024)
 DllCall("WideCharToMultiByte","UInt",949,"Uint",0,"wStr",tip,"UInt",StrLen(tip)*2,"PTR",&var,"UInt",1024,"UInt",0,"UInt",0)

MsgBox % "IsUnicode: `t" (A_IsUnicode?1:0) " `nVersion: `t `t" A_AhkVersion "`n" StrGet(&var,"CP0")

Here is a screenshot of the test on 2016-09-09: https://s10.postimg.org/9228fehzd/Test160909.jpg

The message has changed (the number and order of the question marks changed), but the Korean text is still shown as question marks ('?').

dmacoder commented 8 years ago

I modified Script::GetLine functions in existing C++ code. (in script.cpp) https://github.com/HotKeyIt/ahkdll/blob/master/source/script.cpp#L5320

//aBuf_length = UTF8ToASCII((unsigned char*)aBuf, aMaxCharsToRead, (unsigned char*)aDataBuf, aSizeEncrypted) - 1;
// The line above was replaced by the next four lines.
                CStringA sChar;
                StringUTF8ToChar((LPCSTR)aDataBuf, sChar, -1, NULL, CP_ACP);
                _tcscpy(aBuf, sChar);               
                aBuf_length = _tcslen(aBuf);

And then it works well (Korean is displayed correctly in the ANSI version of AutoHotkey_H). However, a large compiled script (over 10000 lines) is then very slow to launch. Can you review this?

HotKeyIt commented 8 years ago

We cannot use StringUTF8ToChar for security reasons. What is your code page? MsgBox % DllCall("GetACP")

dmacoder commented 8 years ago
;test.ahk
tip=
 (
이 번역기는 완벽한 번역기가 아닙니다.

Hello?
안녕하세요
(TODO: 다른 팁도 추가하기)
 )
VarSetCapacity(var,1024)
 DllCall("WideCharToMultiByte","UInt",949,"Uint",0,"wStr",tip,"UInt",StrLen(tip)*2,"PTR",&var,"UInt",1024,"UInt",0,"UInt",0)

MsgBox % "GetACP : " DllCall("GetACP") "`nIsUnicode: `t" (A_IsUnicode?1:0) " `nVersion: `t `t" A_AhkVersion "`n" StrGet(&var,"CP0")

ACP: 949. Here is a screenshot of the test: https://s12.postimg.org/hsqii0pzx/acpconfirm.jpg

HotKeyIt commented 8 years ago

I have finally got my head around that and it should be working properly now ;)

dmacoder commented 8 years ago

Thank you for your hard work! Keep up the good work :D