davidm / luacom

Microsoft Component Object Model (COM) binding for Lua
http://lua-users.org/wiki/LuaCom

correct bstr conversion #6

Closed: windtail closed this pull request 11 years ago

windtail commented 11 years ago

Hi davidm,

When I use luacom, I find that I cannot use it with Chinese characters: neither filenames nor string parameters containing Chinese characters work.

I tried saving the Lua source file as UTF-8, but that did not solve the problem with Chinese filenames, because I cannot change the system's encoding.

Later I found that it works under Cygwin, and realized that recent versions of Cygwin internally convert filenames to UTF-8.

I downloaded the source code today, and I think that in the functions bstr2string()/string2bstr(), CP_UTF8 should be changed to CP_ACP when not building with Cygwin (see the commit changes). The changes work for me, but I do not know whether they will cause other errors.

luojiejun

ignacio commented 11 years ago

Can you provide an example of code that does not work? As it stands, this change will fix LuaCom for you but will break it for every other user running it with Cygwin.

windtail commented 11 years ago

I did add an "#ifdef _CYGWIN" to keep the existing behavior under Cygwin. Here is an example that does not work:

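```lua
require "luacom"

-- Start Word and make its window visible
local wordApp = luacom.CreateObject("Word.Application")
wordApp.Visible = true

-- New document; the text means "Does Chinese really work? I don't know either!"
local wordDoc = wordApp.Documents:Add()
wordApp.Selection:TypeText("中文真的可以吗,我也不知道啊!")
-- Save under a long Chinese filename (SaveAs2 requires Word 2010 or later)
wordDoc:SaveAs2("F:\\中文的文件名哦还挺长的.docx")

-- Close the document without saving changes (0), then quit Word
wordDoc:Close(0)
wordApp:Quit(0)
```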

If I save the code above in the system's default (ANSI) encoding, which is GBK on my machine, it produces a file named "F:\ĵļŶͦ.docx" whose content is "ĿҲ֪". The code was tested on Windows XP (Simplified Chinese Edition).
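A minimal sketch of why this happens, assuming Lua 5.3 or later for the built-in `utf8` library. The two characters 中文 (U+4E2D U+6587) are stored as different byte sequences in GBK (code page 936) and in UTF-8, and the GBK bytes are not even valid UTF-8. A script saved as GBK therefore hands luacom bytes that a CP_UTF8 conversion cannot turn back into the original characters. The byte values are hardcoded for illustration, and the helper `hex` is hypothetical.

```lua
-- Requires Lua 5.3+ (utf8 library); \xXX escapes need Lua 5.2+.

-- "中文" (U+4E2D U+6587) as raw bytes in two encodings.
local gbk_bytes  = "\xD6\xD0\xCE\xC4"          -- code page 936 (GBK/GB2312)
local utf8_bytes = "\xE4\xB8\xAD\xE6\x96\x87"  -- UTF-8

-- Hypothetical helper: format each byte as two hex digits plus a space.
local function hex(s)
  return (s:gsub(".", function(c) return string.format("%02X ", c:byte()) end))
end

print("GBK:   " .. hex(gbk_bytes))   -- D6 D0 CE C4
print("UTF-8: " .. hex(utf8_bytes))  -- E4 B8 AD E6 96 87

-- utf8.len returns nil plus the position of the first invalid byte
-- when the input is not valid UTF-8. 0xD6 starts a 2-byte sequence,
-- but 0xD0 is not a continuation byte, so decoding fails at byte 1.
local n, bad_pos = utf8.len(gbk_bytes)
print("GBK bytes as UTF-8:", n, bad_pos)              -- nil  1

print("UTF-8 bytes as UTF-8:", utf8.len(utf8_bytes))  -- 2
```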

ignacio commented 11 years ago

Well, the thing is that LuaCom expects its input strings to be encoded as UTF-8. So you need to change the encoding of your script, not the codepage LuaCom uses in its conversion routines.

With LuaCom as it is, I can work with strings in different languages (Spanish and Portuguese, with accented characters and so on) regardless of which language I have configured in Windows. If LuaCom used CP_ACP, my Spanish scripts would only work if I ran them with Spanish as the current language.
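Following this suggestion, one way to make a script independent of the encoding it is saved in is to write non-ASCII text as escape sequences, which are plain ASCII in the source file and always produce UTF-8 bytes. A minimal sketch, assuming Lua 5.3 or later (`\u{...}` escapes and the `utf8` library); on Lua 5.1 the same bytes can be written with decimal escapes such as `\228\184\173` for 中. Passing the resulting string to calls such as `TypeText` in the example above is assumed and not shown.

```lua
-- Requires Lua 5.3+ for \u{XXXX} escapes and the utf8 library.
-- The escapes are ASCII in the source, so this works whether the script
-- is saved as UTF-8, GBK or any other ASCII-compatible encoding.
local text = "\u{4E2D}\u{6587}"   -- "中文", always stored as UTF-8 bytes

-- utf8.char builds the same UTF-8 string from code points at run time.
local same = utf8.char(0x4E2D, 0x6587)

assert(text == same)
print(#text, utf8.len(text))   -- 6  2  (6 bytes, 2 characters)

-- List the code points a UTF-8 to UTF-16 conversion should recover.
for _, cp in utf8.codes(text) do
  print(string.format("U+%04X", cp))  -- U+4E2D, then U+6587
end
```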

windtail commented 11 years ago

I don't fully understand Windows codepages, but let me explain my situation. I am in China, and most of us use Windows XP Simplified Chinese Edition. The codepage is 936 and our filenames are encoded in GB2312, so how should we deal with these files?

If the filename encoding is changed to UTF-8, the result is a mess: Windows Explorer can no longer manage such filenames.

As I mentioned, if I save the Lua source file as UTF-8, luacom understands the filenames and strings and successfully converts them to wide characters, but other programs do not: for example, MS Word complains that the file does not exist, because no file with that UTF-8 encoded name exists (the names are encoded in GB2312).

What's worse in my situation, the contents of the MS Word files are encoded in GB2312. I have to process these files with luacom, and I find that I need luacom to use CP_ACP.

Do you have any suggestions for using the existing version of luacom? And what is your situation on a Spanish version of Windows?

ignacio commented 11 years ago

Ah, I see. I think your main problem has to do with filenames. I've heard that's tricky on Windows. I had to do something similar to what you did (use ACP instead of UTF-8), but to cope with a COM component that was improperly converting from UTF-16 strings.

What I ended up doing was adding a couple of functions to allow changing the codepage on the fly. I never sent those changes upstream because I wasn't happy with that hack.

I see your situation is completely different from mine. I didn't have to deal with filenames, and I don't fully understand all the tiny details of codepages on Windows.

windtail commented 11 years ago

Thank you for your reply! Maybe I should close the pull request; it does not seem to be a common problem.