Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Win32: support full unicode in filenames (use Wide-system calls) #9578

Open p5pRT opened 15 years ago

p5pRT commented 15 years ago

Migrated from rt.perl.org#60888 (status was 'open')

Searchable as RT60888$

p5pRT commented 15 years ago

From abraendle@gmx.de

OS: Windows XP (German); cp1252

There are 2 problems with the encoding of filenames on Windows:

1) cp1252 != latin1, but perl treats them as the same. For example, filenames returned by readdir (cp1252) are silently interpreted as latin1, but some characters, such as the Euro sign, differ between the two, so the result in this case is a wrong/unusable filename.

Note: the error may be invisible if the function that next uses the filename silently applies the inverse conversion. However, if I use the filename somewhere else (print it to a utf8 text file, pass it to a direct Win32 API call, ...), it is wrong.
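The mismatch described in problem 1 can be reproduced outside perl. The following is only a minimal sketch in Python (chosen for illustration, not how perl implements it); the file name `price_€.txt` is a hypothetical example. It shows the byte 0x80 decoded both ways, why a round trip hides the mistake, and how writing the name as UTF-8 exposes it.

```python
# Byte 0x80 is the Euro sign in cp1252, but a C1 control character in latin1.
# Hypothetical file name bytes, as an ANSI system call might return them.
raw_name = b"price_\x80.txt"

correct = raw_name.decode("cp1252")    # Euro sign, U+20AC
mistaken = raw_name.decode("latin-1")  # control character, U+0080

# The inverse conversion hides the mistake: the original bytes come back.
round_trip = mistaken.encode("latin-1")

# Writing the name as UTF-8 exposes it: the two encodings differ.
utf8_correct = correct.encode("utf-8")
utf8_mistaken = mistaken.encode("utf-8")

print(round_trip == raw_name)         # the error is invisible here
print(utf8_correct == utf8_mistaken)  # ... but visible here
```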

2) Unicode chars are not possible

Since perl supports utf8 strings internally, filenames should be correct utf8 strings (for opendir, open, stat, readdir, -d, -e, etc.). Currently this is not so: WinAnsi cp1252 byte strings are interpreted as latin1 (and the other way around), with the problem described above.

NTFS supports unicode filenames, and the Win32 API has "Wide-system calls" (suffix W, e.g. CreateFileW, FindFirstFileW).

So, perl should switch to these Wide-system calls (only a UCS2 <=> utf8 conversion remains to be done); both problems above would then be solved.
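To illustrate the conversion step mentioned above, here is a minimal sketch in Python (for illustration only, not perl's implementation). It assumes the wide calls take UTF-16 little-endian text, which coincides with UCS2 for characters in the Basic Multilingual Plane. The helper names `utf8_to_wide` and `wide_to_utf8` and the sample file name are hypothetical.

```python
# Hypothetical file name: three Japanese characters, "_", the Euro sign, ".txt".
name = "\u65e5\u672c\u8a9e_\u20ac.txt"

# The 8-bit ANSI code page cannot represent this name at all (problem 2).
try:
    name.encode("cp1252")
    fits_in_cp1252 = True
except UnicodeEncodeError:
    fits_in_cp1252 = False


def utf8_to_wide(utf8_bytes: bytes) -> bytes:
    """Convert a UTF-8 byte string to UTF-16LE, as a W call would expect."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


def wide_to_utf8(wide_bytes: bytes) -> bytes:
    """Convert UTF-16LE bytes, as a W call would return them, to UTF-8."""
    return wide_bytes.decode("utf-16-le").encode("utf-8")


utf8_name = name.encode("utf-8")
wide_name = utf8_to_wide(utf8_name)

print(fits_in_cp1252)                       # False
print(wide_to_utf8(wide_name) == utf8_name)  # True: the conversion is lossless
```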

[ActivePerl 5.8.8, 5.10.0]



Flags: category=core, severity=medium

p5pRT commented 15 years ago

abraendle@gmx.de - Status changed from 'new' to 'open'

p5pRT commented 15 years ago

abraendle@gmx.de - Status changed from 'open' to 'new'

p5pRT commented 15 years ago

From abraendle@gmx.de

 
* cp1252, aka windows-1252, is the default (8-bit) charset in Windows XP here (Wikipedia: also in "English and some other Western languages"). It is similar to latin1/latin9 but not exactly the same.

* Another advantage of full unicode support for filenames is a maximum path length of about 32000 chars (instead of the MAX_PATH limit of 260 chars for the WinAnsi system calls); see the Windows API documentation (e.g. FindFirstFile: http://msdn.microsoft.com/en-us/library/aa364418.aspx).
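As a side note on the longer path limit: the Windows documentation describes reaching it by calling the W variant with the `\\?\` prefix on an absolute path. The following is a rough Python sketch of that prefixing, for illustration only. The helper name `to_extended_path` is hypothetical, and it assumes the input is already an absolute, backslash-separated path.

```python
def to_extended_path(path: str) -> str:
    """Add the extended-length prefix to an absolute Windows path.

    Hypothetical helper; assumes `path` is absolute and uses backslashes.
    """
    if path.startswith("\\\\?\\"):
        return path                       # already prefixed
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path               # drive path such as C:\dir\file


print(to_extended_path("C:\\dir\\file.txt"))
print(to_extended_path("\\\\server\\share\\file.txt"))
```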

p5pRT commented 15 years ago

abraendle@gmx.de - Status changed from 'new' to 'open'