Closed GoogleCodeExporter closed 8 years ago
Sorry, I should probably have added Label:OpSys-Windows earlier.
Original comment by gianni...@gmail.com
on 20 Jan 2009 at 9:42
Thanks for the report. There are indeed problems on Windows when the user's
name or other path components contain non-ASCII characters. I can reproduce the
issue here (my normal username on a Windows computer has a non-ASCII character
in it).
If you want to run Crunchy on Windows, the only option at this point is to make
sure you install it in a path that contains only ASCII characters. If you could
do this (even if only as a test) and report the result, it would be much
appreciated, especially since I have not installed Python 2.6 yet.
As for the connection to skyhookwireless, I have NO idea. It should NEVER
connect anywhere without user intervention. I can certainly understand why you
would be upset seeing this...
How did you notice that Crunchy tries to connect to skyhookwireless?
Are you sure you didn't have something else running in your browser at the same
time?
Otherwise, the only (far-fetched?) explanation I can think of is that, somehow,
localhost (127.0.0.1) is mapped to this address on your system due to some
malware already present. I obviously can't reproduce the bug here.
Original comment by andre.ro...@gmail.com
on 20 Jan 2009 at 10:24
Hi andre roberge:
First of all, I have to admit that I was wrong about Crunchy trying to connect
to a domain. Some time ago I changed my hosts file for security reasons, and I
found that I had mapped 127.0.0.1 to the www.sk*** domain by mistake.
I am really sorry for the misunderstanding.
About Crunchy: yes, I did check whether it works. I moved the Crunchy
directories to the c:\ level, but I found another "problem".
More precisely, when I clicked the link to python.org from Crunchy, I got
(I tried several times) the following error:
404 NOT FOUND: /www.python.org
Auê! (oops!) The page you are looking for (/www.python.org) is no longer here
or has been moved!
That page had this as a title --> it's all lost and stoof
Original comment by gianni...@gmail.com
on 22 Jan 2009 at 7:32
I will enter the 404 as a separate issue; thanks for the bug report.
The UnicodeDecodeError bug remains :-(
Original comment by andre.ro...@gmail.com
on 22 Jan 2009 at 11:35
Adding labels.
Original comment by andre.ro...@gmail.com
on 22 Jan 2009 at 11:38
Hi again! :-) I think the following might help you :-)
First, I started cmd from a path containing Greek and rarely used letters and
spaces, in order to get a fully "buggy" string.
The original buggy path is:
C:\Documents and Settings\user\Επιφάνεια εργασίας\%τέστ_όϊ(#2011
So I typed the following into the Python interpreter:
import locale, os, sys
# now I check the encodings on the system
print locale.getdefaultlocale()
# nice, we get a tuple describing the user's locale :-)
# I get ('el_GR', 'cp1253')
print locale.getpreferredencoding()
# nice, this is the preferred encoding
# I get 'cp1253'
print sys.stdout.encoding
# nice, this is the console's default encoding
# I get 'cp737', which is 100% OK because I typed chcp and got code page 737 :-)
print sys.getfilesystemencoding()
# now the filesystem encoding
# I get 'mbcs'
os.getcwd()
# I displayed the current working directory
# I got
'C:\\Documents and Settings\\user\\\xc5\xf0\xe9\xf6\xdc\xed\xe5\xe9\xe1 \xe5\xf1\xe3\xe1\xf3\xdf\xe1\xf2\\%\xf4\xdd\xf3\xf4_\xfc\xfa(#2011'
# hmm, it does not look right
# then I tried to decode the "buggy" string,
# which returns a unicode string
# first using cp737
os.getcwd().decode('cp737')
# I got
u'C:\\Documents and Settings\\user\\\u253c\u038f\u03ce\xf7\u2584\u038a\u03af\u03ce\u03ac \u03af\xb1\u03ae\u03ac\u2264\u2580\u03ac\u2265\\%\u03aa\u258c\u2264\u03aa_\u207f\xb7(#2011'
# then I printed it and got
C:\Documents and Settings\user\┼Ώώ÷▄Ίίώά ί±ήά≤▀ά≥\%Ϊ▌≤Ϊ_ⁿ·(#2011
# hmm, first attempt rejected!
# a second attempt, with mbcs
os.getcwd().decode('mbcs')
# and I got this unicode string
u'C:\\Documents and Settings\\user\\\u0395\u03c0\u03b9\u03c6\u03ac\u03bd\u03b5\u03b9\u03b1 \u03b5\u03c1\u03b3\u03b1\u03c3\u03af\u03b1\u03c2\\%\u03c4\u03ad\u03c3\u03c4_\u03cc\u03ca(#2011'
# then I printed this one too and got
C:\Documents and Settings\user\Επιφάνεια εργασίας\%τέστ_όϊ(#2011
# Bingo! It's OK
# Finally I tried the cp1253 encoding to be "sure" :-)
os.getcwd().decode('cp1253')
# and I got this unicode string
u'C:\\Documents and Settings\\user\\\u0395\u03c0\u03b9\u03c6\u03ac\u03bd\u03b5\u03b9\u03b1 \u03b5\u03c1\u03b3\u03b1\u03c3\u03af\u03b1\u03c2\\%\u03c4\u03ad\u03c3\u03c4_\u03cc\u03ca(#2011'
# and I printed that too and got the right result again
C:\Documents and Settings\user\Επιφάνεια εργασίας\%τέστ_όϊ(#2011
# Bingo again
# I see that the mbcs (or better, MBCS) mappings to Unicode match cp1253 on
# my computer, at least for Greek letters. Although I didn't check the CJK
# character sets for East Asian languages, I am sure they must be treated
# similarly.
# About how to avoid raising a UnicodeDecodeError: I think that using Unicode
# by default internally, then checking the system locale and emitting the right
# encoding according to the system's language, should be a good way to get a
# solution.
# This is exactly what I "inherited" from a page on a site.
# The link to that page is:
# http://www.amk.ca/python/howto/unicode
# As that page suggests, or maybe insists:
"""
Software should only work with Unicode strings internally, converting to a
particular encoding on output.
If you attempt to write processing functions that accept both Unicode and 8-bit
strings, you will find your program vulnerable to bugs wherever you combine the
two different kinds of strings. Python's default encoding is ASCII, so whenever
a character with an ASCII value >127 is in the input data, you'll get a
UnicodeDecodeError because that character can't be handled by the ASCII
encoding. """
# I didn't solve the UnicodeDecodeError problem, but at least I think I now
# understand it well enough that what I wrote can be helpful to the readers,
# or maybe to the project itself.
Original comment by gianni...@gmail.com
on 22 Jan 2009 at 9:17
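The interpreter session above can be condensed into a short sketch. It is written for Python 3 rather than the Python 2.6 used in this thread, it uses only the bytes of the first directory name quoted in the os.getcwd() output, and the variable names are illustrative. The 'mbcs' codec exists only on Windows, so the sketch uses cp1253, which gave the same result on the reporter's machine.

```python
# Bytes of the first directory name, copied from the os.getcwd() output above.
raw = b"\xc5\xf0\xe9\xf6\xdc\xed\xe5\xe9\xe1"

# cp1253 is the Windows (ANSI) Greek code page reported by the locale module.
as_cp1253 = raw.decode("cp1253")

# cp737 is the console (OEM) code page reported by chcp; it decodes without
# an error but produces the wrong characters.
as_cp737 = raw.decode("cp737")

# The ASCII codec rejects every byte above 127, hence the UnicodeDecodeError.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# ascii() escapes non-ASCII characters so the output prints on any console.
print(ascii(as_cp1253))
print(as_cp737 == as_cp1253)
print(ascii_failed)
```

Decoding never fails for cp737 here, which is why a wrong code page is harder to spot than the ASCII case: it silently yields different text instead of raising.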
Sorry: instead of MBCS, I meant to write DBCS (Double Byte Character Set).
Original comment by gianni...@gmail.com
on 22 Jan 2009 at 9:23
Wasn't that helpful? I am curious
Original comment by gianni...@gmail.com
on 24 Jan 2009 at 6:05
Sorry - I did not have time to investigate; I'm kind of swamped with other stuff
right now. I will be posting an update here when I have the time.
Original comment by andre.ro...@gmail.com
on 24 Jan 2009 at 10:11
It's OK, I understand.
Original comment by gianni...@gmail.com
on 25 Jan 2009 at 12:51
Hello bug reporter!
If you
(1) Run the Python interpreter; and
(2) Execute open(u"C:/Documents and Settings/Administrator/Επιφάνεια εργασίας/[any file you have lying around]"),
does an exception show up? I'm porting Crunchy to Python 3 and I'm hoping that
Python 2.6 on Windows locales does the right thing when the path is Unicode. If
not, I'll put in the locale workaround that you suggested.
File paths are generally a pain, since a lot of environments don't have the
right locales set up for Python to figure out what to do (some are set up so
badly that even the locale module can't return the right value). This might
eventually have to be a configuration option where the user enters his locale
in as user-friendly a manner as possible.
Original comment by shadytr...@gmail.com
on 10 Jul 2009 at 10:12
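A minimal sketch of the test requested above, written for Python 3 and using a temporary directory so it does not depend on any particular user profile. The file name test.txt and the variable names are made up for the illustration; the directory name is the one from the bug report.

```python
import os
import shutil
import tempfile

# Temporary base directory; the Greek directory name comes from the report.
base = tempfile.mkdtemp()
greek_dir = os.path.join(base, "Επιφάνεια εργασίας")
os.mkdir(greek_dir)

# Hypothetical file name, used only for this illustration.
target = os.path.join(greek_dir, "test.txt")

# The path is a text (Unicode) string, so Python converts it for the OS itself.
with open(target, "w", encoding="utf-8") as handle:
    handle.write("ok")

with open(target, encoding="utf-8") as handle:
    content = handle.read()

dir_existed = os.path.isdir(greek_dir)

# Clean up the temporary tree.
shutil.rmtree(base)

print(content)
```

If the environment's filesystem encoding cannot represent Greek letters, os.mkdir is where an encoding error would be expected to surface.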
More research on this: On Windows, when open() receives a bytestring, it will
try to convert it into Unicode. open() calls file_init in fileobject.c [1].
With a bytestring, wideargument is then always 0 and therefore the arguments
get parsed as "et|si:file", that is, as a bytestring encoded with
Py_FileSystemDefaultEncoding (sys.getfilesystemencoding()), which is always
mbcs on Windows. That's fine; since we encoded the string as mbcs with
getfilesystemencoding, decoding it as mbcs shouldn't be a problem. But on your
machine, the ascii codec somehow gets called to decode the bytestring, and I'm
completely baffled as to how that's happening since getfilesystemencoding() is
returning mbcs for you.
[1]: http://svn.python.org/view/python/trunk/Objects/fileobject.c?revision=73686&view=markup
[2]: http://svn.python.org/view/python/trunk/Python/bltinmodule.c?revision=73776&view=markup
For André's branch and probably the 1.0 release, path_to_filedata is passing
Unicode to open(), which bypasses the encoding altogether since Windows
natively uses Unicode for its filesystem. On other systems, Unicode paths get
encoded down to nl_langinfo(CODESET) [3] as expected. This is the option we
took for Python 2, and it's the only option available for Python 3. In light of
this, I'm closing this bug (as Fixed, since there doesn't seem to be a more
nuanced, correct status) since it's been obsoleted by these changes.
[3]: http://svn.python.org/view/python/trunk/Python/pythonrun.c?revision=71152&view=markup
Original comment by shadytr...@gmail.com
on 13 Aug 2009 at 1:02
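The approach described above, passing text rather than bytestrings to open(), could be sketched as follows. This is a Python 3 illustration, not Crunchy's path_to_filedata; to_text_path is a hypothetical helper, and its encoding parameter is an assumption added so the behaviour can be shown on any platform.

```python
import sys


def to_text_path(path, encoding=None):
    """Return path as text, decoding bytes with the filesystem encoding.

    Hypothetical helper for illustration; it is not part of Crunchy.
    """
    if isinstance(path, bytes):
        # Fall back to the interpreter's filesystem encoding when none is given.
        return path.decode(encoding or sys.getfilesystemencoding())
    return path


# Text paths pass through unchanged.
print(to_text_path("C:/crunchy"))

# Bytes are decoded; cp1253 is given explicitly so this runs on any system.
print(ascii(to_text_path(b"\xc5\xf0", encoding="cp1253")))
```

Python 3 also ships os.fsdecode in the standard library for the default case; the explicit helper is used here only to make the decoding step visible.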
Original issue reported on code.google.com by
gianni...@gmail.com
on 20 Jan 2009 at 9:37