IronLanguages / main

Work for this repo has moved to https://github.com/IronLanguages/ironpython2

'bytearray(..., encoding)' broken under IP #423

Open ironpythonbot opened 9 years ago

ironpythonbot commented 9 years ago

--------------------------------------------------------------------------


IP VERSION AFFECTED: 2.6
BUILD TYPE: All
FLAGS PASSED TO IPY.EXE: -X:Python26 -X:ExceptionDetails
OPERATING SYSTEM: 32-bit Vista
CLR VERSION: .NET 2.0 SP1
SCENARIOS AFFECTED: Not CPython 2.6 compatible

--------------------------------------------------------------------------


NOTES
In the repro below:


REPRODUCTION SNIPPET:

CPy 2.6

sample = u"Hello world\n\u1234\u5678\u9abc\udef0"
bytearray(sample, "utf8")
bytearray(b'Hello world\n\xe1\x88\xb4\xe5\x99\xb8\xe9\xaa\xbc\xed\xbb\xb0')

CPy 3.0

sample = "Hello world\n\u1234\u5678\u9abc\udef0"
bytearray(sample, "utf8")
bytearray(b'Hello world\n\xe1\x88\xb4\xe5\x99\xb8\xe9\xaa\xbc\xed\xbb\xb0')

IP 2.6

sample = u"Hello world\n\u1234\u5678\u9abc\udef0"
bytearray(sample, "utf8")
Unable to translate Unicode character \uDEF0 at index 15 to specified code page.
at System.Text.EncoderExceptionFallbackBuffer.Fallback(Char charUnknown, Int32 index)
at System.Text.EncoderFallbackBuffer.InternalFallback(Char ch, Char & chars)
at System.Text.UTF8Encoding.GetByteCount(Char chars, Int32 count, EncoderNLS baseEncoder)
at System.Text.UTF8Encoding.GetByteCount(Char[] chars, Int32 index, Int32 count)
at IronPython.Runtime.Operations.StringOps.EncodingWrapper.GetByteCount(Char[] chars, Int32 index, Int32 count) in e:\vslprf\Merlin\Main\Languages\IronPython\IronPython\Runtime\Operations\StringOps.cs:line 1767
at System.Text.Encoding.GetBytes(Char[] chars, Int32 index, Int32 count)
at System.Text.Encoding.GetBytes(String s)
at IronPython.Runtime.Operations.StringOps.RawEncode(CodeContext context, String s, Object encodingType, String errors) in e:\vslprf\Merlin\Main\Languages\IronPython\IronPython\Runtime\Operations\StringOps.cs:line 1617
at IronPython.Runtime.Operations.StringOps.encode(CodeContext context, String s, Object encoding, String errors) in e:\vslprf\Merlin\Main\Languages\IronPython\IronPython\Runtime\Operations\StringOps.cs:line 429
at IronPython.Runtime.ByteArray.init(CodeContext context, String unicode, String encoding) in e:\vslprf\Merlin\Main\Languages\IronPython\IronPython\Runtime\ByteArray.cs:line 53
at CallSite.Target(Closure , CallSite , CodeContext , Object , Object , String )
at System.Dynamic.UpdateDelegates.UpdateAndExecute4[T0,T1,T2,T3,TRet](CallSite site, T0 arg0, T1 arg1, T2 arg2, T3 arg3) in e:\vslprf\ndp\fx\src\Core\Microsoft\Scripting\Actions\UpdateDelegates.Generated.cs:line 613
at $35$18.$35(Scope $scope, LanguageContext $language) in :line 1
at Microsoft.Scripting.Runtime.OptimizedScriptCode.InvokeTarget(LambdaExpression code, Scope scope) in e:\vslprf\Merlin\Main\Runtime\Microsoft.Scripting\Runtime\OptimizedScriptCode.cs:line 80
at Microsoft.Scripting.ScriptCode.Run(Scope scope) in e:\vslprf\Merlin\Main\Runtime\Microsoft.Scripting\Runtime\ScriptCode.cs:line 82
at IronPython.Hosting.PythonCommandLine.<>c__DisplayClass1. b__0() in e:\vslprf\Merlin\Main\Languages\IronPython\IronPython\Hosting\PythonCommandLine.cs:line 379
UnicodeEncodeError: ('unknown', u'\udef0', 15, 16, '')

sample = "Hello world\n\u1234\u5678\u9abc\udef0"
bytearray(sample, "utf8")
bytearray(b'Hello world\n\u1234\u5678\u9abc\udef0')

## Work Item Details

Original CodePlex Issue: Issue 21336
Status: Active
Reason Closed: Unassigned
Assigned to: Unassigned
Reported on: Feb 23, 2009 at 11:08 PM
Reported by: dfugate
Updated on: Feb 22, 2013 at 2:13 AM
Updated by: jdhardy
Test: (Cpy) test_bytes.py

ironpythonbot commented 9 years ago

On 2009-02-26 07:03:30 UTC, dinov commented:

This seems to be because the string includes a surrogate character, which we don't happily encode. We may need our own UTF8 encoding for CPython compatibility.
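
For anyone trying to reproduce the byte sequence from the CPython output on a current CPython 3 interpreter, here is a minimal sketch. It assumes a CPython 3 version that has the `surrogatepass` error handler, where strict UTF-8 encoding rejects a lone surrogate (much like the .NET encoder does) and `surrogatepass` emits the three-byte form instead. The helper names `strict_error_index` and `encode_with_surrogates` are made up for illustration and are not IronPython or CPython APIs.

```python
# The sample from the repro; \udef0 is a lone low surrogate at index 15.
sample = "Hello world\n\u1234\u5678\u9abc\udef0"


def strict_error_index(text):
    """Return the index where strict UTF-8 encoding fails, or None if it succeeds."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc.start
    return None


def encode_with_surrogates(text):
    """Encode to UTF-8, letting lone surrogates through as three-byte sequences."""
    return bytearray(text.encode("utf-8", "surrogatepass"))


print(strict_error_index(sample))
print(encode_with_surrogates(sample))
```

The failing index matches the "index 15" reported by the .NET encoder, and the `surrogatepass` result ends in `\xed\xbb\xb0`, the same bytes shown in the CPy 2.6 output above. A custom UTF8 encoding for IronPython would presumably need to produce that sequence for lone surrogates, though that is an inference from the outputs in this report rather than a confirmed design.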