beeware / voc

A transpiler that converts Python code into Java bytecode
http://beeware.org/voc
BSD 3-Clause "New" or "Revised" License

test_str for some Unicode strings fails in Windows 10 #610

Open hazzashirt opened 7 years ago

hazzashirt commented 7 years ago

On 64-bit Windows 10, when running the test suite from cmd (>setup.py test...) or via Cricket, test_str fails in every suite with the following error:

AssertionError: '>>> [321 chars]abs\n>>> x = "Mÿ hôvèrçràft îß fûłl öf éêlś"\n[319 chars]==\n' != '>>> [321 chars]abs\n<class \'UnicodeEncodeError\'> : \'charma[341 chars]==\n'

More details: tests.builtins.test_any.BuiltinAnyFunctionTests.test_str


Traceback (most recent call last):

  File "C:\python34\Lib\unittest\case.py", line 57, in testPartExecutor
    yield

  File "C:\python34\Lib\unittest\case.py", line 574, in run
    testMethod()

  File "c:\Users\Harry\pybee\voc-dev\voc\tests\utils.py", line 1037, in func
    substitutions=getattr(self, 'substitutions', SAMPLE_SUBSTITUTIONS)

  File "c:\Users\Harry\pybee\voc-dev\voc\tests\utils.py", line 1075, in assertBuiltinFunction
    run_in_function=False,

  File "c:\Users\Harry\pybee\voc-dev\voc\tests\utils.py", line 399, in assertCodeExecution
    self.assertEqual(java_out, py_out, context)

  File "C:\python34\Lib\unittest\case.py", line 794, in assertEqual
    assertion_func(first, second, msg=msg)

  File "C:\python34\Lib\unittest\case.py", line 1167, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))

  File "C:\python34\Lib\unittest\case.py", line 639, in fail
    raise self.failureException(msg)

AssertionError: '>>> [157 chars]any\n>>> x = "Mÿ hôvèrçràft îß fûłl öf éêlś"\n[154 chars]==\n' != '>>> [157 chars]any\n<class \'UnicodeEncodeError\'> : \'charma[231 chars]==\n'
  >>> f = any
  >>> x = ""
  >>> f(x)
  False

  >>> f = any
  >>> x = "3"
  >>> f(x)
  True

  >>> f = any
  >>> x = "This is another string"
  >>> f(x)
  True

  >>> f = any
+ <class 'UnicodeEncodeError'> : 'charmap' codec can't encode character '\u0142' in position 28: character maps to <undefined>
- >>> x = "Mÿ hôvèrçràft îß fûłl öf éêlś"
- >>> f(x)
- True

  >>> f = any
  >>> x = "One arg: %s"
  >>> f(x)
  True

  >>> f = any
  >>> x = "Three args: %s | %s | %s"
  >>> f(x)
  True

  ===end of test===
 : Global context: Error running f(x)

pwillcode commented 7 years ago

The problem appears to be rooted in the environment: specifically, the codepage of the console (terminal) is not UTF-8. I can't be 100% sure of this, since I do not have Windows 10, but see this SO answer for a similar situation. If you have Windows 10, try running chcp 65001 before running the tests. If that makes the error go away, I would be fairly confident this is what's going on.
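The codepage theory can be checked without a Windows machine by encoding the exact line the harness tries to print with a few legacy codepages. This is a minimal sketch: the codepages tried (cp437, cp850, cp1252) are assumptions about what a Windows console or pipe might use, and try_encode is a hypothetical helper, not part of voc. Each legacy codepage should raise the same 'charmap' error as in the report, including "position 28", which is the offset of 'ł' once the 9-character '>>> x = "' prefix is counted.

```python
import sys

# The line the test harness tries to print, copied from the failing test output.
LINE = '>>> x = "Mÿ hôvèrçràft îß fûłl öf éêlś"'


def try_encode(text, codepage):
    """Return the UnicodeEncodeError raised when encoding `text`, or None if it fits."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    # The encoding this Python process is actually using for stdout.
    print("stdout encoding:", sys.stdout.encoding)
    for codepage in ("cp437", "cp850", "cp1252", "utf-8"):
        err = try_encode(LINE, codepage)
        print(codepage, "->", "OK" if err is None else err)
```

If the legacy codepages fail at position 28 while utf-8 succeeds, that is consistent with the diagnosis above, though it does not show which codepage the failing run actually used.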

I have found some activity, starting in August 2016, that addresses this problem directly in Python: https://www.python.org/dev/peps/pep-0529/ and http://bugs.python.org/issue1602. Could someone who has seen this behavior on Windows try updating Python to see whether that resolves the error?

I looked for a way to change the parent terminal's encoding directly from Python, but found no obvious solution. Executing the chcp command from within Python might resolve the problem, but that is a platform-dependent fix. I have no way of even testing it without Windows 10, and a native Python way of getting the same effect would be preferable.
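One native-Python alternative to calling chcp may be to set the PYTHONIOENCODING environment variable for the CPython process whose output is compared. That variable overrides the encoding of the child's stdin, stdout and stderr regardless of the console codepage or locale. It is relevant here because the harness calls assertEqual(java_out, py_out, context), and unittest's diff marks lines from the second argument with '+', so the UnicodeEncodeError line came from the CPython reference run. The sketch assumes that reference run is a child process started with subprocess; run_python_utf8 is a hypothetical helper, and voc's tests/utils.py may launch CPython differently.

```python
import os
import subprocess
import sys


def run_python_utf8(code):
    """Run `code` in a child CPython whose stdio is forced to UTF-8; return its output."""
    env = dict(os.environ)
    # PYTHONIOENCODING overrides the console codepage / locale encoding
    # for the child's stdin, stdout and stderr.
    env["PYTHONIOENCODING"] = "utf-8"
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        env=env,
        check=True,
    )
    # Decode with the same encoding the child was told to use.
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # \u escapes keep the command line ASCII-only; the child prints the real characters.
    print(ascii(run_python_utf8('print("f\\u00fb\\u0142l")')))
```

subprocess.run needs Python 3.5 or later; on Python 3.4, subprocess.check_output accepts the same env argument.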

Note about non-Windows environments: the same behavior can occur on other platforms, but most of them default to UTF-8. I was able to cause a similar test failure on Ubuntu by creating a custom charset (CP850 in my case), compiling a locale with it, and changing my LANG environment variable to reference the new locale/charset combination. The basic process is laid out in this SO answer, although that answer creates a custom locale, whereas I created a custom charset.
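A lighter-weight way to reproduce the failure on Linux, without building a custom charset and locale, may be to force a legacy codepage onto a child interpreter through PYTHONIOENCODING. This is a sketch under that assumption, not the procedure described above; print_with_encoding is a hypothetical helper. With cp850 the child should exit with the same kind of 'charmap' UnicodeEncodeError, while utf-8 succeeds.

```python
import os
import subprocess
import sys

# ASCII-only source for the child; it prints "fûłl", and cp850 has no 'ł'.
CHILD_CODE = 'print("f\\u00fb\\u0142l")'


def print_with_encoding(encoding):
    """Run CHILD_CODE with stdio forced to `encoding`; return (exit code, stderr text)."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )
    # The exception message shows the offending character as a \u escape,
    # so a lenient ASCII decode is enough to read it.
    return result.returncode, result.stderr.decode("ascii", "replace")


if __name__ == "__main__":
    for encoding in ("cp850", "utf-8"):
        code, err = print_with_encoding(encoding)
        print(encoding, "exit code", code)
        if err:
            print(err.strip().splitlines()[-1])
```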