amol- / dukpy

Simple JavaScript interpreter for Python
MIT License

UnicodeDecodeError: 'utf-8' codec can't decode byte #75

Closed paulocoutinhox closed 4 months ago

paulocoutinhox commented 10 months ago

Hi,

I'm getting this error on render.com:

Traceback (most recent call last):
Nov 17 03:33:06 AM    File "kaktos.py", line 6, in <module>
Nov 17 03:33:06 AM      system.process_command()
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/system.py", line 122, in process_command
Nov 17 03:33:06 AM      run(command_params)
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/commands/build.py", line 12, in run
Nov 17 03:33:06 AM      system.build_pages()
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/system.py", line 85, in build_pages
Nov 17 03:33:06 AM      assets.build_js()
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/assets.py", line 44, in build_js
Nov 17 03:33:06 AM      b.write(minify_js(og.read()))
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/assets.py", line 21, in minify_js
Nov 17 03:33:06 AM      result = str(es5(babel_compile(str(code))["code"]))
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/babel.py", line 13, in babel_compile
Nov 17 03:33:06 AM      return evaljs(
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/evaljs.py", line 138, in evaljs
Nov 17 03:33:06 AM      return JSInterpreter().evaljs(code, **kwargs)
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/evaljs.py", line 31, in __init__
Nov 17 03:33:06 AM      self._init_process()
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/evaljs.py", line 87, in _init_process
Nov 17 03:33:06 AM      self.evaljs("process = {}; process.env = dukpy.environ", environ=dict(os.environ))
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/evaljs.py", line 61, in evaljs
Nov 17 03:33:06 AM      return json.loads(res.decode('utf-8'))
Nov 17 03:33:06 AM  UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 6510: invalid continuation byte

Do you know what might be wrong?

amol- commented 7 months ago

Since this doesn't seem to have occurred again, I'll close this one unless someone has a JS snippet that can reproduce the issue.

robinvandernoord commented 4 months ago

I get the same error when running the example:

python3.11 -m venv venv
. venv/bin/activate
pip install dukpy
python
>>> import dukpy
>>> dukpy.typescript_compile("console.log('hi')")  # or any other code
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/tsc.py", line 11, in typescript_compile
    return evaljs(
           ^^^^^^^
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/evaljs.py", line 138, in evaljs
    return JSInterpreter().evaljs(code, **kwargs)
           ^^^^^^^^^^^^^^^
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/evaljs.py", line 31, in __init__
    self._init_process()
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/evaljs.py", line 87, in _init_process
    self.evaljs("process = {}; process.env = dukpy.environ", environ=dict(os.environ))
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/evaljs.py", line 61, in evaljs
    return json.loads(res.decode('utf-8'))
                      ^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 3095: invalid continuation byte

Using the CLI gives the same error.

amol- commented 4 months ago

I'm unable to reproduce the issue locally. Is there anything specific to the system that might be influencing the encoding? Maybe the system locale isn't UTF-8 or something like that? (Even though that shouldn't matter.)

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dukpy
>>> dukpy.typescript_compile("console.log('hi')") # or any other code
"System.register([], function(exports_1) {\n    return {\n        setters:[],\n        execute: function() {\n            console.log('hi');\n        }\n    }\n});\n"
>>>

robinvandernoord commented 4 months ago

I've tried it on another machine and it works there. Both machines are running Linux Mint (based on Ubuntu 22.04) with $LC_NAME = nl_NL.UTF-8.

I downloaded the repo and think I found out what is going wrong: self.evaljs("process = {}; process.env = dukpy.environ", environ=dict(os.environ))

os.environ contains PS1. My Bash prompt (PS1) is pretty customized and contains an emoji on my desktop (to indicate which machine I'm on when running multiple shells over ssh), which should be valid UTF-8 but seems to be the cause of this issue anyway.
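To check whether a given machine has an environment variable like this, something along these lines could help. It is only a diagnostic sketch: it assumes the trigger is any character above U+FFFF (which an emoji is, and which 華 is not), and the helper name `non_bmp_vars` and the sample values are made up for illustration.

```python
import os


def non_bmp_vars(environ):
    """Return the names of variables whose value contains a character above U+FFFF."""
    return sorted(
        name
        for name, value in environ.items()
        if any(ord(ch) > 0xFFFF for ch in value)
    )


# Hypothetical sample environment, standing in for the real one.
sample = {
    "HOME": "/home/user",
    "LANG": "nl_NL.UTF-8",
    "PS1": "\U0001F3E0 \\u@\\h$ ",  # prompt containing the house emoji
}
print(non_bmp_vars(sample))  # ['PS1']

# On an affected machine, inspect the real environment instead:
# print(non_bmp_vars(os.environ))
```

If the real environment reports a variable here, unsetting it before starting Python would be one way to test whether it is the culprit.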

The emoji is represented in the res variable as \xed\xa0\xbc\xed\xbf\xa0, but when I convert it to utf-8 bytes myself, it is \xf0\x9f\x8f\xa0.
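Those six bytes look like the two UTF-16 surrogates of the emoji (U+D83C, U+DFE0), each encoded separately as three bytes, which strict UTF-8 rejects. The sketch below reproduces the error in plain Python, without dukpy, and shows one way to recover the character. The helper name `repair_surrogate_bytes` is made up, and this is not necessarily how the library should fix it.

```python
def repair_surrogate_bytes(raw: bytes) -> str:
    """Decode bytes in which non-BMP characters were written as encoded surrogate pairs."""
    # 'surrogatepass' lets the UTF-8 decoder accept lone surrogates.
    with_surrogates = raw.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each surrogate pair into one code point.
    return with_surrogates.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


raw = b"\xed\xa0\xbc\xed\xbf\xa0"  # the bytes seen in res

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
    print(exc)

fixed = repair_surrogate_bytes(raw)
print(fixed == "\U0001F3E0")  # True
print(fixed.encode("utf-8"))  # b'\xf0\x9f\x8f\xa0'
```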

I see there's a test function:

    def test_unicode(self):
        s = dukpy.evaljs("dukpy.c + 'A'", c="華")
        assert s == '華A'

If you change the Unicode character 華 to something like 🏠, I predict you'll get the same exception. I tried to look at the code and saw some Unicode/encoding logic in duktape.c, but my C knowledge doesn't go nearly far enough to know what's happening in that file.

amol- commented 4 months ago

Thanks, this is helpful. I'll try to debug it as soon as I can.

robinvandernoord commented 4 months ago

I have (partly) solved the issue, I think: #78. However, if you use an emoji in the code itself, the error still occurs, and I haven't been able to fix that yet. I think this happens somewhere in eval_string.

If you'd rather close my PR and debug it yourself, I understand, of course!

amol- commented 4 months ago

@robinvandernoord @paulocoutinhox would you mind testing with https://github.com/amol-/dukpy/pull/79/files ? That might address the encoding issues.

paulocoutinhox commented 4 months ago

Can you create a new version/release?