PyCQA / isort

A Python utility / library to sort imports.
https://pycqa.github.io/isort/
MIT License

Chinese code causes errors #2199

Open kayhayen opened 11 months ago

kayhayen commented 11 months ago

Hello,

I am using isort as part of Nuitka's autoformat. A user contributed this test code for Nuitka, which presented interesting challenges there and, I think, does for isort too:

from 测试 import 这里
# translate: from test import here
这里()
# here.test

from 那里 import 另一个测试

另一个测试()

from 测试.这边 import *
# not pass here
# trying to import everything from a unicode module

为什么呢()

Error output is this:

main.py:104: UserWarning: Unable to parse file C:\Users\kayha\AppData\Local\Temp\tmp3z9b_b4s due to 'charmap' codec can't encode characters in position 25-26: character maps to <undefined>
  warn(f"Unable to parse file {file_name} due to {error}")

The file is then empty.

I am calling it like this:

    check_call(
        isort_call
        + [
            "-q",  # quiet, but stdout is still garbage
            "--overwrite-in-place",  # avoid using another temp file, this is already on one.
            "--order-by-type",  # Order imports by type in addition to alphabetically
            "--multi-line=VERTICAL_HANGING_INDENT",
            "--trailing-comma",
            "--project=nuitka",  # make sure nuitka is first party package in import sorting.
            "--float-to-top",  # move imports to start
            "--thirdparty=SCons",
            filename,
        ],
        stdout=getNullOutput(),
    )

I have some complaints that are probably valid outside of the Chinese handling.

a) Leaving the file empty after a processing error is very unfriendly. I am aware that --overwrite-in-place can of course cause data loss, but if an error occurred during parsing, the file should not have been opened for writing at all.

b) Why is the exit code 0? That seems to be why check_call from subprocess does not raise an exception. Nuitka works on a temp file to avoid corruption in these cases, so a) is not quite as important, but I only noticed the emptied file after I was surprised that it suddenly worked with Python2, where the emptied file effectively didn't cover any errors.

c) Of course, these Chinese imports are legal in Python3.6 and higher, and it seems some people actually use them. It would be nice to have support for that. Python3.5 says the same as isort. I am not sure how the change came about, but non-ASCII module names came from some PEP, and the source encoding defaulting to UTF-8 is, I think, already the case in Python3, so it is not really clear to me why source code parsing faces an issue at all with any version of Python3.
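Points a) and b) can also be guarded against on the caller's side. The following is only a minimal sketch under stated assumptions, not isort's or Nuitka's actual code: the helper name run_formatter_safely is hypothetical, the formatter is assumed to take the file path as its last argument, and failure is detected by looking for the "Unable to parse file" text from the warning quoted above, which may change between versions.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_formatter_safely(command, filename):
    """Run `command + [copy]` on a copy of `filename` (hypothetical helper).

    The original file is replaced only if the formatter exits with 0,
    prints no parse warning, and does not turn a non-empty file into an
    empty one.
    """
    original = Path(filename)

    with tempfile.TemporaryDirectory() as tmp_dir:
        work_copy = Path(tmp_dir) / original.name
        shutil.copyfile(original, work_copy)

        result = subprocess.run(
            list(command) + [str(work_copy)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
        )
        stderr_text = result.stderr.decode("utf-8", errors="replace")

        if result.returncode != 0:
            raise RuntimeError("formatter failed: %s" % stderr_text)

        # Assumption: the warning text from the report above marks a failure
        # even though the exit code is 0.
        if "Unable to parse file" in stderr_text:
            raise RuntimeError("formatter could not parse: %s" % stderr_text)

        # An emptied file is treated as an error, never as a valid result.
        if original.stat().st_size > 0 and work_copy.stat().st_size == 0:
            raise RuntimeError("formatter emptied the file")

        # Only now touch the original.
        shutil.copyfile(work_copy, original)
```

With the call shown above, this would be used roughly as run_formatter_safely(isort_call + ["-q", "--overwrite-in-place"], filename), so that a failure raises instead of silently passing an empty file on.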

I am pretty sure I am running isort with 3.10 here. It is also the latest version, 5.12.0, to which I just updated to test whether this is already fixed. I am on Windows, which may play a role here; I didn't try Linux. Something seems to need to follow a Python 3.6 change here.
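The 'charmap' wording in the error suggests a Windows locale encoding is involved rather than UTF-8. The sketch below shows that encoding such an import line with cp1252 (picked here only as an example of a Windows code page) fails with a 'charmap' codec error, while UTF-8 round-trips it. The helper utf8_mode_env is hypothetical; it sets PYTHONUTF8=1, which enables Python's UTF-8 mode on Python 3.7 and later. Whether that actually avoids the isort problem is an assumption that has not been verified here.

```python
import os

source_line = "from 测试 import 这里\n"

# A charmap-based Windows code page cannot represent these characters.
try:
    source_line.encode("cp1252")
except UnicodeEncodeError as error:
    print("cp1252 failed:", error)

# UTF-8 handles the same line without loss.
assert source_line.encode("utf-8").decode("utf-8") == source_line


def utf8_mode_env(base_env=None):
    """Return a copy of the environment with Python's UTF-8 mode enabled.

    Hypothetical helper; PYTHONUTF8 is honored by Python 3.7 and later.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONUTF8"] = "1"
    return env
```

The result could then be passed as env=utf8_mode_env() to the check_call shown earlier, so that the isort subprocess defaults to UTF-8 for file I/O.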

Maybe you feel that these module names shouldn't exist. But for Nuitka, my Python compiler, I need to have test coverage of them. For now, I am going to have to work around this somehow by looking at the path.

If you are aware of any workaround, I would be very happy.

And while at it, thanks for this wonderful tool. :-)

shenjackyuanjie commented 11 months ago

Some more information: you can check some more examples over here, and some import examples here: https://github.com/duolabmeng6/pyefun/issues/51