conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License
8.22k stars 979 forks source link

Windows: non-latin symbols in USERPROFILE or PATH break conan #2163

Closed anatoly-spb closed 1 year ago

anatoly-spb commented 6 years ago

The following script in 866 code page does nothing, but must install cmake files to generate the project:

call git clone https://github.com/memsharded/example-poco-timer.git 
cd example-poco-timer
set CONAN_USER_HOME=
set USERNAME=Профиль
set USERPROFILE=F:\Профиль
mkdir build 
cd build
call conan install ..

Double check the output, as you see conan says and does nothing:

F:\Work\conan>call git clone https://github.com/memsharded/example-poco-timer.git
Cloning into 'example-poco-timer'...
remote: Counting objects: 40, done.
remote: Total 40 (delta 0), reused 0 (delta 0), pack-reused 40
Unpacking objects: 100% (40/40), done.

F:\Work\conan>cd example-poco-timer

F:\Work\conan\example-poco-timer>set CONAN_USER_HOME=

F:\Work\conan\example-poco-timer>set USERNAME=Профиль

F:\Work\conan\example-poco-timer>set USERPROFILE=F:\Профиль

F:\Work\conan\example-poco-timer>mkdir build

F:\Work\conan\example-poco-timer>cd build

F:\Work\conan\example-poco-timer\build>call conan install ..

F:\Work\conan\example-poco-timer\build>dir
 Том в устройстве F имеет метку WORK
 Серийный номер тома: 4E18-2D78

 Содержимое папки F:\Work\conan\example-poco-timer\build

16.12.2017  18:02    <DIR>          .
16.12.2017  18:02    <DIR>          ..
               0 файлов              0 байт
               2 папок  29 097 922 560 байт свободно

If I assign to the USERPROFILE environment variable a latin string or set CONAN_USER_HOME to latin path everything is fine.

Also I observe a strange error when PATH contains non-latine symbols:

git clone https://github.com/anatoly-spb/conan-libqrencode
cd conan-libqrencode
set PATH=F:\Профиль;%PATH%
conan create bincrafters/stable

failed with

libqrencode/4.0.0@bincrafters/stable: Package '9b1a62505a16cd151a66fc2504680042cb2cf4f2' built
libqrencode/4.0.0@bincrafters/stable: Build folder F:\Conan\build\.conan\data\libqrencode\4.0.0\bincrafters\stable\build\9b1a62505a16cd151a66fc2504680042cb2
cf4f2
libqrencode/4.0.0@bincrafters/stable: Generated conaninfo.txt
libqrencode/4.0.0@bincrafters/stable: Generated conanbuildinfo.txt
libqrencode/4.0.0@bincrafters/stable: Generating the package
libqrencode/4.0.0@bincrafters/stable: Package folder F:\Conan\build\.conan\data\libqrencode\4.0.0\bincrafters\stable\package\9b1a62505a16cd151a66fc250468004
2cb2cf4f2
libqrencode/4.0.0@bincrafters/stable: Calling package()
libqrencode/4.0.0@bincrafters/stable package(): Copied 1 '.h' files: qrencode.h
libqrencode/4.0.0@bincrafters/stable package(): Copied 1 '.lib' files: qrencode.lib
libqrencode/4.0.0@bincrafters/stable: Package '9b1a62505a16cd151a66fc2504680042cb2cf4f2' created
libqrencode/4.0.0@bincrafters/stable test package: Generator cmake created conanbuildinfo.cmake
libqrencode/4.0.0@bincrafters/stable test package: Generator txt created conanbuildinfo.txt
libqrencode/4.0.0@bincrafters/stable test package: Generated conaninfo.txt
ERROR: Unable to build it successfully
  File "conan\conans\client\tools\env.py", line 34, in environment_append
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd2 in position 1792: ordinal not in range(128)
memsharded commented 6 years ago

Yes, that is totally possible, we have tried to workaround that in the past, but seems really challenging. Specially for OS related things, like paths. Some of the limitations comes because of keeping compatibility with python 2.7, because python 3 support for Unicode and encodings in general is much better. Are you running conan with python 2 or 3? Because I would recommend you the later. If you haven't yet, please try and tell me. Thanks!

memsharded commented 6 years ago

I add the label "fix wanted", because it is also a bit difficult to test and reproduce, it requires installing non-latin OS languages, etc., so it is difficult to test also in CI. Any help to address these issues is very welcome.

anatoly-spb commented 6 years ago

@memsharded I have tried python 3.6 wihtout success:

set PATH=c:\Python36;%PATH%
call python.exe -V
call git clone https://github.com/memsharded/example-poco-timer.git 
cd example-poco-timer
set CONAN_USER_HOME=
set USERNAME=Профиль
set USERPROFILE=F:\Профиль
mkdir build 
cd build
call conan install ..
dir 

outputs

F:\Work\conan\example-poco-timer>call python.exe -V
Python 3.6.3

F:\Work\conan\example-poco-timer>set CONAN_USER_HOME=
F:\Work\conan\example-poco-timer>set USERNAME=Профиль
F:\Work\conan\example-poco-timer>set USERPROFILE=F:\Профиль
F:\Work\conan\example-poco-timer>mkdir build
F:\Work\conan\example-poco-timer>cd build
F:\Work\conan\example-poco-timer\build>call conan install ..
F:\Work\conan\example-poco-timer\build>dir
 Том в устройстве F имеет метку WORK
 Серийный номер тома: 4E18-2D78
 Содержимое папки F:\Work\conan\example-poco-timer\build
17.12.2017  00:24    <DIR>          .
17.12.2017  00:24    <DIR>          ..
               0 файлов              0 байт
               2 папок  29 097 041 920 байт свободно

the same with PATH:

libqrencode/4.0.0@bincrafters/stable: Package '9b1a62505a16cd151a66fc2504680042cb2cf4f2' built
libqrencode/4.0.0@bincrafters/stable: Build folder F:\Conan\build\.conan\data\libqrencode\4.0.0\bincrafters\stable\build\9b1a62505a16cd151a66fc2504680042cb2cf4f2
libqrencode/4.0.0@bincrafters/stable: Generated conaninfo.txt
libqrencode/4.0.0@bincrafters/stable: Generated conanbuildinfo.txt
libqrencode/4.0.0@bincrafters/stable: Generating the package
libqrencode/4.0.0@bincrafters/stable: Package folder F:\Conan\build\.conan\data\libqrencode\4.0.0\bincrafters\stable\package\9b1a62505a16cd151a66fc2504680042cb2cf4f2
libqrencode/4.0.0@bincrafters/stable: Calling package()
libqrencode/4.0.0@bincrafters/stable package(): Copied 1 '.h' files: qrencode.h
libqrencode/4.0.0@bincrafters/stable package(): Copied 1 '.lib' files: qrencode.lib
libqrencode/4.0.0@bincrafters/stable: Package '9b1a62505a16cd151a66fc2504680042cb2cf4f2' created
libqrencode/4.0.0@bincrafters/stable test package: Generator cmake created conanbuildinfo.cmake
libqrencode/4.0.0@bincrafters/stable test package: Generator txt created conanbuildinfo.txt
libqrencode/4.0.0@bincrafters/stable test package: Generated conaninfo.txt
ERROR: Unable to build it successfully
  File "conan\conans\client\tools\env.py", line 34, in environment_append
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa6 in position 4: ordinal not in range(128)

Anyway there is a workaround with setting CONAN_USER_HOME to a latin path and remove from PATH non-latin symbols, so as for me this is not big problem.

mpdelbuono commented 6 years ago

@memsharded I have a bunch of paths using Japanese characters so I can probably take a look at this and see if I can write up a fix, but I have a question based on the earlier comments. The goal is to make this work in both Python 2 and 3, right? There isn't something you're aware of that would prevent it from working in one or the other? Or are you only looking to support it in Python 3?

memsharded commented 6 years ago

Yes, you are right. I would like to go Py3 only, but the user base still using 2.7 is very large, so I think we still need to support 2.7. Which is probably the main inconvenience for robust handling of this issues, most of the times I tried to address these things there were stoppers in py2.7 (I can't recall exactly what :(, but one of the things I recall is failing under OS messages, that can't be easily replicated in testing, and are also depending on the CP of the output terminal)

mpdelbuono commented 6 years ago

So as a status update: I've started to break some ground on Python 2.7's handling of unicode paths. I'm approaching design-wise as "all paths are always unicode" and if that's true, then everything else should fall into place (Python 2.6+'s IO classes support unicode paths, as long as you're using the unicode type, and Python 3's string types are always unicode).

The problem behind this approach, however, is that Python 2 and 3 differ wildly in how they handle strings. Python 3 strings are always unicode, but Python 2 requires some special calls. The problem here is that those calls either don't exist in Python 3 or are deprecated, resulting in the need to special case. On top of this, there are some APIs that simply don't accept unicode strings in Python 2, so a different approach must be taken.

For now I'm trying to abstract this away by putting the switching logic in the tools module, making the rest of the code common, but it's a little bit unfortunate that this is the way it goes. For example, the following code is in that compatibility module:

import imp
from io import open

# Alternate implementation of imp.load_source which
# uses open() and imp.new_module() instead. This is to work around the fact that
# Python 2's imp.load_source() does not support Unicode. See http://bugs.python.org/issue9425
def load_source(name, pathname):
    try:
        return imp.load_source(name, pathname)
    except UnicodeEncodeError:
        # fallback to open()/new_module
        with open(pathname, encoding='utf8') as f:
            module = imp.new_module(name)
            module.__file__ = pathname
            exec(f.read(), module.__dict__)
            return module

Anyway... so far I've gotten conan download to work properly when the data directory is configured in conan.conf with a unicode path name. This is only the first step, but I believe if I follow the same paradigm everywhere, it should continue to work. My question is: do you think this approach is sane?

memsharded commented 6 years ago

I'm approaching design-wise as "all paths are always unicode" and if that's true

One of the limitations that I have found in the past with Py2.7 is that sometimes it is not under user control. You can manage a lot, but at some point you have to rely on some python library feature, that it will always fail, it won't accept unicode, or attempt to decode it to something that fails, etc.

I don't want to be pessimistic, but wanted to share my previous pains trying this. I was able to do some progress, but always got stuck somewhere :(

What you are proposing is just a workaround for one of those issues. It reads a bit like cluttering the codebase, so I am not sure about extending it everywhere.

In any case, many thanks, you are doing a fantastic job.

I am thinking, maybe the sanest thing would be to aim for better unicode support just in Python 3. Users wanting it should work in Python 3 (which is also a good reason to move some userbase to Py3, hopefully we are able to deprecate Py2 sooner), and while it will require some work, will be much easier than achieving it in Py2&3. What do you think?

mpdelbuono commented 6 years ago

I've been poking various places and I've come to the same conclusion that you have, that it's probably better to just say "if you want Unicode support, use Py3". The sheer number of places where I would have to add an abstraction layer is probably more effort than it is worth.

That being said, Py3 doesn't work on its own, yet, but I'm working on fixing that. At least this way all strings can support unicode natively and thus most of the OS elements handle it automatically. There's just a few "gotcha" places, for which I should have a patch soon.

(I'm debating whether or not c112a21 should stay in, as it was made to support Py2. It seems unfortunate to pull out already-done work, but it's also potentially unclean.)

memsharded commented 6 years ago

Ok, I am happy to know that it is not just I am too incompentent :) :)

Regarding that code, I am not sure either. Does it improve some case for py2? If not, and the codebase would be cleaner, then yes, it might make sense to remove it (if it won't break any users).

Thanks, great job!

mpdelbuono commented 6 years ago

@anatoly-spb Could you please try putting chcp 65001 at the top of your batch script first? It appears that the SET commands weren't running correctly, resulting in a very broken environment variable to begin with.

Once that's done, can you please try running the conan on my fork's branch feature/unicode-path-support and see if you have trouble with it? There were some problems with configuration files that needed to be addressed, but I think they have been all resolved in that fork and I'm no longer able to reproduce your problem (assuming the chcp 65001 is added at the top of the batch script).

Note: this only works with Python 3 (tested using 3.6.4)

ForNeVeR commented 6 years ago

@mpdelbuono just a side note: @anatoly-spb mentioned that the script was saved in CP866 (it's the OEM code page for Windows in Russian locale), so I think that SET should've worked correctly in that case.

mpdelbuono commented 6 years ago

Thanks @ForNeVeR , I totally missed that line and didn't realize that was the OEM code page. Unfortunately while I have a lot of languages installed, I still run with the en-US as my primary locale so I'm not entirely sure of what all the default settings are.

Switching into CP 866, it seems like everything is still working minus one small bug that I've found (regardless of which CP is used), so I think the fork above should (mostly) work. I'll have that one other item patched up soon and then submit a pull request. At a minimum, it's better than it was.

ForNeVeR commented 6 years ago

@mpdelbuono I think that setting chcp 866 and loading the batch file written in CP866 before running Conan should be enough to run a representative experiment. So, if it worked in your environment, then it should work for everybody. Congratulations.

jgsogo commented 6 years ago

The path to deprecate python2 has been defined (https://blog.conan.io/2018/08/13/Its-Time-To-Deprecate-Python-2.html) and many efforts towards handling unicode in a compatible way for py2 and py3 have been done without a clear success (thanks to all!! 🙌), so I agree with the "if you want unicode support, use Python 3", don't you?

Said that, this issue should be retargeted to an up-to-date version of Conan running in python 3, is it still an issue, @anatoly-spb?

blockspacer commented 3 years ago

Hi.

Please test non-latin symbols with extra logging to catch more errors:

export VERBOSE=1
export CONAN_REVISIONS_ENABLED=1
export CONAN_VERBOSE_TRACEBACK=1
export CONAN_PRINT_RUN_COMMANDS=1
export CONAN_LOG_RUN_TO_FILE=1
export CONAN_LOGGING_LEVEL=10

Also: how to debug 'UnicodeDecodeError: 'utf-8' codec can't decode byte error?

MarkusRannare commented 1 year ago

I have been scratching my head for a few days trying to follow: https://docs.conan.io/en/latest/developing_packages/package_dev_flow.html when conan create . user/channel fails with The system cannot find the path specified. But running:

conan source . --source-folder=tmp/source
conan install . --install-folder=tmp/build
conan build . --source-folder=tmp/source --build-folder=tmp/build
conan package . --source-folder=tmp/source --build-folder=tmp/build --package-folder=tmp/package
conan export-pkg . user/channel --source-folder=tmp/source --build-folder=tmp/build
conan test test_package hello/1.1@user/channel

Worked.

It seems like in the end it was that I had a ä (non ASCII character) in my username created by windows when setting up my computer. So it couldn't find the path C:\Users\MarkusRännare.conan\data\premake\v5.0.0-beta2\mackyman\testing\build\5a61a86bb3e07ce4262c80e1510f9c05e9b6d48b\conanbuild.bat when trying use self.run(...) in def build:

My current workaround is setting the Environment variable CONAN_USER_HOME to a pure ASCII path. I was able to reproduce the issue by setting CONAN_USER_HOME to: C:\ProgramData\Cönan so it should be possible to test it during CI

memsharded commented 1 year ago

Hi @MarkusRannare

I am trying to reproduce this, and I have added a test in https://github.com/conan-io/conan/pull/12863. I cannot make it fail, it seems to be working fine. I would need more feedback to narrow it down.

Also, quick feedback. The page that you are reading is a bit outdated, and most of the cli args regarding folders are being removed in 2.0, and instead they are replaced by layout() method definition. See: https://docs.conan.io/en/latest/developing_packages/package_layout.html for more information.

MarkusRannare commented 1 year ago

@memsharded, Thanks for the quick reply. I'll see if I can make the tests fail. Also thanks for the hints about layout(). I have been trying to wrap my head around what it does, but apparently my documentation-fu is to low level, so I'll read that page again with the new mindset =)

MarkusRannare commented 1 year ago

Hi @memsharded. I created the pull request #47 to your branch that fails the test. Sorry if this is not the way to go about it, but this is on of my first contributions to any projects, so I'm sorry for any inconveniences that it caused you.

I noticed that I missed a point that was needed to fail the test. For me it seems to be using a generator MSBuildToolchain. If you have a cleaner way of writing the test, I apologize, I tried to mash something together that actually failed the test with beeing more of a consumer than a python dev

memsharded commented 1 year ago

I have been able to investigate this. The error happens because:

So it seems that .bat files do not like that path. I'll check if it is possible to workaround it.

memsharded commented 1 year ago

@MarkusRannare it seems this is not solved yet, I have created PR https://github.com/conan-io/conan/pull/12890 that fails. The problem is that .bat files to define environment are not UTF-8, and all my attempts so far to handle special chars in the generated .bat files have failed.

At the moment I see no other alternative than defining a Conan home to another path not containing them. Lets keep investigating over https://github.com/conan-io/conan/pull/12890 to see what can be done

ForNeVeR commented 1 year ago

There are certain well-known tricks to handle this situation in most (yet not all!) cases: if you absolutely have to use UTF-8 encoded paths in Batch scripts, try using short paths. They are usually enabled on Windows (though it's possible to turn them off) and are ANSI compatible.

You could also see if you can fall back to PowerShell that parses UTF-8 encoded script files properly.

memsharded commented 1 year ago

The thing is that short_paths have been removed in 2.0, so that will not be an alternative (relocating the Conan home still more recommended). With Powershell I didn't try that much, but I haven't been able to make it work either :(

MarkusRannare commented 1 year ago

My two cents in this, is if it's not feasible to solve. A compile time error that states the issue that the path contains UTF-8 encoded characters, and that you have to relocate your Conan home folder will save lots of hair tearing for users that don't think about that their username contains UTF-8 characters (my was created by Windows by azure-AD domain, so didn't think about it)

memsharded commented 1 year ago

My two cents in this, is if it's not feasible to solve. A compile time error that states the issue that the path contains UTF-8 encoded characters, and that you have to relocate your Conan home folder will save lots of hair tearing for users that don't think about that their username contains UTF-8 characters (my was created by Windows by azure-AD domain, so didn't think about it)

I wish it was that simple, but that is also a bit risky. First, it might not fail depending on the build system integration (for example, cmake with Visual Studio generator, will not fail, because it doesn't need the environment). It also depends on the system, if the user has chcp 65001 configured in their terminal, it might even work fine, so raising an error for these users will be a blocker and very annoying. I know it is not very likely, because that is not the default, but sooner or later with the Conan scale and so many thousands of users, it will also block some users.

We will keep exploring possibilities for the next release, lets see what we can improve here, thanks very much for the suggestion!

memsharded commented 1 year ago

I have managed to solve this, now hopefully fully solved, in https://github.com/conan-io/conan/pull/12890, it will be released in next 1.58 (you can test it now running Conan from source develop branch), closing the ticket, thanks again for the feedback.