dennisvang / tufup

Automated updates for stand-alone Python applications.
MIT License

Update fails when source or destination paths contain non-ASCII characters #128

Open Bezduszny opened 8 months ago

Bezduszny commented 8 months ago

Describe the bug Paths containing non-ASCII characters are written, and then read back, incorrectly during the update process, which causes the update to fail.

To Reproduce

  1. Create a Windows user with a username containing a non-ASCII character (e.g., Łukasz)
  2. Log in as the newly created user
  3. Clone tufup
  4. Install dependencies (from requirements.txt)
  5. An equivalent error, showing a mangled path, should occur during dependency installation

The same thing happens when updating an app on that user account.


Additional context The problematic characters are written incorrectly when the batch file is created. The file is created with NamedTemporaryFile, which internally uses the built-in open. According to this answer, open uses locale.getpreferredencoding() by default. For some reason, on my machine that function returns cp1252 (just as the author of the answer says), even though [System.Text.Encoding]::Default in PowerShell gives utf-8. Unfortunately, the cp1252 encoding does not support Ł. Apparently Python 3.15 will use utf-8 as the default, but until then I guess we need to work around it.
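The mismatch described above can be reproduced without tufup. The following is a minimal sketch using only the standard library; the path is a made-up example, and `can_encode` is a hypothetical helper, not part of tufup.

```python
import locale

# Hypothetical path, for illustration only
path = "C:\\Users\\Łukasz\\AppData\\Local\\Temp\\update.bat"

# Encoding that open() falls back to when no encoding argument is given
# (outside UTF-8 mode); the value depends on the machine
print(locale.getpreferredencoding(False))


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


cp1252_ok = can_encode(path, "cp1252")  # cp1252 has no mapping for Ł
utf8_ok = can_encode(path, "utf-8")     # utf-8 covers all of Unicode
print(cp1252_ok, utf8_ok)
```

On a machine where the first print shows cp1252, any text-mode write of such a path without an explicit encoding argument would be expected to raise the same UnicodeEncodeError as in the report.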

Possible solution I added encoding="utf-8" to batch creation like so:

with NamedTemporaryFile(
    mode='w', prefix=WIN_BATCH_PREFIX, suffix=WIN_BATCH_SUFFIX, delete=False, encoding="utf-8"
) as temp_file:
    temp_file.write(script_content)

That did not solve the problem by itself: even though the batch file was now correct, cmd does not operate in UTF-8 by default. So I also added chcp 65001 (which changes the code page; 65001 is UTF-8) before running the batch file, like so:

subprocess.Popen(["chcp 65001 & "+ script_path], creationflags=process_creation_flags)

That fixed the problem with updating. Considering that the same problem occurs while installing dependencies, there are probably more places where a similar or different fix should be applied.

dennisvang commented 8 months ago

@Bezduszny Thanks for the clear and thorough report.

I'll look into this as soon as I have some spare time.

dennisvang commented 8 months ago

@Bezduszny If you have a stack trace handy, that would be helpful.

Bezduszny commented 8 months ago

Installing dependencies:

(...)
Building wheels for collected packages: tufup
  Building editable for tufup (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building editable for tufup (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [1240 lines of output]
      running editable_wheel
      --- Logging error ---
      Traceback (most recent call last):
        File "C:\Program Files\Python310\lib\logging\__init__.py", line 1103, in emit
          stream.write(msg + self.terminator)
        File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 19, in encode
          return codecs.charmap_encode(input,self.errors,encoding_table)[0]
      UnicodeEncodeError: 'charmap' codec can't encode character '\u0141' in position 18: character maps to <undefined>
      Call stack:
        File "C:\Users\\u0141ukasz\tufup\venv\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 351, in <module>
          main()
        File "C:\Users\\u0141ukasz\tufup\venv\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 333, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "C:\Users\\u0141ukasz\tufup\venv\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 271, in build_editable
          return hook(wheel_directory, config_settings, metadata_directory)
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\build_meta.py", line 443, in build_editable
          return self._build_with_temp_dir(
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\build_meta.py", line 395, in _build_with_temp_dir
          self.run_setup()
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 1, in <module>
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
          return run_commands(dist)
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
          dist.run_commands()
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\dist.py", line 967, in run_command
          super().run_command(command)
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
          cmd_obj.run()
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\command\editable_wheel.py", line 148, in run
          self._ensure_dist_info()
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\command\editable_wheel.py", line 167, in _ensure_dist_info
          dist_info.run()
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\command\dist_info.py", line 92, in run
          self.egg_info.run()
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\command\egg_info.py", line 306, in run
          self.mkpath(self.egg_info)
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\_distutils\cmd.py", line 342, in mkpath
          dir_util.mkpath(name, mode, dry_run=self.dry_run)
        File "C:\Users\\u0141ukasz\AppData\Local\Temp\pip-build-env-152a_6k8\overlay\Lib\site-packages\setuptools\_distutils\dir_util.py", line 71, in mkpath
          log.info("creating %s", head)
      Message: 'creating %s'
      Arguments: ('C:\\Users\\\u0141ukasz\\AppData\\Local\\Temp\\pip-wheel-4gtj0h3h\\.tmp-ddxgmfxj\\tufup.egg-info',)

I just noticed that no tufup files are involved here; it's just a logging error. Apparently pip on Windows has had that problem for years. Maybe there's no need to change anything other than the code I mentioned above.

Updating:

  (...)
  File "C:\Users\Work\test\venv\lib\site-packages\tufup\utils\platform_specific.py", line 57, in install_update
    return _install_update(
  File "C:\Users\Work\test\venv\lib\site-packages\tufup\utils\platform_specific.py", line 208, in _install_update_win
    temp_file.write(script_content)
  File "C:\Program Files\Python310\lib\tempfile.py", line 483, in func_wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0141' in position 144: character maps to <undefined>

dennisvang commented 8 months ago

It looks like non-ASCII paths are still a bit of a minefield, especially on Windows. For example, I'm also seeing trouble extracting the tar archive if its path contains non-ASCII characters (Windows uses bsdtar).

EDIT:

also see e.g. https://superuser.com/q/60379 (although old...)

Bezduszny commented 8 months ago

What kind of trouble? I just tested unpacking small .tar.gz archives with shutil.unpack_archive and it works for me. If I understand this correctly, it depends on where the archives were created. I made and extracted them on Windows, maybe that's why it works?
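For reference, a check like the one described above could be scripted roughly as follows. This is only a sketch: `roundtrip_non_ascii`, the directory name, and the file names are made up for illustration, and the archive is both created and extracted on the same machine, so it says nothing about archives produced elsewhere or about bsdtar.

```python
import shutil
import tempfile
from pathlib import Path


def roundtrip_non_ascii(dir_name="Łukasz"):
    """Create and extract a .tar.gz under a directory with a non-ASCII name."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / dir_name
        src = base / "src"
        src.mkdir(parents=True)
        (src / "hello.txt").write_text("hello", encoding="utf-8")
        # make_archive appends the extension and returns the archive path
        archive = shutil.make_archive(str(base / "bundle"), "gztar", root_dir=src)
        dst = base / "dst"
        shutil.unpack_archive(archive, extract_dir=dst)
        return (dst / "hello.txt").read_text(encoding="utf-8")


result = roundtrip_non_ascii()
print(result)
```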

PS Would there be any downside to using zip instead? I believe it would work out of the box. I have actually been using .zip with tufup for a while now instead of .tar.gz, because patches were much smaller. I saw your recent changes and it looked like a bit of work compared to my quick hack with the format... I should have mentioned something before, sorry 😅 On the other hand, adapting to a format change would probably be a bit complex for users of tufup.

dennisvang commented 8 months ago

@Bezduszny The trouble was with using tar (bsdtar) in PowerShell, so not with a Python module. I haven't looked into it in detail yet.

The choice of .tar.gz over .zip was due to the separation of archive and compression method, which was important for us at the time, as well as some issues with zip availability on some barebones Linux installs. I think zip also has issues similar to gzip's when taking binary diffs of compressed data, although the fact that files are compressed individually in a zip probably mitigates the issue.

I'll get back to this later.

Bezduszny commented 8 months ago

I did a bit more testing, and adding chcp 65001 to subprocess.Popen does not work. I guess it's because it's not an actual file/program. I had only tested it in cmd before and assumed it would be fine. It can be added to the beginning of the batch file though, so that's what I did. Now it works as intended.
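A minimal sketch of that final approach, combining the encoding="utf-8" change from the first comment with chcp 65001 at the top of the batch file. The `write_batch` helper and the values given to WIN_BATCH_PREFIX and WIN_BATCH_SUFFIX are placeholders for illustration, not tufup's actual code.

```python
from tempfile import NamedTemporaryFile

# Placeholder values; the real constants are defined in tufup
WIN_BATCH_PREFIX = "tufup"
WIN_BATCH_SUFFIX = ".bat"


def write_batch(script_content):
    """Write a UTF-8 batch file that switches cmd to code page 65001 first."""
    # chcp runs inside the batch itself, before any path is used
    content = "chcp 65001\n" + script_content
    with NamedTemporaryFile(
        mode="w",
        prefix=WIN_BATCH_PREFIX,
        suffix=WIN_BATCH_SUFFIX,
        delete=False,
        encoding="utf-8",  # explicit, so the locale default (e.g. cp1252) is not used
    ) as temp_file:
        temp_file.write(content)
    return temp_file.name


# Hypothetical script content containing a non-ASCII path
script_path = write_batch('echo "C:\\Users\\Łukasz"\n')
print(script_path)
```

The returned path can then be passed to subprocess.Popen on its own, without prepending anything to the command.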