astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0
19.29k stars, 570 forks

Filename truncated during https tar stream #5450

Open WH-2099 opened 1 month ago

WH-2099 commented 1 month ago

When a specific https index-url is used, a specific filename is truncated.

Here's a summary of a very stable reproduction I've made.

uv 0.2.29 (39be71f40 2024-07-24)

uv cache clean
uv pip -v install --reinstall aliyun-python-sdk-core==2.15.1 -i https://pypi.tuna.tsinghua.edu.cn/simple/ 2>&1 | tee log

# `uv cache clean`, `tee log`, `-v`, and `--reinstall` only make the
# problem easier to reproduce; they are not directly related to it.
# Locate the problem file in the cache:
cache_dir=$(dirname $(grep -oP 'build_wheel\("\K[^"]+' log))
echo "$cache_dir"
ls -al "$cache_dir/aliyun-python-sdk-core-2.15.1.tar.gz/aliyunsdkcore/vendored/requests/packages/urllib3/contrib/_appengine_en"

This filename here should have been _appengine_environ.py, but is now truncated to _appengine_en.

The archive contents can be inspected by downloading the source distribution manually: https://pypi.tuna.tsinghua.edu.cn/packages/3a/e6/f579e8a5e26ef1066f6fb11074cedc9f668cb5f722c85cf7adc0f7e2e23e/aliyun-python-sdk-core-2.15.1.tar.gz

With the http version of this mirror (http://pypi.tuna.tsinghua.edu.cn/simple/), the problem does not occur.

With any other index-url, such as https://pypi.org/simple/, the problem does not occur either.

WH-2099 commented 1 month ago

I think this bug is concerning because it silently changes filenames, which ultimately leaves files missing from installed packages.

Considering that long index URLs are not uncommon in private environments, this could affect production environments and be very difficult to track down. (I ran into exactly this myself 🤣)

I'll keep following up on this, so feel free to contact me if there's anything I can do to help!

charliermarsh commented 1 month ago

Is your filesystem silently truncating filenames that exceed a certain length?

WH-2099 commented 1 month ago

> Is your filesystem silently truncating filenames that exceed a certain length?

Pretty sure it's not.

charliermarsh commented 1 month ago

I'll take a look.

charliermarsh commented 1 month ago

This appears to be a problem during the unpacking of the tar file. We see that truncated filename as soon as we ask the tar crate for the entries (aliyun-python-sdk-core-2.15.1/aliyunsdkcore/vendored/requests/packages/urllib3/contrib/_appengine_en).
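The truncated path reported above can be measured directly. The Python sketch below only counts bytes. The 100-byte figure is the size of the `name` field in a classic tar header, and the helper name `fits_in_name_field` is made up for illustration. That the truncated path lands exactly on that size is an observation, not a confirmed root cause.

```python
# Size of the `name` field in a classic (v7/ustar) tar header, in bytes.
TAR_NAME_FIELD_BYTES = 100

# Path as reported by the tar crate in this thread (truncated).
truncated = (
    "aliyun-python-sdk-core-2.15.1/aliyunsdkcore/vendored/requests/"
    "packages/urllib3/contrib/_appengine_en"
)
# Path the entry should have had, per the original report.
expected = (
    "aliyun-python-sdk-core-2.15.1/aliyunsdkcore/vendored/requests/"
    "packages/urllib3/contrib/_appengine_environ.py"
)


def fits_in_name_field(path: str) -> bool:
    """Hypothetical helper: True if the path fits the fixed-size name field."""
    return len(path.encode("utf-8")) <= TAR_NAME_FIELD_BYTES


if __name__ == "__main__":
    # Print the byte length of each path and whether it fits the field.
    for label, path in (("truncated", truncated), ("expected", expected)):
        print(label, len(path.encode("utf-8")), fits_in_name_field(path))
```

Names longer than the field are normally carried in a separate long-name record (GNU or pax). If a reader lost that record, it would fall back to the fixed-size field, which would be consistent with the name seen here. That remains a guess.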

I think there must be something going wrong in the index itself, honestly. If I download the file to disk, then install it, it works correctly. Similarly, if I install from PyPI, I get the right result (and I confirmed that the archives are identical between the two indexes). But if I stream and decompress from this mirror, I get the wrong result. It's really hard for me to find any root cause for that. My guess is that there's an incorrect header somewhere that's causing the streamed decompression to fail?
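The contrast between reading a downloaded file and unpacking from a stream can be tried locally, without network access. This is a minimal Python sketch using the standard `tarfile` module, not the Rust tar crate that uv uses. It assumes a GNU-format tar.gz built in memory, since the real sdist's tar format is not stated in this thread. The function names are hypothetical. It shows that a well-formed archive with a long member name yields the same name in both reading modes, so a difference between the two modes points at the byte stream rather than at the archive.

```python
import io
import tarfile


def build_archive(member_name: str, payload: bytes) -> bytes:
    """Build a gzip-compressed tar in memory with a single member.

    GNU_FORMAT is an assumption; it stores names over 100 bytes in a
    separate long-name record placed before the member's header.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def names_from_file(data: bytes) -> list[str]:
    """Read member names with random access, like a file on disk."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return tar.getnames()


def names_from_stream(data: bytes) -> list[str]:
    """Read member names in forward-only stream mode ("r|gz")."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|gz") as tar:
        return [member.name for member in tar]


if __name__ == "__main__":
    long_name = (
        "aliyun-python-sdk-core-2.15.1/aliyunsdkcore/vendored/requests/"
        "packages/urllib3/contrib/_appengine_environ.py"
    )
    data = build_archive(long_name, b"# placeholder contents\n")
    # Both modes should report the full, untruncated member name.
    print(names_from_file(data) == names_from_stream(data) == [long_name])
```

To test a real download, the same two functions could be given the bytes of the sdist fetched from each index. Equal name lists from both modes would suggest the archive itself is fine.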

WH-2099 commented 1 month ago

Thanks for following up, I will modify the problem description accordingly.

WH-2099 commented 1 month ago

I also found that switching to the http version of this mirror does not cause the problem.

This means the TLS traffic has to be decrypted to investigate further. I tried the usual SSLKEYLOGFILE environment variable, to no avail.

I'm still new to Rust. Can you point me in the right direction on how to obtain the TLS keys here, so that I can decrypt the session traffic and locate the problem?

failable commented 1 week ago

@WH-2099 Have you figured out a way to fix this issue? I am facing the same issue when using Aliyun OSS.