Packages downloaded from anaconda.org have unsupported filenames?

bdice commented 9 months ago

Checklist

[X] I added a descriptive title
[X] I searched open reports and couldn't find a duplicate

What happened?

I have a question about behavior that may be a bug in conda-package-handling, or may be intentional.

I often download packages from anaconda.org to check their contents. I visited https://anaconda.org/conda-forge/nanoarrow/files and clicked a link to download the .conda package. This saved a file named linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda. However, running cph x linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda fails. It gives an error like:

LookupError: didn't find info-linux-64_nanoarrow-0.4.0-py310h2372a71_0 component in /mnt/c/Users/bdice/Downloads/linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda

Full traceback:

$ cph x linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda
Traceback (most recent call last):
  File "/home/bdice/miniforge3/bin/cph", line 10, in <module>
    sys.exit(main())
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_handling/cli.py", line 121, in main
    api.extract(args.archive_path, args.dest, prefix=args.prefix)
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_handling/api.py", line 77, in extract
    format.extract(fn, dest_dir, components=components)
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_handling/conda_fmt.py", line 46, in extract
    _extract(str(fn), str(dest_dir), components=components)
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_handling/streaming.py", line 35, in _extract
    stream = package_streaming.stream_conda_component(
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_streaming/package_streaming.py", line 133, in stream_conda_component
    raise LookupError(f"didn't find {component_name} component in {filename}")
LookupError: didn't find info-linux-64_nanoarrow-0.4.0-py310h2372a71_0 component in /mnt/c/Users/bdice/Downloads/linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda

The problem is that the filename must be changed to nanoarrow-0.4.0-py310h2372a71_0.conda to match the component names, which are named like info-nanoarrow-0.4.0-py310h2372a71_0.tar.zst.

The line linked below is trying to find a component named the same way as the file.

https://github.com/conda/conda-package-handling/blob/b29610fb61647a980daf63baeed4756baced54f4/src/conda_package_handling/conda_fmt.py#L46
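To see the mismatch without cph, the archive can be inspected directly. A .conda file is a zip archive, so the standard library can list its members. The sketch below is only an illustration: `expected_conda_filename` is a hypothetical helper, not part of conda-package-handling, and it assumes the archive holds exactly one member named `info-<name>.tar.zst`.

```python
import zipfile

INFO_PREFIX = "info-"
COMPONENT_SUFFIX = ".tar.zst"


def expected_conda_filename(archive_path):
    """Return the filename implied by the archive's info component.

    Hypothetical helper for illustration only; it assumes exactly one
    member named info-<name>.tar.zst inside the .conda (zip) archive.
    """
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.namelist()

    # Keep the <name> part of every info-<name>.tar.zst member.
    stems = [
        m[len(INFO_PREFIX):-len(COMPONENT_SUFFIX)]
        for m in members
        if m.startswith(INFO_PREFIX) and m.endswith(COMPONENT_SUFFIX)
    ]
    if len(stems) != 1:
        raise LookupError(
            f"expected one info component in {archive_path}, found {len(stems)}"
        )
    return stems[0] + ".conda"
```

If the returned name differs from the name of the file on disk, renaming the file to the returned name should let `cph x` find its components.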

Is it reasonable to require that the filename matches the names of the components? I am a bit surprised that renaming the file would make it impossible to extract.

Conda Info

conda version : 23.11.0

Conda Config

channels:
  - conda-forge

Conda list

conda-package-handling    2.2.0              pyh38be061_0    conda-forge
conda-package-streaming   0.9.0              pyhd8ed1ab_0    conda-forge

msarahan commented 9 months ago

I think the relevant code is here: https://github.com/conda/conda-package-streaming/blob/main/conda_package_streaming/package_streaming.py#L127

It certainly could be a more flexible scheme, but just matching prefix (info-) might have some unanticipated edge cases.
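As a rough illustration of what prefix-only matching could look like, and of one edge case it would need to handle, here is a minimal sketch. `find_component` is a hypothetical function, not the code in conda-package-streaming; it takes the list of member names from the archive and refuses to guess when more than one member matches.

```python
def find_component(member_names, component="info"):
    """Pick the archive member for a component by prefix alone.

    Hypothetical sketch: ignores the archive's own filename and
    matches on "<component>-" plus the ".tar.zst" suffix.
    """
    prefix = f"{component}-"
    matches = [
        name
        for name in member_names
        if name.startswith(prefix) and name.endswith(".tar.zst")
    ]
    if not matches:
        raise LookupError(f"didn't find a {component} component")
    if len(matches) > 1:
        # Edge case: prefix matching cannot tell which member is intended.
        raise LookupError(f"ambiguous {component} component: {matches}")
    return matches[0]
```

An archive with two `info-` members is one example of the ambiguity that filename-based matching avoids.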

bdice commented 9 months ago

> just matching prefix (info-) might have some unanticipated edge cases.

Right, that's why I wasn't sure if this was intended behavior. However, it seems like it's quite a stringent requirement for the file to have a particular name in order to be extracted properly. It's certainly not obvious that the file name should have any effect on extracting it (no other compressed format or package format has such a requirement that I am aware of).

jakirkham commented 9 months ago

It has to do with how the files are downloaded. The headers have issues.

Clicking the download link and letting the browser handle the download doesn't work. Copying the link from Anaconda and using another download tool (like curl or wget) does work.

There's more context in issue: https://github.com/conda/infrastructure/issues/868
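For a file the browser has already saved under the prefixed name, renaming it is enough, as noted at the top of the thread. A minimal sketch, assuming the unwanted prefix is always `<subdir>_` as in `linux-64_nanoarrow-...`; `strip_subdir_prefix` and the `KNOWN_SUBDIRS` list are illustrative assumptions, not part of any conda tool.

```python
# Common conda platform subdirs; this list is an assumption and not exhaustive.
KNOWN_SUBDIRS = (
    "linux-64",
    "linux-aarch64",
    "linux-ppc64le",
    "osx-64",
    "osx-arm64",
    "win-64",
    "noarch",
)


def strip_subdir_prefix(filename):
    """Drop a leading "<subdir>_" from a downloaded package filename.

    Hypothetical helper: returns the filename unchanged when no known
    subdir prefix is present.
    """
    for subdir in KNOWN_SUBDIRS:
        prefix = subdir + "_"
        if filename.startswith(prefix):
            return filename[len(prefix):]
    return filename
```

The result can be passed to `os.rename` (or `mv`) before running `cph x` on the file.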

mfansler commented 8 months ago

> However, it seems like it's quite a stringent requirement for the file to have a particular name in order to be extracted properly. It's certainly not obvious that the file name should have any effect on extracting it (no other compressed format or package format has such a requirement that I am aware of).

I'd second the sentiment here. Coupling the ability to uncompress with the file name is unnecessarily fragile. Aside from getting the website to serve downloads without changing names, I'd really like to see the sensitivity to filename engineered out of the format.

dholth commented 5 months ago

We will fix anaconda.org and will close this ticket when it is deployed.

dholth commented 5 months ago

This should be fixed on anaconda.org. Let us know if it is working for you.

bdice commented 5 months ago

This appears to work now! Thank you @dholth.
