conda-incubator / conda-store

Data science environments, for collaboration. ✨
https://conda.store
BSD 3-Clause "New" or "Revised" License

Off-label use of `conda_package_streaming` #767

Open gzt5142 opened 4 months ago

gzt5142 commented 4 months ago

Context


The error from a conda-store build attempt includes a stack trace ending:

File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 416 Client Error: Range Not Satisfiable for url: https://redacted@nexus.internal.host/repository/conda-forge/linux-64/python_abi-3.11-4_cp311.conda

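While chasing that down, I found that conda-store (in download_packages.py) uses conda_package_streaming simply to download the package archive. A 416 status means the server could not satisfy the byte range the client requested, so I suspect (but can't fully test) that streaming through our Nexus proxy is causing my problem.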
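To check that theory, one could probe whether a server or proxy honors HTTP range requests at all. The sketch below is a minimal, hypothetical helper (`probe_range_support` and `describe_range_status` are illustrative names, not part of conda-store or conda_package_streaming). It asks for only the first byte with a `Range: bytes=0-0` header and reports whether the server answered 206 Partial Content, ignored the range (200), or rejected it (416). It assumes the URL needs no embedded credentials; an authenticated Nexus URL would need extra handling.

```python
import sys
import urllib.error
import urllib.request


def describe_range_status(status: int) -> str:
    """Map the status of a ranged GET to a short verdict (hypothetical helper)."""
    if status == 206:
        return "range honored"   # 206 Partial Content: only the requested bytes came back
    if status == 200:
        return "range ignored"   # server sent the full body instead of the range
    if status == 416:
        return "range rejected"  # 416 Range Not Satisfiable
    return f"unexpected status {status}"


def probe_range_support(url: str, timeout: float = 10.0) -> str:
    """Request only the first byte of `url` and report how the server handled it."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return describe_range_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including 416.
        return describe_range_status(err.code)


if __name__ == "__main__":
    # Usage: python probe_range.py <package-url>
    print(probe_range_support(sys.argv[1]))
```

Running it once against the proxy URL and once against the upstream channel URL would show whether the proxy is the component dropping or rejecting range requests.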

My problem aside, using the streaming module this way is discouraged by the conda_package_streaming documentation:

"If a full .conda format package is needed, it is more efficient to download locally first and then use the file-based API."

As I read the code in download_packages.py, it only ever fetches the full archive, never a partial one, so the streaming module is not the recommended tool here. The documentation advises downloading the file first and then extracting it.

One way to do that, replacing lines 35-41 of download_packages.py:

import urllib.request
import conda_package_handling.api
_tmpfilename, _ = urllib.request.urlretrieve(url, filename=str(file_path))
conda_package_handling.api.extract(_tmpfilename)
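A slightly more defensive sketch of the same download-first idea follows. The helper names `download_package` and `download_and_extract` are hypothetical, not conda-store's actual functions. It swaps `urlretrieve`, which the Python docs describe as a legacy interface, for `urlopen` plus `shutil.copyfileobj`. It writes into a temporary `.part` file and only renames it into place once the whole body has arrived, so a failed download never leaves a truncated archive behind. The extraction step calls `conda_package_handling.api.extract` with its `dest_dir` argument and assumes conda-package-handling is installed.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_package(url: str, file_path: Path, timeout: float = 60.0) -> Path:
    """Fetch the complete archive at `url` into `file_path` (hypothetical helper)."""
    file_path = Path(file_path)
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # Temporary file lives in the target directory so os.replace is a plain rename.
    fd, tmp_name = tempfile.mkstemp(dir=file_path.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url, timeout=timeout) as response:
            # Plain GET with no Range header: the whole body is copied in chunks.
            shutil.copyfileobj(response, out)
        # Move the file into place only after the download has finished.
        os.replace(tmp_name, file_path)
    finally:
        # If anything above failed, remove the partial file.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return file_path


def download_and_extract(url: str, file_path: Path, dest_dir: Path) -> Path:
    """Download first, then extract with the file-based API."""
    from conda_package_handling import api  # third-party, imported only when needed

    archive = download_package(url, file_path)
    api.extract(str(archive), dest_dir=str(dest_dir))
    return Path(dest_dir)
```

Because no `Range` header is sent, the proxy only has to serve one ordinary GET for the full file. Whether that actually avoids the 416 behind Nexus is untested.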

Value and/or benefit

Using the library as its API documentation describes should, purportedly, be more efficient.

Downloading instead of streaming will likely also help in situations when streaming may not be possible, such as through this proxy. (Again... this is my theory; I don't have a good way to test this right now).

Anything else?

I don't know if I'd call this a bug, or even a problem, but it does seem to be at odds with the recommendations in the conda_package_streaming documentation.

nkaretnikov commented 4 months ago

Thanks for the report! I will look into it. Related: #739, #734.

nkaretnikov commented 4 months ago

@gzt5142

"Downloading instead of streaming will likely also help in situations when streaming may not be possible, such as through this proxy. (Again... this is my theory; I don't have a good way to test this right now)."

Could you try the change you suggested above in a local copy of the source? If it works with your proxy, we can evaluate whether it should be changed in the upstream conda-store code.

gzt5142 commented 3 months ago

Because we couldn't get the proxy to work with conda-store, we went in another direction, so I'm not in a position to run this test.