Implement sharded repodata CEP

conda / conda

A system-level, binary package and environment manager running on all major operating systems and platforms.

https://docs.conda.io/projects/conda/


Implement sharded repodata CEP #14060

Open dholth opened 2 months ago

dholth commented 2 months ago

Checklist

[X] I added a descriptive title
[X] I searched open requests and couldn't find a duplicate

What is the idea?

https://github.com/conda/ceps/pull/75 describes a sharded format in which each individual package has its own repodata. Implement a client for this format in conda.

Why is this needed?

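To save bandwidth and memory: the client would only need to download and parse repodata for the packages it actually looks at, rather than for every package at once.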

What should happen?

No response

Additional Context

https://github.com/conda/conda/pull/13880

https://github.com/conda/conda-index/pull/161

zklaus commented 2 months ago

The lazy index gets its information from SubdirData, so if that handles the sharded data transparently, that would be ideal.

dholth commented 2 months ago

We already convert repodata into a dict mapping package names to records, similar to sharded metadata, before parsing the records into complete PackageRecord objects. We could behave as if all metadata is sharded. In Python we might be tempted to trap dict access to do the network access, but that is harder to translate into C for libmamba.
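
As a rough illustration of the "trap dict access" idea, here is a minimal sketch, not conda's implementation. `LazyShards`, `fetch_shard`, and the fake channel data are all hypothetical names made up for this example; a real client would do a network request where `fake_fetch` does a dict lookup.

```python
from collections.abc import Mapping
from typing import Callable, Dict, Iterable, Iterator, List


class LazyShards(Mapping):
    """Hypothetical read-only mapping: package name -> list of raw records.

    A package's shard is fetched the first time its name is looked up,
    then cached for later lookups.
    """

    def __init__(
        self,
        names: Iterable[str],
        fetch_shard: Callable[[str], List[dict]],
    ) -> None:
        self._names = set(names)         # names assumed known up front
        self._fetch_shard = fetch_shard  # stand-in for the network access
        self._cache: Dict[str, List[dict]] = {}

    def __getitem__(self, name: str) -> List[dict]:
        if name not in self._names:
            raise KeyError(name)
        if name not in self._cache:
            self._cache[name] = self._fetch_shard(name)
        return self._cache[name]

    def __contains__(self, name: object) -> bool:
        # Membership tests must not trigger a fetch.
        return name in self._names

    def __iter__(self) -> Iterator[str]:
        return iter(sorted(self._names))

    def __len__(self) -> int:
        return len(self._names)


# Fake in-memory "channel" used only for this example.
FAKE_CHANNEL = {
    "numpy": [{"name": "numpy", "version": "2.0.0"}],
    "zlib": [{"name": "zlib", "version": "1.3.1"}],
}
fetched: List[str] = []  # records which shards were actually requested


def fake_fetch(name: str) -> List[dict]:
    fetched.append(name)
    return FAKE_CHANNEL[name]


shards = LazyShards(FAKE_CHANNEL.keys(), fake_fetch)
records = shards["zlib"]  # first access fetches the shard
records = shards["zlib"]  # second access is served from the cache
print(fetched)  # ['zlib']
```

The hidden I/O inside `__getitem__` is what makes this convenient in Python and awkward to port; an explicit "fetch these names, then read" call would probably translate more directly.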