Open jakirkham opened 1 year ago
It'd be nice to also get this page updated: https://docs.conda.io/projects/conda-build/en/latest/resources/package-spec.html
Would suggest raising a new conda-build doc issue
So .conda packages are ZIP-format containers holding a metadata.json file that contains just the format version number, plus an info- and a pkg- file that are always .tar.zst, even though some earlier documentation hoped to support "any libarchive filter". The order of metadata, info and pkg inside the ZIP does not matter.
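The layout described above can be checked with nothing but the standard library. This is a minimal sketch, not how conda or conda-package-handling read packages: `describe_conda` and `make_example` are hypothetical helper names, the sketch assumes the version is stored under the `conda_pkg_format_version` key mentioned later in this thread, and the inner tarballs in the example are placeholder bytes rather than real zstd data.

```python
import io
import json
import zipfile


def describe_conda(source):
    """Return the format version and inner tarball names of a .conda file.

    `source` is a path or a binary file object. Members are looked up by
    name, so their order inside the ZIP is irrelevant.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        metadata = json.loads(zf.read("metadata.json"))
    info = [n for n in names if n.startswith("info-") and n.endswith(".tar.zst")]
    pkg = [n for n in names if n.startswith("pkg-") and n.endswith(".tar.zst")]
    if len(info) != 1 or len(pkg) != 1:
        raise ValueError("expected exactly one info- and one pkg- member")
    return {
        # Assumption: the version lives under this key in metadata.json.
        "format_version": metadata["conda_pkg_format_version"],
        "info": info[0],
        "pkg": pkg[0],
    }


def make_example():
    """Build a tiny stand-in container in memory (placeholder tarball bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        # Deliberately written in an unusual order: pkg, info, metadata.
        zf.writestr("pkg-example-1.0-0.tar.zst", b"placeholder")
        zf.writestr("info-example-1.0-0.tar.zst", b"placeholder")
        zf.writestr("metadata.json", json.dumps({"conda_pkg_format_version": 2}))
    buf.seek(0)
    return buf
```

Calling `describe_conda(make_example())` finds all three members even though they were written in reverse order, which is the property the paragraph above relies on.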
Put together, the pkg- and info- tarballs have exactly the same contents as an old-format .tar.bz2 conda package. Generally, the info/ subdirectory of a .tar.bz2 package goes into the info- tarball of a .conda.
conda-package-handling uses a list of regular expressions to determine which files go into info/, but this list excludes some files that obviously belong in info/ - for example info/LICENSE vs info/LICENSE.txt. We should audit the existing packages to see whether we can drop this behavior and simply include info/ wholesale. Do packages include significant application data in info/ (besides test data, which is already intentionally in info/)?
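To make the audit question concrete, here is a small sketch comparing a pattern-based split with a plain info/ prefix split. The patterns below are hypothetical and are NOT the list used by conda-package-handling; they only illustrate how a regex list can leave a file under info/ in the pkg- tarball.

```python
import re

# Hypothetical patterns for illustration only, not the real list.
HYPOTHETICAL_INFO_PATTERNS = [
    re.compile(r"^info/[^/]+\.json$"),
    re.compile(r"^info/LICENSE\.txt$"),
    re.compile(r"^info/recipe/"),
]


def split_by_patterns(paths, patterns=HYPOTHETICAL_INFO_PATTERNS):
    """Send a path to the info group only if some pattern matches it."""
    info = [p for p in paths if any(rx.search(p) for rx in patterns)]
    info_set = set(info)
    pkg = [p for p in paths if p not in info_set]
    return info, pkg


def split_by_prefix(paths):
    """Send everything under info/ to the info group, wholesale."""
    info = [p for p in paths if p.startswith("info/")]
    pkg = [p for p in paths if not p.startswith("info/")]
    return info, pkg


def misplaced_info_files(paths, patterns=HYPOTHETICAL_INFO_PATTERNS):
    """Paths under info/ that the pattern rule would put in the pkg group."""
    _, pkg = split_by_patterns(paths, patterns)
    return sorted(p for p in pkg if p.startswith("info/"))
```

Running `misplaced_info_files` over the file lists of existing packages, with the real pattern list substituted in, would be one way to see how many files the two rules disagree on.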
A regular conda install unpacks both inner .tar.zst archives and does not use the "easy to inspect just the metadata" feature provided by the info/pkg split. The format is still a win, because zst is much, much faster to extract than bz2.
We might want to standardize whether info- or pkg- gets extracted first, or enforce that one cannot overwrite the other (that no filename appears in both inner tarballs).
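The "no filename appears in both inner tarballs" rule could be checked roughly as follows. This is a sketch under stated assumptions: `find_clobbers` and `make_tar` are hypothetical helpers, and the tarballs are assumed to be already decompressed (zstd decoding is left out so the example stays within the standard library).

```python
import io
import posixpath
import tarfile


def _file_names(tar):
    """Normalized names of all non-directory members of an open tarfile."""
    return {posixpath.normpath(m.name) for m in tar.getmembers() if not m.isdir()}


def find_clobbers(info_tar, pkg_tar):
    """Return the sorted list of file names present in both tarballs."""
    return sorted(_file_names(info_tar) & _file_names(pkg_tar))


def make_tar(names):
    """Build an uncompressed in-memory tar holding one tiny file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name in names:
            data = b"x"
            member = tarfile.TarInfo(name)
            member.size = len(data)
            tf.addfile(member, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")
```

An extractor could refuse to proceed when `find_clobbers` returns a non-empty list, which would make the extraction order of info- and pkg- irrelevant.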
Separate from the .conda container is the shared question of what the metadata looks like. This probably has to be a different, longer document.
Forget where this was discussed atm, but recall one point of confusion was whether conda_pkg_format_version should be an int or a str. Would be nice to resolve this as part of this work
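Until that is settled, a reader could tolerate both spellings. The sketch below is one possible reader-side normalization, not a proposed resolution; `normalize_format_version` is a hypothetical helper name.

```python
def normalize_format_version(value):
    """Accept the format version as an int or a decimal str; return an int."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("format version must be an int or str, not bool")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        text = value.strip()
        if text.isascii() and text.isdigit():
            return int(text)
        raise ValueError(f"not a decimal format version: {value!r}")
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

A writer would still need to pick one type, which is the part a specification has to decide.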
We might want to standardize whether info- or pkg- gets extracted first, or enforce that one cannot overwrite the other (that no filename appears in both inner tarballs).
Yea, clobbered files in info/ (i.e. package overwrites conda metadata) should be prevented with an error by conda-build (and alike) before the artifact is generated.
I don't think the normal way of creating .conda can create clobbered files. It takes a list of filenames and categorizes them into two groups. The check would need to be on extraction.
No, but conda-build can infer which files have gotten into info/ and flag those that would result in a clobber error, I think?
It would be good to have a CEP that spells out what is in the .conda format, as this is missing atm, especially as we increasingly rely on it and depend on a few tools to manage reading and writing these files. Currently the info we have, which could be used for this CEP is... Would be good to pull this together to provide a single point of truth.
Independently, there are some things we might want to consider as amendments to the specification, like generating/reusing a Zstandard dictionary for faster and more compact compression/decompression, and having per-file-format dictionaries (text files may benefit a lot from this, for example).