Closed asmeurer closed 11 months ago
I'd consider using a different encoding for the hash. Our current base16
could be safely extended to base32
and hence we can afford a shorter hash. The date could be a shorter (yet less readable) timestamp.
If we want readability, I'd keep a series of symlinks to the hashed path. If you want to map back, you can always query the database. Convenience shouldn't trump usability.
[UPD: Nevermind, see my messages below. This is a conda limitation on all platforms, we cannot solve this just by using the extended length prefix.]
@jaimergp Not sure where base32 comes from, it's sha256 + some other stuff:
@property
def build_key(self):
"""A conda environment build is a function of the sha256 of the
environment.yaml along with the time that the package was built.
The last two parts of the build key are to assist finding the
record in the database.
The build key should be a key that allows for the environment
build to be easily identified and found in the database.
"""
datetime_format = "%Y%m%d-%H%M%S-%f"
return f"{self.specification.sha256}-{self.scheduled_on.strftime(datetime_format)}-{self.id}-{self.specification.name}"
In https://github.com/conda-incubator/conda-store/issues/588, Aaron shows this example:
ERROR:root:[WinError 206] The filename or extension is too long: 'C:\\Users\\Aaron\\Quansight\\conda-store\\conda-store-server\\conda-store-state\\default\\99108419ad0fd922fdeff9bbc434b58d41f68e3f923a83f6a7ab19568463bc82-20230915-201619-848782-1-test\\Lib\\site-packages\\sympy\\parsing\\autolev\\test-examples\\pydy-example-repo'
Yes, our build_key
contains redundant information, state directories could also use shorter names. However, even if we somehow had a small build_key
that cannot exceed a certain size, part of the issue is the internal package directory structure, which we don't control. Yes, we can take a look at all existing packages and infer the current max directory path length, but it might still break one day.
IMO, a better solution, which shouldn't require changing much and won't affect other platforms is using the extended-length path prefix "\\?\"
(32,767 chars max), as documented here:
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
IIUC, this will cause Windows to use different APIs, which don't have this legacy path limit. Also, based on what've read, the support for this should go back to Windows 7, but it needs to be verified. The only issue I can think of is not being able to browse these paths using Explorer, but I think that was only a problem on XP, and it was updated to use different APIs on Windows 7, but again, this needs to be verified. Also, it's not something that should regularly happen as these are internal. Finally, these extended paths are supposed to be absolute (cannot use .
or ..
), but this doesn't seem like a problem for us.
A number of projects seem to be using this approach:
win32._makeLong
: https://github.com/nodejs/node/blob/229a03f1bb779f27253d4934aa3a5c5d7072d345/lib/path.js#L295core.longpaths
: https://github.com/git-for-windows/git/blob/d003d728ffa6c0006da875ec6318d3f6b28a4ddb/Documentation/config/core.txt#L560-L565get_extended_length_path_str
: https://github.com/pytest-dev/pytest/blob/7.4.2/src/_pytest/pathlib.py#L133So I'm going to explore the extended-length path prefix
[UPD: Nope, we'll need to change the "\\?\"
first.build_key
- see below.]
Not going to explore a different build_key
for now, but it raises a number of questions if we decide to change it:
build_key
.Requirements for new build_key
if we decide to change it:
build_key
docstring mentions some of those).I forgot to mention advantages of the extended path prefix "\\?\"
:
Going to write a PR and report if anything breaks.
[UPD: Going to shorten the build_key
instead.]
OK, nevermind. Jaime told me this is needed because of point 1 here: https://github.com/conda-incubator/conda-store/issues/588#issuecomment-1788703931. So there's no other way than to shorten the build_key
.
conda-build will use a default prefix placeholder of 255 characters. This means that the prefix base path (e.g. ~/.local/conda-store/envs/env-name-1234abcd) always needs to be under 255 characters. There should be a check that validates the configuration and ensures that base installation path + intermediate directories + default length of the hashed environment <= 255. This is for all platforms.
Repro: https://github.com/conda-incubator/conda-store/issues/588#issuecomment-1751556643
@jaimergp I've thought more about this in the context of the conda prefix. Let's look at Aaron's example, which exceeded the limit:
/private/var/folders/wc/dppcpmxs1tlb36nqcw853wkm0000gn/T/pytest-of-aaronmeurer/pytest-12/test_conda_store_register_envi0/conda-store-state/pytest-namespace/35f604188f69ceb5d9e3fae2c93ffd48c2971d192f21c776e3ebfed7e1196868-20231005-205956-556312-1-pytest-name
conda-store-state
now would be .conda-store/state
(later we're planning to make this controlled via XDG, see #648)conda-store-state
can potentially be c:\\
or /home/<user>
, say, 10-15 charsSummary: there are many things that are non-constant and can exceed the prefix length.
I think the long-term solution should be similar to what Nix uses: https://nixos.org/guides/nix-pills/nix-store-paths. The ideal scheme should be: ROOT + constant value (hash). We can instruct users to have a shorter ROOT and we'll know exactly how long that ROOT can be to not exceed the prefix (determined by the hash length). The main question here is how to select this hash. Can it be just the hash of the conda lockfile? IIUC, lockfiles are supposed to be deterministic. This will also allow for sharing when multiple users end up building the same env. However, we need to be careful about access to this shared resource (such as on deletion). I'll think more about this, but I welcome comments on this idea.
For now, I'll implement the truncated hash + timestamp idea, but it's a bandaid rather than a real solution. We should use something else long-term.
UPD: More info on Nix hashes: https://nixos.wiki/wiki/Nix_Hash
Conda itself would need to be updated to use \\?\
. And anyway it could always happen that some other package gets installed that needs to access paths in the install prefix and doesn't use this API. The more robust fix is to use something shorter for the hash+timestamp.
Status update: submitted a WIP PR but need to address review feedback: https://github.com/conda-incubator/conda-store/pull/652.
Opened a separate issue for Nix-like hashes: https://github.com/conda-incubator/conda-store/issues/678
Context
Environment paths currently use a hash of the environment plus a timestamp plus the environment name. For example
99108419ad0fd922fdeff9bbc434b58d41f68e3f923a83f6a7ab19568463bc82-20230915-201619-848782-1-test
for an environment named "test".This path is very long (90 characters plus the environment name) and causes issues on Windows where the default filesystem limits paths to 260 characters, and some packages can have long paths. See https://github.com/conda-incubator/conda-store/issues/588. For example,
notebook
has a path with 122 characters (https://github.com/conda-incubator/conda-store/issues/588#issuecomment-1747541889), so 122 + 90 is 212 characters, leaving only 48 characters for the root of wherever the conda-store state directory is stored plus the environment name.Long paths can also cause issues on other platforms for some packages that are built to only be able to be installed in paths up to 255 characters. See https://github.com/conda-incubator/conda-store/issues/588#issuecomment-1751556643
Value and/or benefit
On Windows, this can be fixed by setting a registry key, but this requires administrator privileges, so some deployments might not be able to do it.
Anything else?
See https://github.com/conda-incubator/conda-store/issues/588 for more technical details on the Windows path length problem.
My suggestion would be to use a shorter hash. We can incorporate the timestamp into the hash, since it is already stored in the database. However, we would need to make sure this is done in a way that is backwards compatible.