**meditans** opened 1 month ago
Just in case this is the same error propagating from triton, could you please test https://github.com/NixOS/nixpkgs/pull/320888? I probably won't get around to doing this myself for another week.
Hi @SomeoneSerge, I tested the PR you mentioned and it's definitely a step in the right direction! I no longer get the `ldconfig -p`-related errors.

I now get the following, which is technically a different problem but still related to `torch.compile` (I didn't have much luck googling for it):
```
File "/nix/store/r42c89mj865p7vjb62jnknxi9bsqlr47-python3-3.11.9-env/lib/python3.11/site-packages/torch/utils/_triton.py", line 37, in triton_backend_hash
    from triton.common.backend import get_backend, get_cuda_version_key
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
ImportError: cannot import name 'get_cuda_version_key' from 'triton.common.backend' (/nix/store/r42c89mj865p7vjb62jnknxi9bsqlr47-python3-3.11.9-env/lib/python3.11/site-packages/triton/common/backend.py)
```
Do you know where this could stem from?
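For a quick check outside of `torch.compile`, here is a diagnostic sketch (mine, not from the thread) that probes whether the installed triton actually exposes the symbol named in the traceback:

```python
import importlib.util


def triton_exports_cuda_version_key():
    """Probe whether triton provides the symbol torch's _triton.py imports.

    Returns None if triton isn't installed, False if the import fails
    (the mismatch seen in the traceback above), True otherwise.
    """
    if importlib.util.find_spec("triton") is None:
        return None
    try:
        # This is the exact import that fails in torch/utils/_triton.py.
        from triton.common.backend import get_cuda_version_key  # noqa: F401
    except ImportError:
        return False
    return True
```

Running this in the same environment would distinguish "triton missing" from "triton present but too old/new for this torch".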
For the curious, here's the flake I used to test the PR:
```nix
{
  description = "Check SomeoneSerge's ldconfig -p fix";

  inputs.nixpkgs.url = "github:SomeoneSerge/nixpkgs/530d5769d57e88bde975c5fefb9ecfcddba89742";

  outputs =
    { self, nixpkgs }:
    let
      pkgs = import nixpkgs {
        system = "x86_64-linux";
        config.allowUnfree = true;
      };
    in
    {
      devShell.x86_64-linux =
        with pkgs;
        (mkShell.override { stdenv = gcc11Stdenv; }) {
          venvDir = "./.venv";
          buildInputs = [
            (pkgs.python3.withPackages (
              ps: with ps; [
                torch-bin
                openai-triton-bin
              ]
            ))
            pkgs.virtualenv
            pkgs.python3Packages.venvShellHook
          ];
          postVenvCreation = ''
            unset SOURCE_DATE_EPOCH
          '';
        };
    };
}
```
> ImportError: cannot import name 'get_cuda_version_key' from 'triton.common.backend' (/nix/store/r42c89mj865p7vjb62jnknxi9bsqlr47-python3-3.11.9-env/lib/python3.11/site-packages/triton/common/backend.py)
Maybe our torch and triton have diverged? We sometimes relax version constraints, and we haven't included integration tests for `torch.compile` because so far we've relied on upstream's test suite. Speaking of which, @meditans, do you think you could package your reproducers into `passthru.tests` for our `torch`? Even if they require CUDA (it's desirable to also have CPU-only tests because ofborg builds them, but it's OK to start small) you can experiment with:
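As a starting point, here is a sketch (illustrative only, not a tested nixpkgs patch; attribute names are my assumptions) of how a CPU-only `torch.compile` smoke test could be wired into `passthru.tests` with `runCommand`:

```nix
# Sketch: a CPU-only torch.compile smoke test for passthru.tests.
# backend = "eager" avoids needing a GPU or a C++ toolchain; a real test
# for this issue would use the inductor backend on a CUDA machine.
{
  passthru.tests.torch-compile-smoke = runCommand "torch-compile-smoke"
    {
      nativeBuildInputs = [ (python3.withPackages (ps: [ ps.torch ])) ];
    }
    ''
      python - <<'EOF'
      import torch
      f = torch.compile(lambda x: x + 1, backend="eager")
      assert int(f(torch.tensor(1))) == 2
      EOF
      touch $out
    '';
}
```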
**Describe the bug**

Trying to optimize a machine learning model with `torch.compile` for a GPU fails because `ldconfig -p` is not found. Here's the precise error:

**Steps To Reproduce**
On a computer with `pytorch` and an enabled NVIDIA GPU, try to run the following Python program:

**Expected behavior**
The program should complete successfully.
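The original program isn't reproduced above; as a sketch (my assumption, not the author's exact reproducer), the kind of script that exercises this path looks like:

```python
# Minimal torch.compile sketch. The default backend="inductor" is the one
# that fails in this issue; torch is imported lazily so the helper can be
# defined even where torch isn't installed.
def make_compiled_fn(backend="inductor"):
    """Return a torch.compile-d toy function."""
    import torch

    def f(x, y):
        return (x + y).relu()

    return torch.compile(f, backend=backend)

# Usage on a CUDA machine (this first call is where compilation happens,
# and where the ldconfig -p / ImportError failures surface):
#   fn = make_compiled_fn()
#   fn(torch.randn(8, device="cuda"), torch.randn(8, device="cuda"))
```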
**Additional context**

This reminds me of another issue I raised about not having `ldconfig -p` in a `pytorch` environment: https://github.com/NixOS/nixpkgs/issues/285307

**Notify maintainers**
@NixOS/cuda-maintainers @SomeoneSerge @Madouura
**Metadata**

Please run `nix-shell -p nix-info --run "nix-info -m"` and paste the result.