MadryLab / trak

A fast, effective data attribution method for neural networks in PyTorch
https://trak.csail.mit.edu/
MIT License

Fast_jl undefined symbol #67

Closed by Bas-2k 3 months ago

Bas-2k commented 4 months ago

Hello, I have been trying to use TRAK on the CelebA dataset for an age classification task with ResNet-50. However, I am unable to use the fast version because I get the following error:

ERROR:TRAK:Could not use CudaProjector. Reason: /cis/home/bpal5/anaconda3v3/envs/TrakEnv_v2/lib/python3.10/site-packages/fast_jl.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops15sum_dim_IntList4callERKNS_6TensorEN3c1016OptionalArrayRefIlEEbNS5_8optionalINS5_10ScalarTypeEEE
ERROR:TRAK:Defaulting to BasicProjector.

However, trak.test_install reports that fast_jl is installed correctly, even though importing it directly fails:

trak.test_install(use_fast_jl=True)
TRAK and fast_jl are installed correctly!
import fast_jl
Traceback (most recent call last):
File "", line 1, in
ImportError: /cis/home/bpal5/anaconda3v3/envs/TrakEnv_v2/lib/python3.10/site-packages/fast_jl.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops15sum_dim_IntList4callERKNS_6TensorEN3c1016OptionalArrayRefIlEEbNS5_8optionalINS5_10ScalarTypeEEE

Can you please help me fix the error? My environment uses torch version '2.3.0+cu121'.

tingwl0122 commented 4 months ago

I think there is an issue with this version of torch. Try downgrading to 2.2.1 and running it again.

kristian-georgiev commented 4 months ago

It seems like fast_jl is not installed correctly. I am not sure why trak.test_install(use_fast_jl=True) passes. My guess is that it's a gcc version problem.
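Since trak.test_install passed here while a direct import failed, it may help to collect the relevant facts in one place before changing anything. The following is a minimal diagnostic sketch, not part of TRAK: try_import and compiler_version are hypothetical helper names, and the script only reports what it finds. It imports torch before fast_jl, then prints the first line of the gcc and g++ version output.

```python
import importlib
import shutil
import subprocess


def try_import(module_name):
    """Import module_name directly and return (ok, message)."""
    try:
        importlib.import_module(module_name)
        return True, "imported OK"
    except Exception as exc:  # an undefined symbol surfaces as ImportError
        return False, f"{type(exc).__name__}: {exc}"


def compiler_version(compiler="gcc"):
    """Return the first line of `<compiler> --version`, or None if not on PATH."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    # torch is imported first so that fast_jl can find the torch libraries.
    for name in ("torch", "fast_jl"):
        ok, message = try_import(name)
        print(f"{name}: {message}")
    for compiler in ("gcc", "g++"):
        print(f"{compiler}: {compiler_version(compiler)}")
```

If fast_jl fails here with the same undefined symbol error, the extension itself cannot be loaded, whatever trak.test_install reports.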

simplelifetime commented 3 months ago

I hit the same problem. Neither PyTorch 2.2.1 nor 2.1.2 works.

simplelifetime commented 3 months ago

Updating gcc and g++ to 11.4 solved my problem.
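To compare a local toolchain against the version reported above, something like the following sketch could be used. It is an illustration under assumptions: 11.4 is only the version that worked in this thread, not a documented minimum for fast_jl, and parse_version, installed_version, and at_least are hypothetical helper names. It relies on the -dumpfullversion flag, which is available in GCC 7 and later.

```python
import re
import shutil
import subprocess

# Assumption: 11.4 is the version reported to work in this thread,
# not a documented requirement of fast_jl.
WORKING_VERSION = (11, 4)


def parse_version(text):
    """Parse a dotted version such as '11.4.0' into a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def installed_version(compiler="gcc"):
    """Return the compiler version as a tuple, or None if it cannot be found."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpfullversion"], capture_output=True, text=True
    )
    return parse_version(result.stdout)


def at_least(version, minimum=WORKING_VERSION):
    """True if version is known and is not older than minimum."""
    return version is not None and version[: len(minimum)] >= minimum


if __name__ == "__main__":
    for compiler in ("gcc", "g++"):
        version = installed_version(compiler)
        print(f"{compiler}: {version}, at least 11.4: {at_least(version)}")
```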