amazon-science / unconditional-time-series-diffusion

Official PyTorch implementation of TSDiff models presented in the NeurIPS 2023 paper "Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting"
Apache License 2.0

OSError: /home/xxx/.cache/keops2.1.1/build/nvrtc_jit.so: cannot open shared object file: No such file or directory #10

Open hanlaoshi opened 1 month ago

hanlaoshi commented 1 month ago

Hi there, how should I resolve the issue below?

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/pykeops/numpy/test_install.py", line 20, in test_numpy_bindings
    if np.allclose(my_conv(x, y).flatten(), expected_res):
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/pykeops/numpy/generic/generic_red.py", line 303, in __call__
    self.myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/keopscore/utils/Cache.py", line 68, in __call__
    obj = self.cls(*args)
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 15, in __init__
    super().__init__(*args, fast_init=fast_init)
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 18, in __init__
    self.init(*args)
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 126, in init
    ) = get_keops_dll(
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/keopscore/utils/Cache.py", line 27, in __call__
    self.library[str_id] = self.fun(*args)
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/keopscore/get_keops_dll.py", line 110, in get_keops_dll_impl
    map_reduce_obj = map_reduce_class(red_formula_string, aliases, *args)
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/keopscore/mapreduce/gpu/GpuReduc1D.py", line 17, in __init__
    Gpu_link_compile.__init__(self)
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/site-packages/keopscore/binders/nvrtc/Gpu_link_compile.py", line 54, in __init__
    self.my_c_dll = CDLL(jit_compile_dll(), mode=RTLD_LAZY)
  File "/home/newdisk/ai/anaconda3/envs/tsdiff/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/newdisk/ai/.cache/keops2.1.1/build/nvrtc_jit.so: cannot open shared object file: No such file or directory

marcelkollovieh commented 1 month ago

Hi, this looks like a CUDA issue. Can you delete the cache and try again?

rm -rf /home/newdisk/ai/.cache/keops*

Can you also check whether nvcc is available?

nvcc -V
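
For anyone who prefers to run both checks from Python, here is a minimal sketch using only the standard library. The helper names (find_nvcc, find_keops_caches, clear_keops_caches) are made up for illustration and are not part of pykeops or tsdiff. The default cache location of ~/.cache is an assumption based on the path in the traceback, so pass a different cache_root if your cache lives elsewhere.

```python
import glob
import os
import shutil


def find_nvcc():
    """Return the full path of nvcc if it is on PATH, else None."""
    return shutil.which("nvcc")


def find_keops_caches(cache_root=None):
    """List keops* build-cache directories under cache_root.

    Assumption: the cache lives under ~/.cache, as in the traceback.
    """
    if cache_root is None:
        cache_root = os.path.join(os.path.expanduser("~"), ".cache")
    return sorted(glob.glob(os.path.join(cache_root, "keops*")))


def clear_keops_caches(cache_root=None):
    """Delete every keops* cache directory and return the removed paths."""
    removed = []
    for path in find_keops_caches(cache_root):
        shutil.rmtree(path, ignore_errors=True)
        removed.append(path)
    return removed


if __name__ == "__main__":
    # Read-only report; call clear_keops_caches() explicitly to delete.
    print("nvcc:", find_nvcc() or "not found on PATH")
    print("KeOps caches:", find_keops_caches())
```

Running the script only reports what it finds. Deleting is a separate, explicit call to clear_keops_caches(), after which the next KeOps call should rebuild its cache.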

hanlaoshi commented 1 month ago

Hello! I've followed your suggestion to clear the cache and checked the output of 'nvcc -V', but the issue persisted.

(tsdiff) nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Therefore, I uninstalled pykeops, and that resolved the problem. So, does pykeops affect the results of the tsdiff model?

Currently, I am trying to apply tsdiff to multivariate time series for testing. Using GluonTS, the training process proceeds without issues, but I encounter problems during the evaluation phase. For example, with the solar_nips dataset, the following line of code causes an issue:

forecasts = list(tqdm(forecast_it, total=len(transformed_testdata)))

The problem arises due to a shape mismatch: data["future_target"] has the shape torch.Size([64, 24]), while scaled has the shape torch.Size([64, 1, 137]). During debugging, I found that in the training phase, the shape of data["future_target"] is torch.Size([64, 24, 137]), which matches the shape of scaled. Could you advise on how to modify the code so that tsdiff can be adapted to a multivariate series environment?
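
To make the mismatch concrete, the sketch below reimplements the NumPy-style broadcasting rule, which PyTorch also follows, in plain Python and applies it to the shapes reported above. The function broadcast_shape is a hypothetical helper written for this illustration, not part of tsdiff or GluonTS, and it only shows why the two evaluation-time shapes cannot be combined. It does not identify where in the pipeline the variate dimension is lost.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError.

    Dimensions are compared from the right; two sizes are compatible
    when they are equal or when one of them is 1.
    """
    result = []
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(
                f"cannot broadcast {tuple(a)} with {tuple(b)}: {da} vs {db}"
            )
    return tuple(reversed(result))


if __name__ == "__main__":
    # Training: future_target (64, 24, 137) against scaled (64, 1, 137).
    print(broadcast_shape((64, 24, 137), (64, 1, 137)))

    # Evaluation: future_target (64, 24) has no variate dimension,
    # so its last dimension (24) is compared against 137 and fails.
    try:
        broadcast_shape((64, 24), (64, 1, 137))
    except ValueError as err:
        print(err)
```

The training shapes are compatible because the size-1 time dimension of scaled stretches to 24. At evaluation, future_target is missing the trailing dimension of 137, which suggests the test data reaches the model as univariate series rather than as one 137-dimensional series.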