aknvictor closed this 3 months ago
Very interesting adaptation; looking forward to it.
Hi @Viktour19, thanks for your contribution!
First of all, I could not install culingam in my Windows environment with pip install culingam (although I could install it in my Linux environment).
I attempted a manual installation using the procedure shown here, but it failed.
After several modifications, I was able to install it. Please see below what I did; I hope it helps improve culingam.
Set the environment variable CUDA_HOME to the CUDA installation path on Windows:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
Changed the NVTX header include to the form shown on the official site:
#include <nvtx3/nvToolsExt.h>
Added -D_USE_MATH_DEFINES to extra_compile_args of CUDAExtension, so that M_PI is defined when compiling with nvcc:
https://stackoverflow.com/questions/56319494/nvcc-compilation-errors-with-m-pi-and-or
Added the following path to library_dirs in CUDAExtension:
C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64
Also, linked against nvToolsExt64_1.lib instead of nvToolsExt.lib: https://github.com/pytorch/pytorch/issues/101135
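The Windows-specific build changes above can be collected in one place. The following is a minimal sketch, not culingam's actual setup.py: it assumes the extension is built with torch.utils.cpp_extension.CUDAExtension, and the helper name windows_extension_kwargs is hypothetical. It only assembles keyword arguments, so it can be inspected without a CUDA toolchain. CUDA_HOME is deliberately left to the environment, as in the steps above, since it has to be set before the build tooling runs.

```python
import sys

# Path quoted in the thread; adjust to the local installation (assumption).
NVTOOLSEXT_LIB_DIR = r"C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64"


def windows_extension_kwargs(platform=sys.platform):
    """Hypothetical helper: extra CUDAExtension kwargs for a Windows build.

    Returns an empty dict on other platforms so the Linux build is unchanged.
    """
    if not platform.startswith("win"):
        return {}
    return {
        # _USE_MATH_DEFINES makes MSVC's math.h define M_PI for both compilers.
        "extra_compile_args": {
            "cxx": ["-D_USE_MATH_DEFINES"],
            "nvcc": ["-D_USE_MATH_DEFINES"],
        },
        # Directory that contains the NVTX import library.
        "library_dirs": [NVTOOLSEXT_LIB_DIR],
        # Link nvToolsExt64_1.lib instead of nvToolsExt.lib.
        "libraries": ["nvToolsExt64_1"],
    }
```

In a setup.py this would be spread into the extension definition, e.g. CUDAExtension(name=..., sources=[...], **windows_extension_kwargs()); if the existing call already passes extra_compile_args, the two dicts would need to be merged rather than overwritten.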
Hi @Viktour19, this is the result of comparing the existing (CPU) DirectLiNGAM and the GPU version on the same data. The GPU version estimates the wrong causal order. Is this a problem with my environment?
import numpy as np
import pandas as pd
import graphviz
import lingam
from lingam.utils import make_dot
print([np.__version__, pd.__version__, graphviz.__version__, lingam.__version__])
# ['1.25.2', '2.2.0', '0.20', '1.8.3']
np.random.seed(0)
x2 = np.random.uniform(size=100000)
x0 = 3.0*x2 + np.random.uniform(size=100000)
x1 = 1.0*x0 + 6.0*x2 + np.random.uniform(size=100000)
X = pd.DataFrame(np.array([x0, x1, x2]).T, columns=['x0', 'x1', 'x2'])
# Ground-truth graph: x2 -> x0, x0 -> x1, x2 -> x1
make_dot([[0.0, 0.0, 3.0], [1.0, 0.0, 6.0], [0.0, 0.0, 0.0]])

%%time
model = lingam.DirectLiNGAM()
model = model.fit(X)
# CPU times: total: 156 ms, Wall time: 169 ms
print('causal ordering:', model.causal_order_)
# causal ordering: [2, 0, 1]
make_dot(model.adjacency_matrix_)

%%time
model = lingam.DirectLiNGAM(measure='pwling_fast')
model = model.fit(X)
# CPU times: total: 141 ms, Wall time: 205 ms
print('causal ordering:', model.causal_order_)
# causal ordering: [0, 1, 2]
make_dot(model.adjacency_matrix_)
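Whether an estimated order is "wrong" can be checked mechanically against the ground-truth matrix: an order is consistent with a DAG only if every cause appears before each of its effects. A minimal sketch, assuming the lingam convention that adjacency[i, j] != 0 means x_j -> x_i; the helper name order_is_consistent is hypothetical.

```python
import numpy as np


def order_is_consistent(adjacency, order):
    """Return True if `order` never places a variable before one of its causes.

    Assumes adjacency[i, j] != 0 means x_j -> x_i (lingam convention).
    """
    B = np.asarray(adjacency)
    position = {var: pos for pos, var in enumerate(order)}
    for i, j in zip(*np.nonzero(B)):
        # The cause x_j must come strictly before its effect x_i.
        if position[int(j)] >= position[int(i)]:
            return False
    return True


# Ground truth from the example above: x2 -> x0, x0 -> x1, x2 -> x1.
B_true = [[0.0, 0.0, 3.0], [1.0, 0.0, 6.0], [0.0, 0.0, 0.0]]
print(order_is_consistent(B_true, [2, 0, 1]))  # True: CPU DirectLiNGAM result
print(order_is_consistent(B_true, [0, 1, 2]))  # False: pwling_fast result
```

The second order fails because it places x2 after x0, although x2 is a parent of x0.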
Thanks for documenting the Windows setup!
I couldn't reproduce the issue on mine. Here's the graph using the data provided:
Could you try running the example in DirectLiNGAM_fast.py? That includes an additional check that the compiler is available.
The output of the get_cuda_version function is as follows:
CUDA Version found:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:42:34_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
The culingam installed by pip install is v0.0.7, but the version in the GitHub repository is v0.0.6. On Windows I am using v0.0.6, installed manually from GitHub. Could this be caused by the difference in culingam versions?
I installed culingam v0.0.7 on Linux with pip and ran DirectLiNGAM_fast.py, but got an AssertionError on assert np.allclose(model.adjacency_matrix_, m).
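One way to rule out a version mix-up is to ask the interpreter which culingam distribution it actually sees. This sketch uses the standard-library importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name. It reports the version recorded in the installed package metadata, which for a manual build from GitHub comes from the version string in that checkout.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints e.g. "culingam: 0.0.7", or "culingam: None" if it is not installed.
print("culingam:", installed_version("culingam"))
```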
I also tried running it with culingam v0.0.7 alone. I ran the following code in the Kaggle environment, but the causal order was still incorrect.
!pip install culingam
Collecting culingam
  Downloading culingam-0.0.7.tar.gz (27 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from culingam) (1.26.4)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.10/site-packages (from culingam) (4.66.1)
Building wheels for collected packages: culingam
  Building wheel for culingam (pyproject.toml) ... done
  Created wheel for culingam: filename=culingam-0.0.7-cp310-cp310-linux_x86_64.whl size=89289 sha256=b56e51c13260bece05ff0a9e4f17f81bc52f0c503ddb8bff87ddd669f0ab9eba
  Stored in directory: /root/.cache/pip/wheels/4d/90/ee/7192c3880f1d0903b6f0a50af63669c5b4f55107f44f120e78
Successfully built culingam
Installing collected packages: culingam
Successfully installed culingam-0.0.7
import numpy as np
import subprocess

# Reference adjacency matrix and causal order:
# [[ 0. 0. 0. 2.99982982 0. 0. ]
#  [ 2.99997222 0. 2.00008518 0. 0. 0. ]
#  [ 0. 0. 0. 5.99981965 0. 0. ]
#  [ 0. 0. 0. 0. 0. 0. ]
#  [ 7.99857006 0. -0.99911522 0. 0. 0. ]
#  [ 3.99974733 0. 0. 0. 0. 0. ]]
# [3, 0, 2, 5, 4, 1]

def get_cuda_version():
    try:
        nvcc_version = subprocess.check_output(["nvcc", "--version"]).decode('utf-8')
        print("CUDA Version found:\n", nvcc_version)
        return True
    except Exception as e:
        print("CUDA not found or nvcc not in PATH:", e)
        return False

def main():
    np.random.seed(42)
    size = 100000
    x3 = np.random.uniform(size=size)
    x0 = 3.0*x3 + np.random.uniform(size=size)
    x2 = 6.0*x3 + np.random.uniform(size=size)
    x1 = 3.0*x0 + 2.0*x2 + np.random.uniform(size=size)
    x5 = 4.0*x0 + np.random.uniform(size=size)
    x4 = 8.0*x0 - 1.0*x2 + np.random.uniform(size=size)
    X = np.array([x0, x1, x2, x3, x4, x5]).T
    dlm = DirectLiNGAM(12)
    dlm.fit(X, disable_tqdm=False)
    np.set_printoptions(precision=3, suppress=True)
    print(dlm._adjacency_matrix)
    print(dlm.causal_order_)

# Check for CUDA availability before importing CUDA-dependent packages
if get_cuda_version():
    try:
        from culingam.directlingam import DirectLiNGAM
        main()
    except ImportError as e:
        print("Failed to import CUDA-dependent package:", e)
else:
    print("CUDA is not available. Please ensure CUDA is installed and correctly configured.")
CUDA Version found:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
100%|██████████| 6/6 [00:00<00:00, 17.03it/s]
[[ 0.     0.     0.     0.     0.     0.   ]
 [ 6.596  0.     0.     0.     0.     0.   ]
 [-1.331  0.474  0.     0.     0.     0.   ]
 [ 0.065  0.     0.131  0.     0.     0.   ]
 [ 8.     0.    -1.     0.     0.     0.   ]
 [ 3.999  0.     0.     0.     0.     0.   ]]
[0, 1, 2, 3, 4, 5]
Thanks for your patience! It seems I needed to allow for a broader range of CUDA GPU compute capabilities, e.g. the P100 on Kaggle is sm_60. I've updated the package on PyPI and on GitHub. Let me know if that works.
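Supporting more GPUs comes down to compiling the kernels for more architectures. The sketch below generates nvcc -gencode flags for a list of compute capabilities; the helper name gencode_flags and the example capability list are assumptions, not what culingam actually ships. If the extension is built through torch.utils.cpp_extension, the TORCH_CUDA_ARCH_LIST environment variable (for example "6.0;7.0;8.0+PTX") controls the same thing.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags for compute capabilities such as "6.0".

    Each capability gets real machine code (sm_XX); the newest one also
    gets PTX (compute_XX) so that newer GPUs can JIT-compile the kernels.
    """
    archs = sorted({int(c.replace(".", "")) for c in capabilities})
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]
    newest = archs[-1]
    flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


# Hypothetical selection starting at Pascal (the P100 is sm_60).
print(gencode_flags(["6.0", "7.0", "7.5", "8.0", "8.6"]))
```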
@Viktour19 Thanks for responding! Both the PyPI and GitHub versions worked fine! I'll check a little more before merging the code.
It would be great if you could support pip install culingam on Windows as well!
@Viktour19 You said the GPU version was up to 32 times faster than the CPU. What number of variables and sample sizes did you use? I tried the following combinations and found no difference between CPU and GPU: number of variables {10, 20, 50, 100}, sample size {1000, 2000, 5000}.
I benchmarked with sample sizes from 1k to 1M and dimensions from 10 to 100.
Here's the wall-clock time for the GPU on my setup. Can you share yours, and how does it compare with the CPU time on your setup?
P.S. I'm working on getting a Windows machine to test on.
I fixed the number of variables to 100 based on the heatmap you showed me.
There was no difference when the sample size was less than 5000, but when the sample size was greater than that, the GPU was clearly faster!
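To compare CPU and GPU on the same grid (a fixed number of variables with a growing sample size), a small timing harness like the following can be used. It is a sketch under assumptions: the data generator (uniform noise on a random sparse DAG) and the helper names make_data and benchmark are hypothetical, not the benchmark behind the heatmap.

```python
import time

import numpy as np


def make_data(n_samples, n_features, seed=0):
    """Random linear non-Gaussian data: each variable depends on earlier ones."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_samples, n_features))
    for i in range(1, n_features):
        # Sparse random weights on preceding variables keep the model acyclic.
        w = rng.uniform(-1, 1, size=i) * (rng.uniform(size=i) < 0.3)
        X[:, i] += X[:, :i] @ w
    return X


def benchmark(fit_fns, sample_sizes, n_features, repeats=1):
    """Return {(name, n_samples): best wall-clock seconds} per fit function."""
    results = {}
    for n in sample_sizes:
        X = make_data(n, n_features)
        for name, fit in fit_fns.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                fit(X)
                best = min(best, time.perf_counter() - start)
            results[(name, n)] = best
    return results
```

With lingam installed, fit_fns could map "cpu" to lambda X: lingam.DirectLiNGAM().fit(X) and "gpu" to lambda X: lingam.DirectLiNGAM(measure='pwling_fast').fit(X), run with n_features=100 and sample sizes spanning 1k to 1M.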
Excellent!
I temporarily reverted the merge because the CI tests and the docs build failed in an environment without culingam installed.
The error is due to the following code (direct_lingam.py):
from lingam_cuda import causal_order as causal_order_gpu
To avoid the error caused by the above code, one could install culingam. However, culingam cannot be installed without CUDA (and cannot be installed with pip on Windows), so this would make CUDA a requirement for using lingam.
I changed the import location and restored the change: https://github.com/cdt15/lingam/pull/133/commits/e64892b165823249db902fa3ca20edbde3ecd346
This PR includes an implementation that drastically speeds up (up to 32x on a consumer GPU) DirectLiNGAM and its variants, e.g. VarLiNGAM.
It does so through an optional dependency, https://github.com/Viktour19/culingam, which implements custom CUDA kernels for the pairwise likelihood ratio causal ordering method.
The implementation has been tested locally on an NVIDIA RTX 6000 on a Linux machine, but tests on other setups are needed.
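For readers unfamiliar with the method being accelerated, the following NumPy sketch shows a pairwise likelihood ratio causal ordering on the CPU, in the spirit of DirectLiNGAM's pairwise measure and the entropy-based score of Hyvärinen and Smith (2013). It is an illustration under assumptions, not culingam's kernels or lingam's exact code: it uses the maximum-entropy approximation of differential entropy, ignores prior knowledge, and all function names are hypothetical. Each step scores every pair of remaining variables, so this naive version costs roughly O(d^3 n) over the whole ordering, which is consistent with the GPU advantage appearing only at larger sample sizes.

```python
import numpy as np

K1, K2, GAMMA = 79.047, 7.4129, 0.37457  # entropy-approximation constants


def entropy(u):
    """Approximate differential entropy of a standardized 1-D sample."""
    return ((1 + np.log(2 * np.pi)) / 2
            - K1 * (np.mean(np.log(np.cosh(u))) - GAMMA) ** 2
            - K2 * np.mean(u * np.exp(-u ** 2 / 2)) ** 2)


def standardize(x):
    return (x - x.mean()) / x.std()


def residual(xi, xj):
    """Residual of xi after least-squares regression on xj (both centered)."""
    return xi - (np.dot(xi, xj) / np.dot(xj, xj)) * xj


def pairwise_score(xi, xj):
    """Log-likelihood-ratio style score; positive suggests xi -> xj."""
    xi, xj = standardize(xi), standardize(xj)
    ri_j = standardize(residual(xi, xj))
    rj_i = standardize(residual(xj, xi))
    return entropy(xj) + entropy(ri_j) - entropy(xi) - entropy(rj_i)


def causal_order(X):
    """Estimate a causal order by repeatedly picking the most exogenous variable."""
    X = np.array(X, dtype=float)
    X -= X.mean(axis=0)
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        # Penalize i whenever some j looks like a cause of i (negative score).
        scores = [sum(min(0.0, pairwise_score(X[:, i], X[:, j])) ** 2
                      for j in remaining if j != i)
                  for i in remaining]
        m = remaining[int(np.argmin(scores))]
        order.append(m)
        remaining.remove(m)
        # Regress the chosen variable out of the ones still to be ordered.
        for i in remaining:
            X[:, i] = residual(X[:, i], X[:, m])
    return order
```

For example, with x0 uniform and x1 = x0 + uniform noise, causal_order(np.column_stack([x0, x1])) should return [0, 1] for a sample of a few thousand points or more.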