CannyLab / tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings
BSD 3-Clause "New" or "Revised" License

Throws an OSError when initializing #119

Closed apoorvagnihotri closed 1 year ago

apoorvagnihotri commented 1 year ago
from tsnecuda import TSNE

args = {
    "n_components": 2,
    "perplexity": 30,
    "n_iter": 5000,
}
tsne = TSNE(**args)

Error below:

OSError                                   Traceback (most recent call last)
Cell In[31], line 8
      1 from tsnecuda import TSNE
      3 args = {
      4     "n_components": 2,
      5     "perplexity": 30,
      6     "n_iter": 5000,
      7 }
----> 8 tsne = TSNE(**args)

File ~/miniconda3/envs/p31/lib/python3.10/site-packages/tsnecuda/TSNE.py:139, in TSNE.__init__(self, n_components, perplexity, early_exaggeration, learning_rate, num_neighbors, force_magnify_iters, pre_momentum, post_momentum, theta, epssq, n_iter, n_iter_without_progress, min_grad_norm, perplexity_epsilon, metric, init, return_style, num_snapshots, verbose, random_seed, use_interactive, viz_timeout, viz_server, dump_points, dump_file, dump_interval, print_interval, device, magnitude_factor)
    137 # Build the hooks for the BH T-SNE library
    138 self._path = os.path.dirname(__file__)
--> 139 self._lib = N.ctypeslib.load_library(
    140     'libtsnecuda', self._path)  # Load the ctypes library
    142 # Hook the BH T-SNE function
    143 self._lib.pymodule_tsne.restype = None

File ~/miniconda3/envs/p31/lib/python3.10/site-packages/numpy/ctypeslib.py:158, in load_library(libname, loader_path)
    156 if os.path.exists(libpath):
    157     try:
--> 158         return ctypes.cdll[libpath]
    159     except OSError:
    160         ## defective lib file
...
--> 374     self._handle = _dlopen(self._name, mode)
    375 else:
    376     self._handle = handle

OSError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory

I am using the following setup:

OS: Manjaro 6.1
CPU: AMD 7900X
GPU: NVIDIA RTX 4090
CUDA: 12.0

Please let me know if you need more information for debugging.

DavidMChan commented 1 year ago

It looks like you don't have Intel's MKL library linked into your environment. How did you install the code? If you installed from source, did you install MKL?
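One way to check whether an MKL library file is present at all is to scan the usual directories for it. The following is only a minimal sketch: the helper name `find_shared_library` is hypothetical, and it looks only in `LD_LIBRARY_PATH` and the active environment's `lib` directory, which is an assumption. The dynamic loader also consults other locations (for example the system library cache), so an empty result is a hint rather than proof.

```python
import os
import sys


def find_shared_library(name, search_path=None):
    """Return sorted paths of files whose name starts with `name`.

    Prefix matching also reports versioned files such as `name + ".2"`,
    which the loader does NOT treat as equal to the unversioned name.
    """
    if search_path is None:
        # Assumption: LD_LIBRARY_PATH plus the active environment's lib dir.
        search_path = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        search_path.append(os.path.join(sys.prefix, "lib"))
    hits = []
    for directory in search_path:
        # Skip empty entries and directories that do not exist.
        if not directory or not os.path.isdir(directory):
            continue
        for entry in os.listdir(directory):
            if entry.startswith(name):
                hits.append(os.path.join(directory, entry))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_shared_library("libmkl_intel_lp64.so"):
        print(path)
```

If this prints nothing, or only versioned file names, the loader has no file with the exact name from the error message in those directories.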

apoorvagnihotri commented 1 year ago

No, I haven't. I have previously read that the MKL library doesn't play well with AMD CPUs (it runs slower).

Additionally, other Intel libraries like sklearn-intelex tend to give incorrect results altogether on AMD systems.

https://github.com/scikit-learn/scikit-learn/discussions/23212

Given the problems of using Intel libraries on AMD CPUs, is there any alternative to using MKL as a dependency?

DavidMChan commented 1 year ago

All of the pre-built binaries are linked with MKL; however, the code itself supports OpenBLAS and ATLAS. You should be able to build it from source by following the installation instructions: https://github.com/CannyLab/tsne-cuda/wiki/Installation

You may also need to make modifications for the 4090/CUDA 12, given that it's not officially supported (we don't have a 4090 for testing). The if statement here will need to be adapted to add the compute architecture compute_89 and sm_89 for the 4090, and the options here will need to be adjusted for performance (the native setting will be fine, but will likely not make use of the device efficiently).

I'm not sure if there are any breaking changes for CUDA 12 (from 11), so you may run into this as well if any parts of the API were deprecated/removed.

silencio94 commented 4 months ago

In my case, I installed an older version of MKL in a virtual environment (because the machine is not mine). Then I set LD_LIBRARY_PATH to that environment's lib directory. This solved the issue.

OS: Ubuntu 22.04.2 LTS
tsnecuda installation: pip3 install tsnecuda==3.0.1+cu112 -f https://tsnecuda.isx.ai/tsnecuda_stable.html
CUDA: 12.2

pip install mkl==2019.0
export LD_LIBRARY_PATH=/home/myname/miniforge3/envs/myenv/lib/

python tsne.py

In a Jupyter notebook, setting the environment variable in a notebook cell may not help, and the same error can reoccur, most likely because the dynamic loader reads LD_LIBRARY_PATH when the process starts rather than when the cell runs. If you want to use plotting packages to view results interactively in Jupyter, additional steps might be necessary (though they are probably not difficult).
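One possible workaround from inside a notebook is to run the t-SNE script in a fresh child process whose environment already contains the library directory. This is a sketch under assumptions, not a tested recipe for tsnecuda: the helper names `env_with_library_dir` and `run_script` are hypothetical, and the script is assumed to write its embedding to a file that the notebook can load and plot afterwards.

```python
import os
import subprocess
import sys


def env_with_library_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Keep any existing entries after the new directory.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env


def run_script(script, lib_dir):
    """Run `script` in a new Python process that starts with the adjusted environment."""
    return subprocess.run(
        [sys.executable, script],
        env=env_with_library_dir(lib_dir),
        capture_output=True,
        text=True,
        check=False,  # Inspect returncode and stderr instead of raising.
    )
```

For example, `result = run_script("tsne.py", "/home/myname/miniforge3/envs/myenv/lib/")` mirrors the shell commands above; `result.stderr` then shows whether the OSError still occurs. Starting the Jupyter server itself from a shell where LD_LIBRARY_PATH is already exported is an alternative that avoids the child process.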