CannyLab / tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings
BSD 3-Clause "New" or "Revised" License

Installation fails for cuda11.1 because of Faiss #107

Closed elias-ramzi closed 1 year ago

elias-ramzi commented 3 years ago

Hi there,

Thank you for this great repo !

I am having trouble installing tsne-cuda. My CUDA version is 11.1, so I ran the following command:

pip3 install tsnecuda==3.0.0+cu111 -f https://tsnecuda.isx.ai/tsnecuda_stable.html

And got this error :

ERROR: Could not find a version that satisfies the requirement faiss==1.6.5 (from tsnecuda==3.0.0+cu111) (from versions: none)
ERROR: No matching distribution found for faiss==1.6.5 (from tsnecuda==3.0.0+cu111)

I think in order to install faiss with pip you have to choose between faiss-cpu and faiss-gpu and not directly faiss (not sure though).

Thank you for any help !

Cosmos-Break commented 3 years ago

same question

UtkarshKunwar commented 3 years ago

After struggling for literally hours trying to build both FAISS and tSNE-CUDA with MKL and what not, what actually worked for me for Python3.8 in a simple virtualenv was:

I could run the examples for MNIST (6.5s) and CIFAR10 (28.5s) with this at CUDA 11.2. Also works for faiss-gpu==1.7.1. Have yet to test this fully on my own data though.

elias-ramzi commented 3 years ago

Thanks for the response !

I now have the following error:

OSError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory

Thanks :)

DavidMChan commented 3 years ago

This error occurs when Intel's MKL library is not linked correctly on your system (that is, it is not on your dynamic linker path, in the LD_LIBRARY_PATH variable, etc.). If the file is on your system, you can add its directory to LD_LIBRARY_PATH when running the code; if it is not on your system, you need to install Intel MKL.
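
One way to check whether the file named in the error is visible through LD_LIBRARY_PATH is sketched below. This is only an illustration, not part of tsnecuda: the helper name find_on_ld_library_path is hypothetical, and it inspects LD_LIBRARY_PATH only, not the other locations the dynamic linker searches (such as the linker cache or default system directories).

```python
import os


def find_on_ld_library_path(filename, ld_library_path=None):
    """Return the LD_LIBRARY_PATH directories that contain `filename`.

    Hypothetical helper for illustration; it only inspects LD_LIBRARY_PATH
    and ignores the linker cache and default system directories.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        # Empty entries (e.g. from a leading ':') are skipped.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    name = "libmkl_intel_lp64.so"  # file named in the OSError above
    found = find_on_ld_library_path(name)
    print(found if found else f"{name} not found on LD_LIBRARY_PATH")
```

An empty result means the file is either missing or installed somewhere that LD_LIBRARY_PATH does not cover.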

We use the 1.6.5 version of FAISS since FAISS > 1.6 has issues with large numbers of points (more than 20k) - see #98

I'll look into updating the requirements.txt file for the python library to request faiss-gpu instead of faiss, and see if that fixes the problem. Until then, using --no-deps should be good enough.

elias-ramzi commented 3 years ago

Thanks for the response !

I am waiting on the administrator to install MKL on the servers, I will give an update :)

elias-ramzi commented 3 years ago

Hi,

This is how I tried to use your repo:

pip3 install tsnecuda==3.0.0+cu111 -f https://tsnecuda.isx.ai/tsnecuda_stable.html --no-deps
pip install [deps]
git clone https://github.com/samuelhei/mkl-so-files
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/mkl-so-files

Then I ran :

from tsnecuda import TSNE
import numpy as np
X = np.random.randn(100, 5)
X_embedded = TSNE(n_components=2, perplexity=15, learning_rate=10).fit_transform(X)

And got the following error:

OSError: /users/r/ramzie/dev/NDCG/.venv/lib/python3.8/site-packages/tsnecuda/libtsnecuda.so: undefined symbol: sorgqr_

Do you have any idea how I could fix it ? Is it because I use the github repo for the MKL files ?

Many thanks !
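
sorgqr_ is the name of a LAPACK routine, so the error above suggests that none of the loaded libraries provides it. A rough way to test whether a given shared library resolves that symbol is sketched below using ctypes. The helper name exports_symbol is hypothetical, and the library path in the example is built from the placeholder directory in the commands above, so it is an assumption rather than a known location. Note that a library which cannot be loaded on its own (for example because of unresolved dependencies) is also reported as False.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if `symbol` resolves after loading `library_path`.

    Hypothetical diagnostic helper. A library that fails to load at all
    (missing file, unresolved dependencies) is reported as False.
    """
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        return False
    # ctypes raises AttributeError when the symbol lookup fails.
    try:
        getattr(lib, symbol)
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    # "/path/mkl-so-files" is the placeholder directory from the commands above.
    candidate = "/path/mkl-so-files/libmkl_intel_lp64.so"
    print(exports_symbol(candidate, "sorgqr_"))
```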

iNLyze commented 3 years ago

I succeeded in building Intel MKL and FAISS from source. However, the tsne build gets stuck at [ 94%] Building CUDA object CMakeFiles/tsne.dir/src/exe/main.cu.o, at the point of building the Python source files:

[ 94%] Built target python_source_files
CMake Error: Error processing file: /tmp/tsne-cuda/../cmake/write_python_version_string.cmake
make[2]: *** [CMakeFiles/write_version_string_to_python.dir/build.make:70: write_version_string_to_python] Error 1

I think it is trying to write the version string to __init__.py, but the respective env vars are empty. I checked write_python_version_string.cmake, which is:

# Write the python version string to __init__.py
#set(PYTHON_VERSION "\n\n__version__ = '${VERSION_STRING}.dev${BUILD_NUMBER}'\n")
set(PYTHON_VERSION "\n\n__version__ = '${VERSION_STRING}'\n")
file(APPEND "${CMAKE_CURRENT_BINARY_DIR}/python/tsnecuda/__init__.py" ${PYTHON_VERSION})

Does anyone know where the values for those env vars should come from? Did anyone experience something similar?

DavidMChan commented 1 year ago

Most of the build issues with newer CUDA versions should be resolved with the conda-forge build: https://github.com/conda-forge/tsnecuda-feedstock. For reference later - the VERSION_STRING is copied from the src/python/version.txt file (but these lines could be removed if there are build issues)
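
For anyone debugging this step by hand, the effect of the CMake lines quoted earlier can be approximated in Python as below. This is a sketch under assumptions, not the project's build logic: the function name append_version_string is hypothetical, both paths are supplied by the caller, and stripping whitespace from the version file is a choice made here rather than something the CMake script is known to do.

```python
from pathlib import Path


def append_version_string(version_file, init_file):
    """Append a __version__ line to init_file, read from version_file.

    Mirrors the set()/file(APPEND ...) lines quoted above; both paths are
    supplied by the caller and are assumptions for illustration.
    """
    version = Path(version_file).read_text().strip()
    if not version:
        # An empty value here corresponds to a blank VERSION_STRING.
        raise ValueError(f"{version_file} is empty")
    line = f"\n\n__version__ = '{version}'\n"
    with open(init_file, "a") as handle:
        handle.write(line)
    return version
```

Calling it with the path to src/python/version.txt and the built package's __init__.py appends a line of the form __version__ = '...' to the end of that file.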