dattalab / keypoint-moseq

https://keypoint-moseq.readthedocs.io

Could not load dynamic library 'libcudart.so.11.0' #39

Closed vickerse1 closed 1 year ago

vickerse1 commented 1 year ago

Hi,

When I install in conda for linux/GPU the environment doesn't show up. Then, when I install with pip for linux/GPU I get the following error in jupyter notebook when I try to run "import keypoint_moseq as kpms":


2023-05-08 15:05:56.736333: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-05-08 15:05:56.773764: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-05-08 15:05:56.776625: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

Do you have any suggestions?

Thanks,

Evan

vickerse1 commented 1 year ago

ah yes, then the kernel dies and restarts. thanks, evan

vickerse1 commented 1 year ago

...and, more details of the error appeared in the command line window:


2023-05-08 16:07:24.091515: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2023-05-08 16:07:24.191301: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:454] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
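
For anyone hitting the same pair of messages, here is a minimal Python sketch that checks the two things the logs complain about: whether the dynamic loader can open `libcudart.so.11.0`, and whether a `ptxas` executable is on `PATH`. The helper name `check_cuda_runtime` is hypothetical and not part of keypoint_moseq; it only reports what the current process can see and does not fix anything.

```python
import ctypes
import shutil


def check_cuda_runtime(lib_name="libcudart.so.11.0", tool="ptxas"):
    """Report whether `lib_name` can be loaded and whether `tool` is on PATH.

    Hypothetical diagnostic helper, not part of keypoint_moseq or JAX.
    """
    report = {
        "library": lib_name,
        "library_loads": False,
        "tool": tool,
        "tool_path": shutil.which(tool),  # None when the tool is not on PATH
    }
    try:
        # ctypes asks the system dynamic loader to open the library, so a
        # failure here mirrors the "cannot open shared object file" warning.
        ctypes.CDLL(lib_name)
        report["library_loads"] = True
    except OSError as err:
        report["load_error"] = str(err)
    return report


for key, value in check_cuda_runtime().items():
    print(f"{key}: {value}")
```

If `library_loads` is False while a CUDA toolkit is installed, the toolkit's library directory is probably not on the loader's search path for this environment.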

calebweinreb commented 1 year ago

Hmm seems like there are a couple things to sort out:

vickerse1 commented 1 year ago

Hi Caleb,

  1. No, I mean the environment doesn't show up with "conda info --envs"
  2. No, I've installed cuda before many times to get it to work with both DLC and pytorch....changing it is always a bit of a can of worms. I currently use version 11.3.58 with an NVIDIA rtx 3080 with 12 GB RAM.
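
Point 1 can also be checked from a script. The sketch below assumes the JSON printed by `conda env list --json`, which contains an `envs` list of environment paths; the helper name `env_names` and the sample paths are made up for illustration.

```python
import json
import os


def env_names(conda_json_text):
    """Extract environment names from the text printed by `conda env list --json`."""
    paths = json.loads(conda_json_text).get("envs", [])
    # The environment name is the last component of each environment path.
    return [os.path.basename(p.rstrip("/\\")) for p in paths]


# Illustrative output only; these paths are invented for the example.
sample = '{"envs": ["/home/user/anaconda3", "/home/user/anaconda3/envs/keypoint_moseq"]}'
print("keypoint_moseq" in env_names(sample))  # True
```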

I will need to get the gpu version for this to be useful in the long-run (part of why I'm doing this is I'm having trouble with batch runs in BSOiD)....but, in the meantime I've made progress with the cpu version...I've made it all the way to the kpms noise calibration step, and am now getting this error:

Loading sample frames: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 90/90 [00:02<00:00, 35.20it/s]


AttributeError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 kpms.noise_calibration(project_dir, coordinates, confidences, **config())

File /mnt/c/Users/McCormick Lab/Documents/GitHub/keypoint-moseq/keypoint_moseq/calibration.py:445, in noise_calibration(project_dir, coordinates, confidences, bodyparts, use_bodyparts, video_dir, video_extension, conf_pseudocount, verbose, **kwargs)
    440 sample_keys.extend(annotations.keys())
    442 sample_images = load_sampled_frames(
    443     sample_keys, video_dir, video_extension=video_extension)
--> 445 return _noise_calibration_widget(
    446     project_dir, coordinates, confidences, sample_keys,
    447     sample_images, annotations, bodyparts=bodyparts, **kwargs)

File /mnt/c/Users/McCormick Lab/Documents/GitHub/keypoint-moseq/keypoint_moseq/calibration.py:193, in _noise_calibration_widget(project_dir, coordinates, confidences, sample_keys, sample_images, annotations, keypoint_colormap, bodyparts, skeleton, error_estimator, conf_threshold, **kwargs)
    191 import holoviews as hv
    192 import panel as pn
--> 193 hv.extension('bokeh')
    195 max_height = np.max([sample_images[k].shape[0] for k in sample_keys])
    196 max_width = np.max([sample_images[k].shape[1] for k in sample_keys])

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/pyviz_comms/__init__.py:64, in extension.__new__(cls, *args, **kwargs)
     62 except Exception:
     63     pass
---> 64 return param.ParameterizedFunction.__new__(cls, *args, **kwargs)

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/param/parameterized.py:3658, in ParameterizedFunction.__new__(class_, *args, **params)
   3656 inst = class_.instance()
   3657 inst.param._set_name(class_.__name__)
-> 3658 return inst.__call__(*args,**params)

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/holoviews/ipython/__init__.py:127, in notebook_extension.__call__(self, *args, **params)
    124 import nbformat # noqa: F401
    126 try:
--> 127     from .archive import notebook_archive
    128     holoviews.archive = notebook_archive
    129 except AttributeError as e:

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/holoviews/ipython/archive.py:10
      8 from IPython import version_info
      9 from IPython.display import Javascript, display
---> 10 from .preprocessors import Substitute
     12 # Import appropriate nbconvert machinery
     13 if version_info[0] >= 4:
     14     # Jupyter/IPython >=4.0

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/holoviews/ipython/preprocessors.py:7
      1 """
      2 Prototype demo:
      3
      4 python holoviews/ipython/convert.py Conversion_Example.ipynb | python
      5 """
      6 import ast
----> 7 from nbconvert.preprocessors import Preprocessor
     10 def comment_out_magics(source):
     11     """
     12     Utility used to make sure AST parser does not choke on unrecognized
     13     magics.
     14     """

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/nbconvert/__init__.py:3
      1 """Utilities for converting notebooks to and from different formats."""
----> 3 from . import filters, postprocessors, preprocessors, writers
      4 from ._version import __version__, version_info # noqa
      5 from .exporters import *

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/nbconvert/filters/__init__.py:8
      6 from .highlight import *
      7 from .latex import *
----> 8 from .markdown import *
      9 from .metadata import *
     10 from .pandoc import *

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/nbconvert/filters/markdown.py:13
     10 import re
     12 try:
---> 13     from .markdown_mistune import markdown2html_mistune
     14 except ImportError as e:
     15     # store in variable for Python 3
     16     _mistune_import_error = e

File ~/anaconda3/envs/keypoint_moseq_cpu/lib/python3.9/site-packages/nbconvert/filters/markdown_mistune.py:37
     33 class InvalidNotebook(Exception):
     34     pass
---> 37 class MathBlockGrammar(mistune.BlockGrammar):
     38     """This defines a single regex comprised of the different patterns that
     39     identify math content spanning multiple lines. These are used by the
     40     MathBlockLexer.
     41     """
     43     multi_math_str = "|".join(
     44         [r"^\$\$.?\$\$", r"^\\[.?\\]", r"^\begin{([a-z]*?)}(.?)\end{\1}"]
     45     )

AttributeError: module 'mistune' has no attribute 'BlockGrammar'

.....

any idea with this one?

Thanks,

Evan


From: Caleb Weinreb
Sent: Tuesday, May 9, 2023 5:06 AM
To: dattalab/keypoint-moseq
Cc: Evan Vickers; Author
Subject: Re: [dattalab/keypoint-moseq] Could not load dynamic library 'libcudart.so.11.0' (Issue #39)

Hmm seems like there are a couple things to sort out:

python -m ipykernel install --user --name=keypoint_moseq

nvcc --version
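
To compare the toolkit version with the `11.0` in the library name from the warning, the output of `nvcc --version` can be parsed. This is a minimal sketch assuming the output contains a phrase of the form `release X.Y`; the helper names and the sample line (modeled on the 11.3.58 version mentioned in this thread) are illustrative, not captured output.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH; return None when it is missing or unparseable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


# Illustrative line only, not real captured output.
sample = "Cuda compilation tools, release 11.3, V11.3.58"
print(parse_nvcc_release(sample))  # (11, 3)
print(installed_cuda_release())
```

A `None` from `installed_cuda_release()` means `nvcc` is not visible from this environment, which would be consistent with the `ptxas` failure above.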


calebweinreb commented 1 year ago

CUDA issue

Some thoughts...

mistune issue

First I should note that the calibration step can safely be skipped. Second, it seems like pinning mistune to an earlier version (https://github.com/CrossNox/m2r2/issues/40#issuecomment-986249585) might be a workaround? Maybe pip install -U mistune==0.8.4?
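
Before and after pinning, it can help to confirm which mistune is installed and whether it exposes the attribute the traceback fails on. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; `mistune_status` is a hypothetical helper name.

```python
import importlib
from importlib import metadata


def mistune_status():
    """Return (installed_version, has_BlockGrammar); version is None if mistune is absent."""
    try:
        version = metadata.version("mistune")
    except metadata.PackageNotFoundError:
        return None, False
    module = importlib.import_module("mistune")
    # The traceback fails on `mistune.BlockGrammar`, so test for that attribute directly.
    return version, hasattr(module, "BlockGrammar")


version, has_block_grammar = mistune_status()
print(f"mistune version: {version}, BlockGrammar available: {has_block_grammar}")
```

If `BlockGrammar available` prints False, the nbconvert import in the traceback would be expected to fail in the same way.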

vickerse1 commented 1 year ago

Hi Caleb,

OK good. The cpu version now works at least through the PCA step, although I get the following flags on the calibration step:


WARNING:param.OverlayPlot01356: Tool of type 'pan' could not be found and could not be activated by default.
WARNING:param.OverlayPlot01356:Tool of type 'pan' could not be found and could not be activated by default.
WARNING:param.OverlayPlot01356: Tool of type 'wheel_zoom' could not be found and could not be activated by default.

In terms of the cuda issue, here is the location of the cuda that I'm using:

/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3

....and I'm running jupyter lab out of Windows Subsystem for Linux (WSL) on Windows 10 (so the actual Windows path is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3).

So, how would I enter this into the commands you gave me? I tried entering the path in linux format where you wrote 'CUDA_PATH' and I get a KeyError flag on the path, with no real explanation.

Thanks,

Evan


From: Caleb Weinreb
Sent: Tuesday, May 9, 2023 10:07 AM
To: dattalab/keypoint-moseq
Cc: Evan Vickers; Author
Subject: Re: [dattalab/keypoint-moseq] Could not load dynamic library 'libcudart.so.11.0' (Issue #39)

CUDA issue

Some thoughts...

import os
cuda_path = os.environ['CUDA_PATH']  # or maybe cuda_path='/usr/local/cuda-11.0'
os.environ['XLA_FLAGS'] = '--xla_gpu_cuda_data_dir='+cuda_path
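
On the KeyError: `os.environ['CUDA_PATH']` looks up an environment variable named `CUDA_PATH`, so putting a directory inside the brackets asks for a variable with that name and fails. The directory belongs in `cuda_path` itself. A minimal sketch of that idea follows; `configure_xla_cuda_dir` is a hypothetical helper, and the example path is the one from the comment in the snippet above.

```python
import os


def configure_xla_cuda_dir(cuda_path=None, env=os.environ):
    """Set XLA_FLAGS in `env` so XLA looks for CUDA under `cuda_path`.

    Hypothetical helper. If `cuda_path` is not given, fall back to the
    CUDA_PATH variable, but only if it is actually set.
    """
    if cuda_path is None:
        cuda_path = env.get("CUDA_PATH")  # .get returns None instead of raising KeyError
    if cuda_path is None:
        raise ValueError("pass cuda_path explicitly or set the CUDA_PATH variable")
    env["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=" + cuda_path
    return env["XLA_FLAGS"]


# Dry run on an empty mapping; omit env= to modify the real environment,
# and do so before importing jax or keypoint_moseq.
print(configure_xla_cuda_dir("/usr/local/cuda-11.0", env={}))
# --xla_gpu_cuda_data_dir=/usr/local/cuda-11.0
```

One untested caveat: `XLA_FLAGS` holds several flags in a single string, so a path containing spaces (such as one under `Program Files`) may need quoting or a space-free alias.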


calebweinreb commented 1 year ago

So did pip install -U mistune==0.8.4 solve the calibration issue?

Hmm I don't have experience with WSL. Is there a reason you can't just do everything using the Windows OS? We've gotten Windows+GPU working fine and I think a number of other users have as well.

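Regarding CUDA+JAX+WSL, according to this stackoverflow post (https://stackoverflow.com/questions/76030322/problems-with-installing-cuda-enabled-jax-on-a-wsl-ubuntu-virtual-machine), they seemed to get things working using the nightly build of jax: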

python3 -m pip install git+https://github.com/google/jax 
pip install jaxlib --pre -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_cuda_releases.html
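
After reinstalling, a quick way to see whether JAX picked up the GPU is to ask it which backend and devices it reports. A minimal sketch; `jax_backend_summary` is a hypothetical helper that returns None when jax is not importable in the current environment.

```python
def jax_backend_summary():
    """Return the backend and device list JAX reports, or None if jax cannot be imported."""
    try:
        import jax
    except ImportError:
        return None
    return {
        "backend": jax.default_backend(),  # for example "cpu" or "gpu"
        "devices": [str(d) for d in jax.devices()],
    }


print(jax_backend_summary())
```

A reported backend of `cpu` on a machine with a GPU would suggest the CUDA-enabled jaxlib was not installed or could not find its libraries.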

Given that this is a WSL-specific issue, I'm going to sign off at this point, but please post if you figure out a solution in case others have this issue!

vickerse1 commented 1 year ago

Yes, pip install -U mistune==0.8.4 made the calibration work.

Hmm...sure, I can try the jax update thing and if that doesn't work I'll reinstall in Windows. I guess I wanted to use WSL because I recently got cuda working there for pytorch, and figured it would also work here... In any case, I'll let you know what the solution is when I figure it out.

Thanks for all of the help!

-Evan



calebweinreb commented 1 year ago

@wingillis mentioned that pytorch packages its own private cuda/cudnn when installed on WSL, whereas JAX requires dynamic linking to the system-wide (Windows) install. So the success with pytorch may not translate.

vickerse1 commented 1 year ago

I installed the GPU version for Windows, and the env setup and jupyter lab kernel worked. Then I got some weird error on the first cell, something like "kernel image not available". So I exited, ran the jax update you suggested earlier in the environment, and reopened jupyter lab. Now it's working, at least through loading DLC data.

-Evan



calebweinreb commented 1 year ago

Closing for now...