NVIDIA-Genomics-Research / rapids-single-cell-examples

Examples of single-cell genomic analysis accelerated with RAPIDS
Apache License 2.0

Cannot open UCX library: (null) in PCA analysis of 1M_brain_gpu_analysis_multigpu #103

Open ChenPeizhan opened 1 year ago

ChenPeizhan commented 1 year ago

Dear authors, thank you for sharing these valuable pipelines for GPU-accelerated single-cell genomics analysis. While working through the 1M_brain_gpu_analysis_multigpu pipeline, I ran into a problem at the PCA step on the dask_sparse_arr data. Please see the details below.

%%time
from cuml.dask.decomposition import PCA
pca_data = PCA(n_components=50).fit_transform(dask_sparse_arr)
pca_data.compute_chunk_sizes()

output

2022-11-12 16:17:05,992 - distributed.worker - WARNING - Run Failed Function: _func_init_all args: (b"'{\x19\xf3'\x1dB\x81\x93\xaeya!\x86q6", b'\x02\x00\xb1\x0c\xac\x15\x1d:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', False, {'tcp://127.0.0.1:33644': {'rank': 0}, 'tcp://127.0.0.1:42241': {'rank': 1}}, False, 0) kwargs: {} Traceback (most recent call last): File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/worker.py", line 3160, in run result = await function(*args, **kwargs) File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py", line 459, in _func_init_all _func_build_handle(sessionId, streams_per_handle, verbose) File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py", line 559, in _func_build_handle inject_comms_on_handle_coll_only( File "comms_utils.pyx", line 264, in raft_dask.common.comms_utils.inject_comms_on_handle_coll_only RuntimeError: exception occured! file=/project/cpp/include/raft/comms/detail/ucp_helper.hpp line=124: Cannot open UCX library: (null)

2022-11-12 16:17:05,996 - distributed.worker - WARNING - Run Failed Function: _func_init_all args: (b"'{\x19\xf3'\x1dB\x81\x93\xaeya!\x86q6", b'\x02\x00\xb1\x0c\xac\x15\x1d:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', False, {'tcp://127.0.0.1:33644': {'rank': 0}, 'tcp://127.0.0.1:42241': {'rank': 1}}, False, 0) kwargs: {} Traceback (most recent call last): File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/worker.py", line 3160, in run result = await function(*args, **kwargs) File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py", line 459, in _func_init_all _func_build_handle(sessionId, streams_per_handle, verbose) File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py", line 559, in _func_build_handle inject_comms_on_handle_coll_only( File "comms_utils.pyx", line 264, in raft_dask.common.comms_utils.inject_comms_on_handle_coll_only RuntimeError: exception occured! file=/project/cpp/include/raft/comms/detail/ucp_helper.hpp line=124: Cannot open UCX library: (null)


RuntimeError                              Traceback (most recent call last)
File <timed exec>:2

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/cuml/dask/decomposition/pca.py:177, in PCA.fit_transform(self, X)
    165 def fit_transform(self, X):
    166     """
    167     Fit the model with X and apply the dimensionality reduction on X.
    168
   (...)
    175     X_new : dask cuDF
    176     """
--> 177     return self.fit(X).transform(X)

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/cuml/dask/decomposition/pca.py:162, in PCA.fit(self, X)
    153 def fit(self, X):
    154     """
    155     Fit the model with X.
    156
   (...)
    159     X : dask cuDF input
    160     """
--> 162     self._fit(X)
    163     return self

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/cuml/dask/decomposition/base.py:71, in DecompositionSyncFitMixin._fit(self, X, _transform)
     68 else:
     69     comms = Comms(comms_p2p=False)
---> 71 comms.init(workers=data.workers)
     73 data.calculate_parts_to_sizes(comms)
     75 worker_info = comms.worker_info(comms.worker_addresses)

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py:200, in Comms.init(self, workers)
    196 worker_info = {w: worker_info[w] for w in self.worker_addresses}
    198 self.create_nccl_uniqueid()
--> 200 self.client.run(
    201     _func_init_all,
    202     self.sessionId,
    203     self.uniqueId,
    204     self.comms_p2p,
    205     worker_info,
    206     self.verbose,
    207     self.streams_per_handle,
    208     workers=self.worker_addresses,
    209     wait=True,
    210 )
    212 self.nccl_initialized = True
    214 if self.comms_p2p:

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/client.py:2836, in Client.run(self, function, workers, wait, nanny, on_error, *args, **kwargs)
   2753 def run(
   2754     self,
   2755     function,
   (...)
   2761     **kwargs,
   2762 ):
   2763     """
   2764     Run a function on all workers outside of task scheduling system
   2765
   (...)
   2834     >>> c.run(print_state, wait=False)  # doctest: +SKIP
   2835     """
-> 2836     return self.sync(
   2837         self._run,
   2838         function,
   2839         *args,
   2840         workers=workers,
   2841         wait=wait,
   2842         nanny=nanny,
   2843         on_error=on_error,
   2844         **kwargs,
   2845     )

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/utils.py:339, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    337     return future
    338 else:
--> 339     return sync(
    340         self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    341     )

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/utils.py:406, in sync(loop, func, callback_timeout, *args, **kwargs)
    404 if error:
    405     typ, exc, tb = error
--> 406     raise exc.with_traceback(tb)
    407 else:
    408     return result

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/utils.py:379, in sync.<locals>.f()
    377         future = asyncio.wait_for(future, callback_timeout)
    378     future = asyncio.ensure_future(future)
--> 379     result = yield future
    380 except Exception:
    381     error = sys.exc_info()

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/tornado/gen.py:769, in Runner.run(self)
    766 exc_info = None
    768 try:
--> 769     value = future.result()
    770 except Exception:
    771     exc_info = sys.exc_info()

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/client.py:2741, in Client._run(self, function, nanny, workers, wait, on_error, *args, **kwargs)
   2738     continue
   2740 if on_error == "raise":
-> 2741     raise exc
   2742 elif on_error == "return":
   2743     results[key] = exc

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py:459, in _func_init_all()
    456     worker.log_event(topic="info", msg="Done building handle.")
    458 else:
--> 459     _func_build_handle(sessionId, streams_per_handle, verbose)

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py:559, in _func_build_handle()
    556 nWorkers = raft_comm_state["nworkers"]
    558 nccl_comm = raft_comm_state["nccl"]
--> 559 inject_comms_on_handle_coll_only(
    560     handle, nccl_comm, nWorkers, workerId, verbose
    561 )
    562 raft_comm_state["handle"] = handle

File comms_utils.pyx:264, in raft_dask.common.comms_utils.inject_comms_on_handle_coll_only()

RuntimeError: exception occured! file=/project/cpp/include/raft/comms/detail/ucp_helper.hpp line=124: Cannot open UCX library: (null)

I am using Python 3.9 and installed the packages with pip install.

Thank you very much!

beckernick commented 1 year ago

Are you running a multi-GPU workload? RAPIDS pip packages don't yet support multi-GPU (though conda packages / docker containers do).
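One way to check whether the UCX library is visible to a given Python environment is sketched below. It assumes the "Cannot open UCX library" message comes from the dynamic loader failing to open `libucp.so`; that library name and the helper `can_load` are assumptions for illustration, not part of RAPIDS.

```python
import ctypes


def can_load(libname="libucp.so"):
    """Return True if the dynamic loader can open `libname`, else False."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        # Raised when the shared library cannot be found or opened.
        return False


# Run this inside the same environment the Dask workers use.
print("UCX loadable:", can_load())
```

If this prints `False` in a pip-installed environment, that would be consistent with the multi-GPU limitation described above, though it does not prove it is the only cause.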

ChenPeizhan commented 1 year ago

Yes, I am running a multi-GPU workload. I will try conda or Docker, many thanks. Do you know how to select a specific GPU for this step? I did not have any problems in the previous steps with multiple GPUs. Many thanks for your response.

beckernick commented 1 year ago

The dask-cuda documentation may be useful here. But, I'd recommend using conda or Docker for now.
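As a minimal sketch of restricting a cluster to specific GPUs with dask-cuda: `LocalCUDACluster` accepts a `CUDA_VISIBLE_DEVICES` argument. The helper names `visible_devices` and `start_cluster` are hypothetical, and the GPU index is only an example. This limits which GPUs the workers use; it is not claimed to resolve the UCX error in a pip-installed environment.

```python
def visible_devices(gpu_ids):
    """Format GPU indices as a CUDA_VISIBLE_DEVICES string, e.g. [0, 2] -> "0,2"."""
    return ",".join(str(i) for i in gpu_ids)


def start_cluster(gpu_ids):
    """Start a local Dask cluster with one worker per listed GPU (hypothetical helper)."""
    # Imported lazily so the helper above can be used without dask-cuda installed.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=visible_devices(gpu_ids))
    return Client(cluster)


# Example (requires dask-cuda and at least one GPU):
#   client = start_cluster([0])
```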