Closed: aarmey closed this 11 months ago
Hi Prof. Meyer, thank you so much for contributing to PaCMAP! This is indeed a very interesting and important observation. Before merging, I will run some tests on local machines to quantify the improvements. This merge request will be updated once the profiling results are in!
It seems that the latest commit, which puts the numba compilation back on pacmap_grad(), may lead to an error. I'm currently commenting it out and running some large-scale experiments to check the performance. Pasting the output below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/users/hh219/FastPaCMAP/PaCMAP/source/pacmap/__init__.py", line 12, in <module>
from .pacmap_numpy import PaCMAP as PaCMAPNumpy
File "/home/users/hh219/FastPaCMAP/PaCMAP/source/pacmap/pacmap_numpy.py", line 233, in <module>
def pacmap_grad(Y, pair_neighbors, pair_MN, pair_FP, w_neighbors, w_MN, w_FP):
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/decorators.py", line 219, in wrapper
disp.compile(sig)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/dispatcher.py", line 965, in compile
cres = self._compiler.compile(args, return_type)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/dispatcher.py", line 129, in compile
raise retval
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/dispatcher.py", line 139, in _compile_cached
retval = self._compile_core(args, return_type)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/dispatcher.py", line 152, in _compile_core
cres = compiler.compile_extra(self.targetdescr.typing_context,
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler.py", line 716, in compile_extra
return pipeline.compile_extra(func)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler.py", line 452, in compile_extra
return self._compile_bytecode()
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler.py", line 520, in _compile_bytecode
return self._compile_core()
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler.py", line 499, in _compile_core
raise e
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler.py", line 486, in _compile_core
pm.run(self.state)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 368, in run
raise patched_exception
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 356, in run
self._runPass(idx, pass_inst, state)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
mutated |= check(pss.run_pass, internal_state)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 273, in check
mangled = func(compiler_state)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/typed_passes.py", line 105, in run_pass
typemap, return_type, calltypes, errs = type_inference_stage(
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/typed_passes.py", line 83, in type_inference_stage
errs = infer.propagate(raise_errors=raise_errors)
File "/home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1086, in propagate
raise errors[0]
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function getitem>) found for signature:
>>> getitem(array(float64, 1d, C), Tuple(slice<a:b>, none))
There are 22 candidate implementations:
- Of which 20 did not match due to:
Overload of function 'getitem': File: <numerous>: Line N/A.
With argument(s): '(array(float64, 1d, C), Tuple(slice<a:b>, none))':
No match.
- Of which 2 did not match due to:
Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 166.
With argument(s): '(array(float64, 1d, C), Tuple(slice<a:b>, none))':
Rejected as the implementation raised a specific error:
NumbaTypeError: unsupported array index type none in Tuple(slice<a:b>, none)
raised from /home/users/hh219/mambaforge/envs/fastviz/lib/python3.10/site-packages/numba/core/typing/arraydecl.py:72
During: typing of intrinsic-call at /home/users/hh219/FastPaCMAP/PaCMAP/source/pacmap/pacmap_numpy.py (244)
During: typing of static-get-item at /home/users/hh219/FastPaCMAP/PaCMAP/source/pacmap/pacmap_numpy.py (244)
File "../../../../../home/users/hh219/FastPaCMAP/PaCMAP/source/pacmap/pacmap_numpy.py", line 244:
def pacmap_grad(Y, pair_neighbors, pair_MN, pair_FP, w_neighbors, w_MN, w_FP):
<source elided>
grad[pair_neighbors[:, 0]] += w1[:, np.newaxis] * y_ij
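For context, the typing failure is the np.newaxis indexing: numba's nopython mode has no getitem implementation for a 1-D array indexed with (slice, None), which is exactly the w1[:, np.newaxis] expression in the reported line. A minimal sketch of the failure and one possible workaround, using illustrative names rather than the actual PaCMAP code:

```python
import numpy as np
from numba import njit


@njit
def scale_rows_newaxis(w, y):
    # Reproduces the TypingError above: nopython mode rejects indexing a
    # 1-D array with (slice, None), i.e. w[:, np.newaxis].
    return w[:, np.newaxis] * y


@njit
def scale_rows_reshape(w, y):
    # Same broadcasting expressed with reshape, which numba accepts for
    # contiguous arrays.
    return w.reshape(-1, 1) * y


w = np.arange(4.0)                 # shape (4,)
y = np.ones((4, 2))                # shape (4, 2)
print(scale_rows_reshape(w, y))    # compiles and runs
# scale_rows_newaxis(w, y)         # raises numba.core.errors.TypingError
```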
In my experiments, the NumPy implementation makes the algorithm roughly 3.5 times slower, on both small and large datasets. I guess there are some differences between the test benches we were using, so I'm posting some information about my setup.
I'm using a machine with Intel Xeon Ice Lake Gold 5317 processors (3.0 GHz, 18 MB cache, 12 cores) and 64 GB of RAM. Here are the versions of the most important dependencies:
numba 0.56.0
numpy 1.22.4
cython 0.29.32
scikit-learn 1.1.2
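For anyone comparing against a different environment, one quick way to capture the same version information is the snippet below (the distribution names are assumed to match the packages listed above):

```python
from importlib.metadata import version

# Print the installed versions of the dependencies listed above.
for pkg in ("numba", "numpy", "Cython", "scikit-learn"):
    print(f"{pkg:<14} {version(pkg)}")
```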
Thanks for trying this out, @hyhuang00! I never actually timed my code; I was only looking at the profiling, so I think I ended up with a misleading impression of the difference based on how the sampling data was collected. When I time my version against master, my results agree with your observation.
Sorry for the trouble! I still think there should be a way to speed up this function, since it only uses one CPU core. However, I'll benchmark my code more carefully before providing a new version.
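As a side note on methodology: a sampling profiler shows where time goes within one implementation, but comparing two implementations really needs wall-clock timing as well. A minimal sketch of such a timing comparison, using hypothetical stand-in kernels rather than PaCMAP's actual gradient code:

```python
import time

import numpy as np


def grad_loop(w, y):
    # Stand-in for a loop-based gradient kernel.
    out = np.empty_like(y)
    for i in range(y.shape[0]):
        out[i] = w[i] * y[i]
    return out


def grad_vectorized(w, y):
    # Stand-in for a vectorized gradient kernel.
    return w.reshape(-1, 1) * y


rng = np.random.default_rng(0)
w = rng.standard_normal(200_000)
y = rng.standard_normal((200_000, 2))

for fn in (grad_loop, grad_vectorized):
    start = time.perf_counter()
    for _ in range(5):
        fn(w, y)
    print(f"{fn.__name__}: {(time.perf_counter() - start) / 5:.4f} s per call")
```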
No worries! Again, thank you so much for contributing to PaCMAP. I didn't notice that the numba compiled code was not using multiple cores as intended, and I think there will be ways to make this algorithm even faster.
NumPy is very good at vectorizing large operations, making them extremely high performance. Consequently, it is quite common for Python code to actually be faster than its numba counterpart when expressions can be restructured to avoid explicit loops.
I noticed that the code spends most of its time in pacmap_grad(), and that this function does not leverage multiple CPU cores. Switching it to vectorized Python code ends up much faster for large datasets.
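To illustrate the kind of vectorization being discussed, here is a generic sketch of per-pair gradient accumulation; the functions and weights are illustrative, not the actual PaCMAP update. The per-pair loop becomes array operations, with np.add.at used to accumulate contributions that share an index (a plain fancy-index += would drop duplicate contributions):

```python
import numpy as np


def accumulate_loop(Y, pairs, w):
    # Loop form: add a weighted difference for each (i, j) pair.
    grad = np.zeros_like(Y)
    for k in range(pairs.shape[0]):
        i, j = pairs[k]
        d = Y[i] - Y[j]
        grad[i] += w[k] * d
        grad[j] -= w[k] * d
    return grad


def accumulate_vectorized(Y, pairs, w):
    # Vectorized form: compute all pair differences at once, then
    # scatter-add them into the gradient with np.add.at, which handles
    # repeated indices correctly.
    d = Y[pairs[:, 0]] - Y[pairs[:, 1]]
    wd = w[:, np.newaxis] * d
    grad = np.zeros_like(Y)
    np.add.at(grad, pairs[:, 0], wd)
    np.add.at(grad, pairs[:, 1], -wd)
    return grad


rng = np.random.default_rng(0)
Y = rng.standard_normal((1000, 2))
pairs = rng.integers(0, 1000, size=(5000, 2))
w = rng.standard_normal(5000)
assert np.allclose(accumulate_loop(Y, pairs, w), accumulate_vectorized(Y, pairs, w))
```

As the benchmarking discussion above shows, whether the vectorized form actually wins in practice still has to be confirmed with wall-clock timing on realistic dataset sizes.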