Open kmaehashi opened 2 years ago
cc: @cjnolet @quasiben
This looks great, thanks @kmaehashi!
Miscellaneous routines (`scipy.misc`)
Note that we're quite likely deprecating this module in SciPy 1.9.0 (see https://github.com/scipy/scipy/issues/15608), so I suggest not working on that.
Thanks for the info, @rgommers! Updated the list.
@kmaehashi, RAFT contains hierarchical clustering code as well: https://github.com/cupy/cupy/issues/3434.
RAFT also contains functions for computing contingency table and sampling from various distributions. Wondering if those might be useful here as well.
Thanks @cjnolet! Added a link to the issue.
This is brilliant and interesting! I am Pranav, a developer @lfortran. We are trying to compile SciPy using LFortran and are very close; we almost got `scipy.special.specfun` working. I wish to ask: we (CuPy contributors) are translating the Fortran implementations present in SciPy to make them use CuPy APIs, right?
Hi @Pranavchiku! Yes, we are rewriting SciPy code into CUDA C/C++. You'll get the idea by exploring our code base under `cupyx/scipy`:
https://github.com/cupy/cupy/tree/main/cupyx/scipy/special
Note that you will need a GPU environment to develop CuPy.
Do we need a discrete GPU, or do integrated GPUs work? By integrated I mean the Apple GPU that comes with Mac M1, M2, etc. chips.
Discrete GPUs are required; currently we support CUDA and HIP (for AMD). https://docs.cupy.dev/en/latest/overview.html
Hey, I need non-batched BLAS functions for what I'm working on, so I will be working on implementing those in the near future. Just wondering if there are any thoughts on a best-practices for implementing these.
The main use case I have is roughly: I have a large array of scalar values (could be hundreds of millions). For each value, I need to generate a (potentially not small) matrix, solve it, and then use the result to compute a scalar output value. Batched code would need to store all of these matrices at once (potentially hundreds of GB of memory, which is not feasible), while for non-batched, dynamically generated matrices, the memory requirement is just the scalar values.
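A CPU-side sketch of the non-batched pattern described above (the matrix-generation rule, sizes, and names here are all made up for illustration): only one small `(n, n)` system is alive at a time, never all N matrices at once.

```python
# Sketch: generate-solve-reduce one small system per scalar input, so peak
# memory is O(n*n) per step instead of O(N*n*n) for a fully batched call.
import numpy as np

rng = np.random.default_rng(42)
values = rng.standard_normal(1000)   # stand-in for the (huge) scalar inputs

def solve_one(v, n=8):
    # Dynamically generate a small system from the scalar v, solve it,
    # and reduce the solution back to a single scalar output.
    A = np.eye(n) + v * np.diag(np.ones(n - 1), k=1)  # unit-diagonal, invertible
    b = np.full(n, v)
    return np.linalg.solve(A, b).sum()

out = np.array([solve_one(v) for v in values])
print(out.shape)  # (1000,)
```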
Hello,
I'm quite new to the open-source community but I would like to contribute to this project.
Is anybody working on the integration of the `scipy.constants` module?
It seems possible to mark `scipy.interpolate`, `scipy.signal` and `scipy.signal.windows` as :shipit: now?
Also, I wonder if the comparison table (super-useful BTW!), https://docs.cupy.dev/en/stable/reference/comparison.html#scipy-cupy-apis, is updated automatically on a release, or is it manual?
Would be great to remove SciPy deprecated items (signal.cmplx_sort, windows from scipy.signal namespace, linalg.tri{l,u} etc)
Hey all, I have always wanted to contribute here, and I have my first PR https://github.com/cupy/cupy/pull/8305 opened that ports `scipy.stats.kstat`. I do not have access to GPUs, hence have not tested locally; I followed `boxcox_llf` and it was fun and easy :)
> It seems possible to mark `scipy.interpolate`, `scipy.signal` and `scipy.signal.windows` as :shipit: now?
Good catch! Updated the list :shipit:
> Also, I wonder if the comparison table (super-useful BTW!), https://docs.cupy.dev/en/stable/reference/comparison.html#scipy-cupy-apis, is updated automatically on a release, or is it manual?
The `latest` page is updated automatically for each commit merged into the main branch. The `stable` page is updated automatically for each stable (v13.*) release.
> Would be great to remove SciPy deprecated items (signal.cmplx_sort, windows from scipy.signal namespace, linalg.tri{l,u} etc)
The page is generated with this tool. Deprecated items are currently listed in the footnotes section to describe why we don't implement them.
https://github.com/cupy/cupy/blob/1aa70f0de8c78afcfa503837d9baee3d80af4bc8/docs/source/_comparison_generator.py#L206-L207
> Hey all, I was always willing to contribute here and I have my first PR #8305 opened that ports `scipy.stats.kstat`. I do not have access to GPUs hence have not tested locally, I followed `boxcox_llf` and it was fun and easy :)
Hi @Pranavchiku, thank you for your interest in contributing! We, however, would like to kindly ask all contributors to build and test the branch before submitting a pull request. I understand that there are situations where access to GPUs is difficult but in that case please consider using offerings such as Google Colab.
Thanks @kmaehashi. A couple of comments, now that I looked at this tracker a bit more:

- With `KDTree` and `Delaunay` implemented, maybe `scipy.spatial` is fair to be marked as :shipit:, too. Submodules (`spatial.transform` and `spatial.distance`) might be a separate story, don't have an opinion.
- `scipy.linalg.{blas, lapack}` are unlikely to be implemented at all, correct?
- even more so for `scipy.linalg.cython_{blas,lapack}`. Maybe even add an "out of scope" type symbol, ❌ or some such?
- `scipy.io` --- is there an appetite for ever adding functionality for reading from matlab to CuPy? I suppose using scipy.io followed by `cupy.asarray` is the way, is it?
- Several SciPy subpackages support CuPy array inputs via Array API, e.g. `scipy.cluster`. It might be nice to show these too here?

Thanks for the suggestions @ev-br!
> With `KDTree` and `Delaunay` implemented, maybe `scipy.spatial` is fair to be marked as :shipit:, too. Submodules (`spatial.transform` and `spatial.distance`) might be a separate story, don't have an opinion.
Absolutely, marked :shipit:!
> `scipy.linalg.{blas, lapack}` are unlikely to be implemented at all, correct?
Can't really say they are unlikely at all, as we have the non-public (undocumented) `cupyx.lapack` module. I think we can say they're considered low-priority, though.
> even more so for `scipy.linalg.cython_{blas,lapack}`. Maybe even add an "out of scope" type symbol, ❌ or some such?
Sounds great, marked ❌.
> `scipy.io` --- is there an appetite for ever adding functionality for reading from matlab to CuPy? I suppose using scipy.io followed by `cupy.asarray` is the way, is it?
I began to wonder if I could delete it from the list, as the audience is very limited compared to NumPy's I/O routines (e.g., `cupy.load`, which we implemented by wrapping `numpy.load`).
> Several SciPy subpackages support CuPy array inputs via Array API, e.g. `scipy.cluster`. It might be nice to show these too here?
Nice, added a link to the Array API support docs!
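The `scipy.io` followed by `cupy.asarray` route discussed above can be sketched with an in-memory .mat file (the CuPy line is commented out since it needs a GPU; `scores` is just an illustrative key):

```python
# Read MATLAB-format data on the CPU with scipy.io, then move it to the GPU.
import io
import numpy as np
from scipy.io import savemat, loadmat

buf = io.BytesIO()
savemat(buf, {"scores": np.arange(6.0).reshape(2, 3)})  # stand-in .mat file
buf.seek(0)

mat = loadmat(buf)
host = mat["scores"]             # NumPy array on the CPU
# device = cupy.asarray(host)    # one extra line moves it onto the GPU
print(host.shape)  # (2, 3)
```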
@kmaehashi hi, first time contributor here. I've just implemented a vectorized version of SciPy's BVLS solver (bounded-variable least squares, called through `scipy.optimize.lsq_linear` with `method='bvls'`) for an astronomical data pipeline. `lsq_linear` solves the least squares problem

    minimize 0.5 * ||A x - b||**2
    subject to lb <= x <= ub

where A has dimensions (i, j), b has dimension (i) and x has dimension (j). My implementation vectorizes this to solve N identically shaped matrices by making A => (N, i, j), b => (N, i) and x => (N, j), where a speed gain is obtained for larger N (at least a few hundred).
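The batching described above can be sketched in NumPy shapes. Plain normal equations stand in for BVLS here (no bounds handling), purely to show how the (N, i, j) / (N, i) / (N, j) shapes line up in one vectorized solve:

```python
# Solve N least-squares problems at once: A (N, i, j), b (N, i) -> x (N, j).
import numpy as np

rng = np.random.default_rng(0)
N, i, j = 4, 6, 3
A = rng.standard_normal((N, i, j))
x_true = rng.standard_normal((N, j))
b = np.einsum("nij,nj->ni", A, x_true)      # exact right-hand sides

AtA = np.einsum("nij,nik->njk", A, A)       # batched A^T A, shape (N, j, j)
Atb = np.einsum("nij,ni->nj", A, b)         # batched A^T b, shape (N, j)
x = np.linalg.solve(AtA, Atb)               # one vectorized solve over N

print(np.allclose(x, x_true))  # consistent systems -> recovers x_true
```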
I would love to contribute this to CuPy but wanted to check first, since you don't seem to have a `cupyx.scipy.optimize` module, which is where I think it should go to mirror SciPy. Note I have not implemented the `trf` or `lsmr` algorithms within `lsq_linear`, so these currently return a `NotImplementedError`; only BVLS is implemented. Cheers!
There is some discussion about `cupyx.scipy.optimize` in this issue: https://github.com/cupy/cupy/issues/6112
So... CuPy may be the wrong solution for my needs, but I have an array of floats of length ~500,000 with a variable proportion (0.00001% to 10%) of 'affected' vs. 'normal' samples (about 10k of them). I want to run a permutation test on this large array, and it's taking minutes for as few as 1k permutations... I get the feeling I should rather sub-sample the normals and then permute them, but how many sub-samples do I take, etc.
Long story short, I'd like to implement scipy.stats.permutation_test in CuPy! Since you said to ask here first... here I am ;-)
Perhaps I can get away with just using CuPy arrays and passing 'native' scipy.stats.permutation_test a cupy function as the callable statistic? e.g. cupy.ndarray.mean?
I guess I should try that first 😂
Several mentions of RAFT above. I can't figure out if that's useful for me or not.
Most SciPy stats functions are being made array API compatible, so CuPy does not need to re-implement them. There are a few that we might ask CuPy to implement because they cannot be implemented in terms of array API operations and common special functions, but `permutation_test` is probably not one of them.
> Perhaps I can get away with just using CuPy arrays and passing 'native' scipy.stats.permutation_test a cupy function as the callable statistic
You'll need to wrap it so it returns a NumPy array, but yes - if computing the statistic function is the bottleneck, that will probably speed things up.
> > Perhaps I can get away with just using CuPy arrays and passing 'native' scipy.stats.permutation_test a cupy function as the callable statistic
>
> You'll need to wrap it so it returns a NumPy array, but yes - if computing the statistic function is the bottleneck, that will probably speed things up.
Maybe I misunderstood what you meant by 'wrap'... If I try to pass CuPy arrays into permutation_test, it immediately complains:
TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
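A minimal sketch of the 'wrap' idea from the reply above (the CuPy path is guarded so the snippet also runs CPU-only; `mean_diff` is a made-up name):

```python
# Compute the statistic on the GPU when CuPy is available, but always hand
# a NumPy array back to scipy.stats.permutation_test.
import numpy as np

try:
    import cupy as xp          # GPU path
    to_numpy = xp.asnumpy
except ImportError:
    xp = np                    # CPU fallback, same code path
    to_numpy = np.asarray

def mean_diff(x, y, axis=0):
    gx, gy = xp.asarray(x), xp.asarray(y)
    out = xp.mean(gx, axis=axis) - xp.mean(gy, axis=axis)
    return to_numpy(out)       # NumPy result, as permutation_test expects

print(mean_diff(np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0])))  # 1.5
```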
Seems there is no speed improvement over numpy on simple 'wrapped' statistics:
import numpy as np
import cupy as cp
from scipy.stats import permutation_test

def mean_ind_cp(x: np.ndarray, y: np.ndarray, axis=0):
    # Statistic runs on the GPU; .get() hands the result back to
    # permutation_test as a NumPy array.
    cp_x = cp.asarray(x)
    cp_y = cp.asarray(y)
    return (cp.mean(cp_x, axis=axis) - cp.mean(cp_y, axis=axis)).get()

def compare_ttest_permu(rp_scores_np: np.ndarray, rn_scores_np: np.ndarray) -> None:
    for n_resamples in [10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]:
        print(f"PERM : {n_resamples}")
        res = permutation_test(
            (rp_scores_np, rn_scores_np),
            statistic=mean_ind_cp,
            permutation_type="independent",
            n_resamples=n_resamples,
            alternative="two-sided",
            vectorized=True,
        )
        print(res.pvalue)
Implement GPU versions of `scipy.*` functions in the `cupyx.scipy.*` namespace.

This is a tracker issue that summarizes the implementation status of each SciPy public module in CuPy. See the comparison table for details.
Legends
List of Modules
- `scipy.cluster`
  - `scipy.cluster.vq` (#5947)
  - `scipy.cluster.hierarchy` (#3434)
- `scipy.constants`
- `scipy.fft`
- `scipy.fftpack` (note: deprecated, no further development planned)
- `scipy.integrate` (#7019)
- `scipy.interpolate` (#7186)
- `scipy.io`
  - `scipy.io.arff`
  - `scipy.io.matlab`
  - `scipy.io.wavfile`
- `scipy.linalg`
  - `scipy.linalg.blas`
  - `scipy.linalg.cython_blas`
  - `scipy.linalg.lapack` (c.f. `cupyx.lapack`)
  - `scipy.linalg.cython_lapack`
  - `scipy.linalg.interpolative`
- `scipy.ndimage`
- `scipy.odr`
- `scipy.optimize` (#6112)
- `scipy.signal` (#7403)
  - `scipy.signal.windows` (#7404)
- `scipy.sparse`
  - `scipy.sparse.linalg`
  - `scipy.sparse.csgraph` (#2431)
- `scipy.spatial` (#5946)
  - `scipy.spatial.distance` (#5946)
  - `scipy.spatial.transform`
- `scipy.special` (note: planned to be backed by SciPy in future, #8163)
- `scipy.stats`
  - `scipy.stats.contingency` (note: RAFT could be used)
  - `scipy.stats.distributions`
  - `scipy.stats.mstats` (note: masked arrays unsupported in CuPy)
  - `scipy.stats.qmc`
  - `scipy.stats.sampling` (note: RAFT could be used)

Note: Several modules in SciPy, such as `scipy.cluster.hierarchy`, accept CuPy ndarrays as inputs through the Array API standard.

Starter Task
If you are new to CuPy and wondering where to start, here are some good first things to try. They are relatively simple and independent.
- `scipy.stats` functions
- `scipy.sparse.block_diag` (#7058)
- `scipy.sparse.load_npz` / `scipy.sparse.save_npz`
- `scipy.constants.*` (hint: most things except for methods can just be borrowed from SciPy)

Steps to Contribute
1. Fork and star :star: the CuPy repository :wink:
2. Pick a function you want to work on from any of the modules listed above. You can find the function in the SciPy API Reference to understand what should be implemented. If you want to work on new modules (🥚), first discuss with us on this issue.
3. Implement the function in your branch. If you need help, join Gitter or just ask for help in this issue.
4. Don't forget to write test code!
5. Build CuPy and run tests to confirm that the function runs fine:
   `pip install -e . && pytest tests/cupyx_tests/scipy_tests/PATH_TO_YOUR_TEST`
   See the Contribution Guide for details.
6. Submit a pull-request to the `main` branch.

Note that you will need a GPU environment to develop CuPy.
See also: NumPy API Tracker Issue #6078