Closed kwmsmith closed 10 years ago
@cowlicks @markkness -- I realize this milestone is somewhat vague. Could you take a survey of what in the NumPy API is currently implemented, what isn't, and categorize based on difficulty? We will push most of the non-trivial methods to year 2. Having an explicit tally of what's left to do here will be very helpful.
I think we also need to consider how hard it is to cover specific features of certain NumPy methods, axis arguments and broadcasting in particular.
Broadcasting will be evaluated in a separate year-2 issue.
And yes, breaking things out if they're partially implemented and the difficulty of the remaining effort will be good.
Do you anticipate handling the `axis` arguments to be difficult? I'd anticipate things like `mean` to be fairly straightforward, but things like `median` and `sort` to be hard.
I'm sure we can find some distributed algos in the literature.
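A minimal sketch of why `mean` and `median` differ in difficulty, using plain Python lists rather than distarray or MPI. The `blocks` layout is a hypothetical stand-in for data held by three processes. The mean can be rebuilt from one `(sum, count)` pair per block, whereas combining per-block medians generally does not give the global median.

```python
from statistics import mean, median

# Hypothetical local blocks held by three processes (uneven sizes on purpose).
blocks = [[1, 2, 3], [4, 5], [6, 7, 8, 100]]
flat = [x for block in blocks for x in block]

# Mean: each process only needs to contribute a (sum, count) pair.
partials = [(sum(block), len(block)) for block in blocks]
total = sum(s for s, _ in partials)
count = sum(n for _, n in partials)
distributed_mean = total / count

# Median: the median of the per-block medians is not the global median here.
median_of_medians = median(median(block) for block in blocks)
global_median = median(flat)

print(distributed_mean, mean(flat))
print(median_of_medians, global_median)
```

This is only an illustration of the problem; an actual distributed median would need a selection or sorting algorithm that exchanges more than one value per process.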
It should be easy if slicing is working. But I think slicing will be hard.
To get feature parity with NumPy's linalg routines we will have to incorporate Trilinos. I don't know the difficulty of incorporating Trilinos and what that will involve.
Isn't slicing orthogonal to reduction operations, and reducing along a specified axis? Am I missing something?
An example:
da = distarray.ones((100000, 100000))
mean_1 = da.mean(axis=1) # trivial, since axis 1 is undistributed
mean_0 = da.mean(axis=0) # non-trivial; axis 0 is distributed
To compute `mean_0`, each `LocalArray` would compute its `sum` along its local 0th axis, and then there'd be an MPI reduction of all those local arrays, along with a division by the global size along the 0th dimension. No slicing involved.
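The steps just described can be sketched in plain Python, with no MPI and no distarray involved. Everything here is an assumption made for illustration: each "local array" is a list of rows, `reduce_sum` stands in for the MPI sum-reduction, and the function names are hypothetical.

```python
def local_column_sums(block):
    """Sum one local block (a list of rows) along its local 0th axis."""
    return [sum(col) for col in zip(*block)]


def reduce_sum(vectors):
    """Stand-in for an MPI sum-reduction: elementwise sum across processes."""
    return [sum(vals) for vals in zip(*vectors)]


def distributed_mean_axis0(blocks):
    partial = [local_column_sums(b) for b in blocks]  # local work, no communication
    total = reduce_sum(partial)                       # one reduction across processes
    global_rows = sum(len(b) for b in blocks)         # global size along axis 0
    return [t / global_rows for t in total]


# A 4x3 global array split unevenly along axis 0 across two hypothetical processes.
blocks = [
    [[1.0, 2.0, 3.0]],
    [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]
mean_0 = distributed_mean_axis0(blocks)
print(mean_0)
```

Dividing by the global row count rather than averaging the local means keeps the result correct when the blocks have different sizes.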
Yes, we will be offloading a lot of our linear algebra functionality to Trilinos. The scope of this milestone is just the ufuncs in the `numpy` namespace, the `ndarray` methods, and the `numpy.random` module.
Oh! You are correct. That is much smarter than what I wanted to do.
@cowlicks -- I imagine this is waiting on the work for proper not-distributed arrays, correct?
I'd still like us to have a survey of all numpy methods / functions, and which are covered, and which have yet to be implemented. We can make a determination of which methods / functions are not "core" and out of scope for this phase of distarray work.
@kwmsmith see this gist I made last week. You can modify it if you like.
We should probably separate it into year 1 and year 2 stuff.
Oops forgot to add the gist: https://gist.github.com/cowlicks/9489089
@cowlicks this is great -- thanks. Will look through in detail.
I believe this task is closeable for the year 1 objectives, and we will address the year 2 objectives in milestone 0.3.
I'm changing the milestone to 0.3 to reflect this.
See issue #319, which separates out the year-2 specific parts of this task.
Closing this task for year 1.
Objective: This task will implement distributed versions of all NumPy ufuncs, and distributed versions of the other non-ufunc NumPy functions.
Relevance: Ufuncs and array methods comprise the second major feature of NumPy, and without them, NumPy's usefulness would be significantly diminished. Distributed ufuncs that work with distributed ODIN arrays are a natural extension of NumPy's existing ufunc capabilities.
Description: Given the regularity of unary and binary ufuncs, once a handful of distributed ufuncs are developed, the rest will follow fairly quickly. More challenging are the combinatorial and statistical array functions, such as `sort()`, `mean()`, `std()`, and `median()`. These and other non-elemental distributed procedures will require more effort to implement. To avoid reinventing the wheel, we will draw on the existing literature to implement these algorithms. Some of these methods can be deferred to budget year 2.
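The regularity of unary and binary ufuncs can be illustrated with a small plain-Python sketch. It is not distarray or ODIN code: `apply_unary` and `apply_binary` are hypothetical helpers, each inner list stands in for one process's local data, and the binary case assumes both operands are distributed identically. Under those assumptions, every elementwise function follows the same pattern and needs no communication.

```python
import math
import operator


def apply_unary(func, blocks):
    """Apply an elementwise function to every local block independently."""
    return [[func(x) for x in block] for block in blocks]


def apply_binary(func, blocks_a, blocks_b):
    """Apply an elementwise binary function block by block.

    Assumes both operands have the same distribution, so matching
    elements already live on the same (hypothetical) process.
    """
    return [
        [func(x, y) for x, y in zip(a, b)]
        for a, b in zip(blocks_a, blocks_b)
    ]


# Two hypothetical processes, each holding two elements.
a = [[1.0, 4.0], [9.0, 16.0]]
b = [[1.0, 1.0], [1.0, 1.0]]

roots = apply_unary(math.sqrt, a)
sums = apply_binary(operator.add, a, b)
print(roots, sums)
```

Swapping `math.sqrt` or `operator.add` for another elementwise function changes nothing else, which is why the remaining ufuncs should follow quickly once the first few work. Operands with different distributions would need data movement first and are outside this sketch.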