O-6: distarray global API -- Distributed Ufuncs and global methods

enthought / distarray


BSD 3-Clause "New" or "Revised" License


O-6: distarray global API -- Distributed Ufuncs and global methods #186

Closed kwmsmith closed 10 years ago

kwmsmith commented 10 years ago

Objective: This task will implement distributed versions of all NumPy ufuncs, and distributed versions of the other non-ufunc NumPy functions.
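As a rough illustration of why elementwise ufuncs distribute so readily, the sketch below simulates a distributed array as a plain Python list of NumPy blocks, one per hypothetical process. No MPI or distarray API is used. The hypothetical helper `distributed_ufunc` applies any unary or binary `np.ufunc` block by block. This needs no communication as long as all operands share the same distribution.

```python
import numpy as np

def distributed_ufunc(ufunc, *operands):
    """Apply a NumPy ufunc blockwise to operands given as lists of local blocks.

    Each list stands in for one distributed array: element i is the block
    owned by hypothetical process i. All operands must be distributed identically.
    """
    if not isinstance(ufunc, np.ufunc):
        raise TypeError("expected a NumPy ufunc")
    if ufunc.nin != len(operands):
        raise ValueError(f"{ufunc.__name__} takes {ufunc.nin} input(s)")
    if len({len(op) for op in operands}) != 1:
        raise ValueError("operands must have the same number of blocks")
    # Purely local work: block i of the result depends only on block i of each input.
    return [ufunc(*local_blocks) for local_blocks in zip(*operands)]

# Simulate a 1-D array of 10 elements split across 3 "processes".
x = np.arange(10.0)
y = np.linspace(1.0, 2.0, 10)
x_blocks = np.array_split(x, 3)
y_blocks = np.array_split(y, 3)

sin_blocks = distributed_ufunc(np.sin, x_blocks)          # unary ufunc
sum_blocks = distributed_ufunc(np.add, x_blocks, y_blocks)  # binary ufunc
```

Concatenating the result blocks reproduces `np.sin(x)` and `x + y`. Operands with mismatched distributions, or operands that need broadcasting, would have to be redistributed first, and this sketch does not attempt that.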

Relevance: Ufuncs and array methods comprise the second major feature of NumPy; without them, NumPy’s usefulness would be significantly diminished. Distributed ufuncs that work with distributed ODIN arrays are a natural extension of NumPy’s existing ufunc capabilities.

Description: Given the regularity of unary and binary ufuncs, once a handful of distributed ufuncs are developed, the rest will follow fairly quickly. More challenging are the combinatorial and statistical array functions, such as sort(), mean(), std(), and median(). These and other non-elementwise distributed procedures will require more effort to implement. Rather than reinvent the wheel, we will draw on existing literature to implement these algorithms.
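For a non-elementwise function like std(), one well-known approach from the literature is the pairwise variance update of Chan, Golub, and LeVeque. Each process reduces its block to a (count, mean, M2) triple, where M2 is the sum of squared deviations. The triples are then merged pairwise, which gives the same result as the serial computation without a second pass over the data. The sketch below simulates the processes with a list of NumPy blocks. The helper names are hypothetical, and this is not distarray's implementation.

```python
from functools import reduce
import numpy as np

def local_moments(block):
    """Reduce one local block to (count, mean, M2)."""
    n = block.size
    mean = block.mean() if n else 0.0
    m2 = ((block - mean) ** 2).sum() if n else 0.0
    return n, mean, m2

def combine(a, b):
    """Merge two (count, mean, M2) triples using the Chan-Golub-LeVeque pairwise update."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

def distributed_std(blocks, ddof=0):
    # With MPI, this reduce would correspond to a reduction over processes
    # using the custom `combine` operation instead of a plain sum.
    n, _, m2 = reduce(combine, (local_moments(b) for b in blocks))
    return np.sqrt(m2 / (n - ddof))

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=1000)
blocks = np.array_split(data, 4)  # 4 simulated "processes"
```

Because `combine` is associative, the triples can be merged in any tree shape. A parallel reduction relies on exactly this property.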

Some of these methods can be deferred to budget-year-2.

kwmsmith commented 10 years ago

@cowlicks @markkness -- I realize this milestone is somewhat vague. Could you take a survey of what in the NumPy API is currently implemented, what isn't, and categorize based on difficulty? We will push most of the non-trivial methods to year 2. Having an explicit tally of what's left to do here will be very helpful.

cowlicks commented 10 years ago

I think it is also necessary to consider the difficulty of covering specific features of certain NumPy methods, in particular axis arguments and broadcasting.

kwmsmith commented 10 years ago

Broadcasting will be evaluated in a separate year-2 issue.

And yes, breaking out which items are partially implemented, along with the difficulty of the remaining effort, would be good.

Do you expect handling the axis arguments to be difficult? I'd expect things like mean to be fairly straightforward, but things like median and sort to be hard.

I'm sure we can find some distributed algos in the literature.
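One classic distributed sorting algorithm is sample sort with regular sampling (PSRS). Each process sorts its data locally. A small set of gathered samples then determines global splitters, and an all-to-all exchange sends each value to the process that owns its range. The sketch below simulates the processes with a list of NumPy blocks. `sample_sort` is a hypothetical helper, not part of distarray, and the exchange step is just list indexing here.

```python
import numpy as np

def sample_sort(blocks):
    """Sort a distributed 1-D array given as a list of local blocks.

    Returns p sorted buckets; concatenating them in order yields the
    globally sorted array. Bucket sizes may be uneven.
    """
    p = len(blocks)
    # 1. Each "process" sorts its own block.
    local_sorted = [np.sort(b) for b in blocks]
    # 2. Each picks p regularly spaced samples; the samples are gathered and sorted.
    samples = np.sort(np.concatenate([
        s[np.linspace(0, s.size - 1, p).astype(int)]
        for s in local_sorted if s.size > 0
    ]))
    # 3. Choose p - 1 splitters at regular positions among the gathered samples.
    positions = np.linspace(0, samples.size - 1, p + 1).astype(int)[1:-1]
    splitters = samples[positions]
    # 4. Each block cuts its sorted data at the splitters into p pieces.
    pieces = [np.split(s, np.searchsorted(s, splitters, side="right"))
              for s in local_sorted]
    # 5. All-to-all exchange: bucket j gathers piece j from every block, then sorts it.
    return [np.sort(np.concatenate([pc[j] for pc in pieces])) for j in range(p)]

rng = np.random.default_rng(0)
data = rng.integers(0, 1000, size=1003)
blocks = np.array_split(data, 4)  # 4 simulated "processes"
buckets = sample_sort(blocks)
```

Once the bucket sizes are known after the exchange, the global median (or any other order statistic) can be located with a prefix sum over the bucket lengths. Sorting everything just to find a median is more work than a dedicated selection algorithm would need, though.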

cowlicks commented 10 years ago

It should be easy if slicing is working. But I think slicing will be hard.

cowlicks commented 10 years ago

To get feature parity with NumPy's linalg routines, we will have to incorporate Trilinos. I don't know how difficult incorporating Trilinos would be or what it would involve.

kwmsmith commented 10 years ago

Isn't slicing orthogonal to reduction operations, and reducing along a specified axis? Am I missing something?

An example:

da = distarray.ones((10**5, 10**5))  # shape must be integers, not floats like 1e5
mean_1 = da.mean(axis=1)  # trivial, since axis 1 is undistributed
mean_0 = da.mean(axis=0)  # non-trivial; axis 0 is distributed

To compute mean_0, each LocalArray would compute its sum along its local 0th axis. Then there'd be an MPI sum-reduction of those local results, followed by a division by the global size of the 0th dimension. No slicing involved.
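The two cases above can be simulated without MPI by treating a 2-D array split into row blocks (distributed along axis 0, undistributed along axis 1) as a list of local NumPy arrays. In this sketch, `mean_axis1` needs no communication. `mean_axis0` sums each block along its local axis 0, combines the partial sums (standing in for an MPI sum-reduction), and divides by the global length of axis 0. Both helper names are hypothetical.

```python
from functools import reduce
import numpy as np

def mean_axis1(blocks):
    # Axis 1 is undistributed: each block holds complete rows, so every
    # process computes its part of the result locally.
    return [b.mean(axis=1) for b in blocks]

def mean_axis0(blocks):
    # Axis 0 is distributed: each process sums over its local rows...
    local_sums = [b.sum(axis=0) for b in blocks]
    # ...the partial sums are combined (an MPI sum-reduction in a real setting)...
    total = reduce(np.add, local_sums)
    # ...and divided by the global size of axis 0 (a distributed array
    # would know this from its global shape).
    global_n0 = sum(b.shape[0] for b in blocks)
    return total / global_n0

data = np.arange(24.0).reshape(6, 4)
blocks = np.array_split(data, 3, axis=0)  # 3 "processes", 2 rows each
```

Only `mean_axis0` touches data from more than one block. That step is the single communication point that distinguishes the distributed axis from the local one.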

kwmsmith commented 10 years ago

Yes, we will be offloading a lot of our linear algebra functionality to Trilinos. The scope of this milestone is just the ufuncs in the numpy namespace, the ndarray methods, and the numpy.random module.

cowlicks commented 10 years ago

Oh! You are correct. That is much smarter than what I wanted to do.

kwmsmith commented 10 years ago

@cowlicks -- I imagine this is waiting on the work for proper non-distributed arrays, correct?

I'd still like us to have a survey of all numpy methods / functions, and which are covered, and which have yet to be implemented. We can make a determination of which methods / functions are not "core" and out of scope for this phase of distarray work.

cowlicks commented 10 years ago

@kwmsmith see this gist I made last week. You can modify it if you like.

We should probably separate it into year 1 and year 2 stuff.

cowlicks commented 10 years ago

Oops, forgot to add the gist: https://gist.github.com/cowlicks/9489089

kwmsmith commented 10 years ago

@cowlicks this is great -- thanks. Will look through in detail.

cowlicks commented 10 years ago

There is now a page on the wiki discussing this.

kwmsmith commented 10 years ago

I believe this task is closeable for the year 1 objectives, and we will address the year 2 objectives in milestone 0.3.

I'm changing the milestone to 0.3 to reflect this.

kwmsmith commented 10 years ago

See issue #319, which separates out the year-2 specific parts of this task.

Closing this task for year 1.