data-apis / scipy-2023-presentation

Materials for the SciPy 2023 Data APIs Consortium presentation and proceedings paper
MIT License

Outline #1

Closed asmeurer closed 1 year ago

asmeurer commented 1 year ago

Here's the outline from the talk proposal (I've also uploaded it here: https://github.com/data-apis/scipy-2023-presentation/blob/main/outline.md).

So the first question is if there's anything that we should add for the paper.

asmeurer commented 1 year ago

One question is how much of the talk and/or paper should be spent discussing the dataframe work. The most significant work that's been done so far is on the array side, but I want to make sure that dataframes also get a fair share of the discussion.

asmeurer commented 1 year ago

By the way, the paper is limited to 8 pages (not including references). Here is what the example paper looks like, to give an idea of how long that would be:

[image: example paper]

rgommers commented 1 year ago

> One question is how much of the talk and/or paper should be spent discussing the dataframe work. The most significant work that's been done so far is on the array side, but I want to make sure that dataframes also get a fair share of the discussion.

I added this to the agenda for tomorrow's dataframe call - let's see what everyone there thinks.

asmeurer commented 1 year ago

I've fleshed out the outline here https://github.com/data-apis/scipy-2023-presentation/blob/main/outline.md. I plan to use that as the basis for the paper. Let me know if you have any thoughts or suggestions.

rgommers commented 1 year ago

That looks pretty good! For the paper I think that indeed you want to focus on what has already been done and is complete. And in the presentation, spend a bit more time on dataframes, and what's next for SciPy, scikit-learn & beyond, and how people can help or adopt.

leofang commented 1 year ago

Thanks for leading the effort, Aaron! The outline looks good to me.

Two minor nits just FYI 🙂

> Execution semantics are out of scope. This includes single-threaded vs. parallel execution, task scheduling and synchronization, eager vs. delayed evaluation, performance characteristics of a particular implementation of the standard, and other such topics.

I am sure Dask/cuNumeric would argue that __dlpack__ assumes the array fits on a single node and is not distributed. We don't yet have any zero-copy exchange protocol for distributed arrays. Given the page limit, we probably don't want to mention such details, so I'm just raising it here for the record.

> Standardization of these dtypes is out of scope: bfloat16, complex, extended precision floating point, datetime, string, object and void dtypes.

"complex" shouldn't be listed as out-of-scope.

I agree with Ralf on focusing on what we have already done. For future work, we should avoid promising any plan that has even a slight chance of being abandoned. We can just say, in general terms, that we want to listen to community feedback and increase the API surface as needed, etc.

Also, when mentioning API compliance, we probably want to go with an inclusive tone. Some libraries like Dask cannot do __dlpack__ by nature, for example, but we still want to encourage libraries to adopt as much as they can, without worrying about looking bad compared to other libraries. It shouldn't be framed as a competition, at least not on our paper.

asmeurer commented 1 year ago

I copied some of these things from https://data-apis.org/array-api/latest/purpose_and_scope.html. We should go through that page and update it as some of the things written there are out of date.

> Also, when mentioning API compliance, we probably want to go with an inclusive tone. Some libraries like Dask cannot do __dlpack__ by nature, for example, but we still want to encourage libraries to adopt as much as they can, without worrying about looking bad compared to other libraries. It shouldn't be framed as a competition, at least not on our paper.

I can add a sentence or two about this. There's also PyTorch, which doesn't have unsigned integer types other than uint8. Hopefully we can also discuss Athan's compliance tracking site and mention this there.
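
To make the "adopt as much as they can" framing concrete, here is a rough sketch of how partial dtype coverage could be reported without turning it into a pass/fail verdict. It is only an illustration: `dtype_coverage` and `partial_xp` are hypothetical names made up for this example, the stand-in namespace is not a real library, and this is not how the compliance tracking site works. The only thing taken from the standard is the list of integer dtype names.

```python
from types import SimpleNamespace

# Integer dtype names defined by the array API standard.
STANDARD_INT_DTYPES = (
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
)


def dtype_coverage(xp, names=STANDARD_INT_DTYPES):
    """Split `names` into those the namespace `xp` exposes and those it lacks.

    Hypothetical helper: it only checks that an attribute with the given
    name exists, not that it behaves as the standard requires.
    """
    present = [name for name in names if hasattr(xp, name)]
    missing = [name for name in names if not hasattr(xp, name)]
    return present, missing


# Hypothetical stand-in for a library whose only unsigned integer type is uint8.
# The attribute values are placeholders, not real dtype objects.
partial_xp = SimpleNamespace(
    **{name: name for name in ("int8", "int16", "int32", "int64", "uint8")}
)

present, missing = dtype_coverage(partial_xp)
print(f"{len(present)}/{len(STANDARD_INT_DTYPES)} integer dtypes available")
print("not yet available:", ", ".join(missing))
```

Reporting "what is there" next to "what is not yet there" keeps the tone inclusive: a library in this position still shows up as covering most of the integer dtypes rather than simply as non-compliant.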