Closed kmpaul closed 4 years ago
Thanks for sharing your thoughts, @kmpaul. I think your plans address some clear needs, and align with and complement the goals of GeoCAT.
Thanks, @clyne! I'm glad to hear that this makes for a more clearly defined scope.
I do want to be clear (to the @NCAR/xdev team, especially) that we can and should still work with @NCAR/geocat when and where appropriate.
In light of our recent discussions with the GeoCAT team, I have been thinking more closely about what efforts are uniquely suited to Xdev, rather than to GeoCAT. This issue is to create a forum to discuss these strategic efforts as a group.
Campaigns
I still believe that there are two primary campaigns to which Xdev should direct attention:
The EOCB campaign is seen as vital both to build up the number of users of the Python/Pangeo stack and to increase the number of scientist-developers who contribute back to the stack codebase (from reporting issues, to providing necessary feedback and design input, to writing actual code). Without a growing contributor base, these open source tools will always struggle to become the solutions that we need them to be. We need these contributors to give us input on what functionality needs improvement, how they do their analysis, etc.
The WORKFLOWS campaign is also vital to build out the functionality that scientists need to do scalable data analysis. To some extent, this campaign is "obvious," but our recent discussions with the GeoCAT team indicate that the development of "operators" (to use Matt Long's word, meaning any functional operation applied to the data during analysis) is an area that most clearly falls into the GeoCAT namespace, as do visualization capabilities. We should help with these efforts, as we have promised, but this means that the rest of the scientific data analysis workflow with the Pangeo stack (as alluded to in the WORKFLOWS campaign spec) falls more clearly into the Xdev namespace, and our efforts should reflect this.
New Projects
With all of this in mind, I think we need to scope out some new projects that clearly delineate Xdev-specific work. Some projects currently exist that clearly fall into the Xdev space (though GeoCAT collaboration is always welcome on them):
And then I can foresee other projects in the near future that take us into new territory that clearly delineates us from GeoCAT:
And there are probably many others, but this is just a start. Those having new ideas, please post them here. Those wanting to take any idea above and flesh it out with the Project Spec, please do so.
cc: @NCAR/xdev @NCAR/geocat
Original Post from Notes
Note the “Workflows” Campaign Spec in the xdev-projects repository. Within the context of the Workflows Campaign, “Analysis computations” (or “Operators” as Matt Long calls them) are operations performed on scientific data (possibly calculating new physical quantities) during the “data analysis” phase of the scientist's Jupyter-based workflow.

In light of GeoCAT’s growing effort, it makes sense that these “Operators” should exist in the GeoCAT package. Xdev’s role in this should be to assist in identifying, porting, testing, and developing these Operators for the GeoCAT package. But the primary responsibility for maintaining these operators will be GeoCAT's (even if we help).

That leaves Xdev with an opportunity to shift its attention to other parts of the Workflows Campaign, including “Search & Discovery”, “Ingestion”, “Check-Pointing”, and “Publication”. To that end, I am “greenlighting” efforts to develop new JupyterLab extensions, dashboarding, prototyping, etc., that can assist scientists with their workflow (i.e., addressing pain-points) rather than just help them perform “big data calculations.”

**NOTE:** Training people in the Pangeo stack is still an Xdev campaign, so we will still help people figure out how best to do their analysis. However, I just don’t think we should be devoting as much of our time to building and maintaining the “analysis toolbox” packages.

**Thoughts?**