ODM2 / ODM2PythonAPI

A set of Python functions that provides data read/write access to an ODM2 database by leveraging SQLAlchemy.
http://odm2.github.io/ODM2PythonAPI/
BSD 3-Clause "New" or "Revised" License

v0.7.2 release #165

Closed emiliom closed 5 years ago

emiliom commented 5 years ago

Preparing a new minor release. See https://github.com/ODM2/ODM2PythonAPI/milestone/3 for the aspirational targets, at this time. The last release happened a year ago.

The changes that I do intend to include, for sure, are two bug fixes:

Everything else is up for "debate". But since I'm the one volunteering to do the work, the debate is really about what's desirable/ok, not about whether it actually gets done in this release! I won't include anything we don't come to agreement on (other than the 2 bug fixes), but I make no promises to include anything we agree on :smirk_cat:

FYI, my initiative to issue a release is not driven by a specific need right now. Just a desire to push out those bug fixes, plus odm2api momentum I'm carrying over from the last two weeks at https://waterhackweek.github.io/ and a CZIMEA workshop where we talked a fair bit about ODM2, with participation from Tony Castronova and Miguel.

I'm pasting comments on goals for this release, from #164:

From @horsburgh:

I think we should be very careful about adding additional requirements and complexity. My feeling is that we never finished the core functionality and so adding additional functionality and dependencies should perhaps be secondary to firming up the foundation.

Utility functions would be nice. Is there ongoing work that's driving this?

From @aufdenkampe:

I agree with the points about managing complexity and the need to better develop core functionality.

I also believe that, given that Pandas has become a core part of the standard Python computational science and data science stack, we should consider strong integration with Pandas and GeoPandas as core functionality. This is especially true given that one of the highest priorities we've heard from users and potential users is to improve I/O performance (including data alignment and slicing), which is one of the main purposes/advantages of using Pandas.

An alternative I've considered is creating a tiny, external, temporary utilities package (on a separate github repo that can be pip installed) as proof of concept, with the functionality I listed in #164. But "Import into a pandas DataFrame the output of an odm2api read function" is so easy, non-disruptive and desirable to users that I think it'd be helpful to include it in odm2api. FYI, here's what it amounts to, mostly (plus extra polish I plan to add), using getVariables output as an example:

import pandas as pd

# odm2read: an odm2api read-service object, created elsewhere
variables_df = pd.DataFrame.from_records(
    [vars(rec) for rec in odm2read.getVariables()], index='VariableID'
)
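
One piece of the "extra polish" could look roughly like the sketch below. It assumes the records are SQLAlchemy ORM instances, in which case `vars(rec)` also returns the internal `_sa_instance_state` attribute, which would otherwise end up as a DataFrame column. `FakeVariable` and `records_to_rows` are hypothetical names used only for illustration (they are not part of odm2api), and the sample values are made up.

```python
class FakeVariable:
    """Hypothetical stand-in for a record returned by an odm2api read function."""

    def __init__(self, variable_id, code, name):
        self.VariableID = variable_id
        self.VariableCode = code
        self.VariableNameCV = name
        # SQLAlchemy ORM instances carry this bookkeeping attribute,
        # so vars(rec) would otherwise expose it alongside the data.
        self._sa_instance_state = object()


def records_to_rows(records):
    """Return one plain dict per record, skipping underscore-prefixed attributes."""
    return [
        {key: value for key, value in vars(rec).items() if not key.startswith('_')}
        for rec in records
    ]


# Illustrative sample records (made-up values)
rows = records_to_rows([
    FakeVariable(1, 'TEMP', 'Temperature'),
    FakeVariable(2, 'DO', 'Oxygen, dissolved'),
])
```

Passing `rows` to `pd.DataFrame.from_records(rows, index='VariableID')` should then give a frame with only the data columns. Filtering on the leading underscore is just one possible choice; an explicit column list would work as well.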

emiliom commented 5 years ago

#158 has been addressed in PR #169

emiliom commented 5 years ago

@aufdenkampe and @horsburgh after your burst of engagement on #164 two weeks ago, you went mum ... Anyway, I've been working away at this release, and updating my release milestone targets along the way. The two critical issues I absolutely wanted to get into this release (#156 and #158) are merged into the development branch, plus a bunch of improvements to the auto tests and some cleanup of old issues.

I will shoot for issuing a new release by Friday, containing at the very least what's already in the development branch. As for the other stuff in my milestone, we'll see.

Pinging @Castronova because one of my motivators to issue a new release is to give him the go-ahead to update odm2api on the HydroShare Jupyterhub. As we learned at Waterhackweek several weeks ago, the version being used there is ancient. But I told him to hold off on updating it until I issued a new release.

BTW, the functionality I'm proposing to add as "utilities" was motivated in part by watching @Castronova demo odm2api at the CZIMEA workshop a couple of weeks ago, presenting odm2api query results "raw", w/o first ingesting them into a Pandas DataFrame. But it's also stuff I've been mulling over for a while, and/or have been using in my odm2api demo notebooks in one way or another (e.g., http://odm2.github.io/ODM2PythonAPI/getstarted.html#sample-jupyter-notebooks)

aufdenkampe commented 5 years ago

@emiliom, thanks for moving these fixes along! I really appreciate that.

Regarding Pandas integration, I fully support the following from @emiliom.

But "Import into a pandas DataFrame the output of an odm2api read function" is so easy, non-disruptive and desirable to users that I think it'd be helpful to include it in odm2api.

Let's please include at least a lightweight Pandas integration in odm2api. I would suggest that we should have Pandas integration at the core of ODM2 software, and that we already have a confusing number of non-integrated repos. Let's consolidate rather than split, especially for such a fundamental package as Pandas (and GeoPandas).

Regarding #156 "Model Name Fixes", I would strongly urge us to fully complete and align odm2api with the name spelling fixes that have been found in the ODM2 repo. Jeff identified a number of spelling errors in Nov. 2016, but those fixes remained on the develop branch. I found other spelling errors this last summer. All of these spelling fixes are in the ODM2 develop branch, and PR https://github.com/ODM2/ODM2/pull/154 has been issued and is awaiting approval and the issue of an ODM2.0.1 bug fix release. Can we also issue that release this week, in coordination with a new release of the ODM2PythonAPI?

horsburgh commented 5 years ago

@emiliom - I'm super busy with end of semester stuff right at the moment, but I don't really have issues with you moving forward with the ODM2API stuff that you have been working on. The Pandas dataframe idea is good. I'm a bit dubious about GeoPandas because it potentially takes us back in the direction of trying to support GIS functionality. We spun our wheels on that for a long time and it cost me a bunch of time and effort when my developers just couldn't sort it out. I think we should tread carefully there.

emiliom commented 5 years ago

Thanks for the votes on a lightweight Pandas integration. I'll say more about it, and where I'm at, in #164.

Regarding GeoPandas-related functionality: @horsburgh I agree it brings up other issues, so let's set it aside for future discussions and releases. I hear you on your past unhappy experience with geospatial functionality, but I'm looking forward to the opportunity to persuade you at least about GeoPandas :wink: -- just not in the next few weeks.

Regarding #156 "Model Name Fixes", I would strongly urge us to fully complete and align odm2api with the name spelling fixes that have been found in the ODM2 repo. Jeff identified a number of spelling errors in Nov. 2016, but those fixes remained on the develop branch. I found other spelling errors this last summer. All of these spelling fixes are in the ODM2 develop branch, and PR ODM2/ODM2#154 has been issued and is awaiting approval and the issue of an ODM2.0.1 bug fix release. Can we also issue that release this week, in coordination with a new release of the ODM2PythonAPI?

I already commented in response to your comments at https://github.com/ODM2/ODM2PythonAPI/pull/156#issuecomment-485871592. I will also respond at https://github.com/ODM2/ODM2/pull/154#issuecomment-485852192 later today. The fixes definitely seem required, but I'm a bit leery about issuing a new ODM2 release w/o verifying that it doesn't impact odm2api (my hunch is it doesn't, but it'll require verification). I can't take on more work for this odm2api release. I'll repeat what I said when I opened this issue:

Everything else is up for "debate". But since I'm the one volunteering to do the work, the debate is really about what's desirable/ok, not about whether it actually gets done in this release! I won't include anything we don't come to agreement on (other than the 2 bug fixes), but I make no promises to include anything we agree on.

emiliom commented 5 years ago

FYI, I'm running out of time (I'll be traveling next week), and the dataframes task (#172) has developed some complexities that I'll need more time to address. So, I'll go ahead and issue a release by Monday, based on the current state of the development branch. I'll also make some small updates to the Sphinx documentation to reflect the changes.

emiliom commented 5 years ago

Forgot to mention that I'll shoot for sharing a notebook by mid next week to demo the current state of my Pandas dataframe ingester, and remaining challenges.

aufdenkampe commented 5 years ago

Thanks for the update, and for pushing this release forward in its current state. I'm a fan of frequent, smaller releases over bigger, less frequent ones.

emiliom commented 5 years ago

Release v0.7.2 is now published. Finished and closed https://github.com/ODM2/ODM2PythonAPI/milestone/3. Will now work on creating conda-forge and PyPI packages.