florisvb / PyNumDiff

Methods for numerical differentiation of noisy data in python
Other
112 stars 16 forks source link

Duplicate package? #48

Open Jacob-Stevens-Haas opened 1 year ago

Jacob-Stevens-Haas commented 1 year ago

Hey, I've been working on pysindy and realized that the Kutz/Brunton lab is associated with two packages for "numerical differentiation of noisy time-series data", citing the same Chartrand paper. Currently, pysindy utilizes the derivative, which I've been contributing to occasionally over the past year. I wanted to ask what the authors (@florisvb @luckystarufo, @andgoldschmidt ) opinions are as to whether it's worth combining the packages, especially since these were both projects supervised under the same advisors? From the perspective of the python ecosystem, I think it would be better; less maintenance cost, and zen number 13

"There should be one-- and preferably only one --obvious way to do it."

Obviously everyone's welcome to their own packages, however.

At first blush, here's the comparison:

I'll be working in the pysindy & differentiation over the next (and hopefully last) year of my PhD and contributing code in a variety of repos. Obviously, both packages have MIT licenses. But I want to poll the authors' feelings on merging code and, long-term, deprecating one of the packages, especially if I begin to add issues like "hey I'm trying to copy your code, why does foo() call bar()?".

florisvb commented 1 year ago

Hi Jacob,

Thanks for initiating this conversation. I'm looping in Nathan and Steve in case they have any insight to offer.

A few clarifications:

I'm not really sure what a merging process would look like, but whatever happens, I think it is important that things remain backwards compatible so that people who are using one package currently don't get screwed over in the future. If there are backwards compatible modifications that can be made to pynumdiff I'm happy to take pull requests to make it more user/pysindy friendly.

---------- Forwarded message --------- From: Jacob Stevens-Haas @.> Date: Thu, Apr 13, 2023 at 3:47 PM Subject: [florisvb/PyNumDiff] Duplicate package (Issue #48) To: florisvb/PyNumDiff @.> CC: Floris van Breugel @.>, Mention < @.>

Hey, I've been working on pysindy https://github.com/dynamicslab/pysindy and realized that the Kutz/Brunton lab have two packages for "numerical differentiation of noisy time-series data", citing the same Chartrand paper. Currently, pysindy utilizes the derivative https://github.com/andgoldschmidt/derivative/, which I've been contributing to occasionally over the past year. I wanted to ask what the authors @.*** https://github.com/florisvb @luckystarufo https://github.com/luckystarufo, @andgoldschmidt https://github.com/andgoldschmidt ) opinions are as to whether it's worth combining the packages, especially since these were both projects supervised under the same advisors? From the perspective of the python ecosystem, I think it would be: less maintenance cost, and zen number 13 https://peps.python.org/pep-0020/

"There should be one-- and preferably only one --obvious way to do it."

Obviously everyone's welcome to their own packages, however.

At first blush, here's the comparison:

I'll be working in the pysindy & differentiation over the next (and hopefully last) year of my PhD and contributing code in a variety of repos. Obviously, both packages have MIT licenses. But I want to poll the authors' feelings on merging code and deprecating one of the packages, esp if I begin to add issues here like "hey I'm trying to copy the code, why does foo() call bar()?".

— Reply to this email directly, view it on GitHub https://github.com/florisvb/PyNumDiff/issues/48, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB4EPAMXOADZU2SEDEEJQTXBB66LANCNFSM6AAAAAAW5V6CVM . You are receiving this because you were mentioned.Message ID: @.***>

Jacob-Stevens-Haas commented 1 year ago

Floris, thanks for that excellent paper and context! It does look like your implementations and algorithm coverage is as good/better than the ones at derivative.

For my PhD research, I proposed combining the derivative/smoothing optimization problem into the SINDy optimization problem as a single multi-objective optimization (sort of along the lines of the recommended work in your conclusion!). I started down that track by doing a lot of refactoring to improve how the derivatives are used in pysindy. When I began running experiments to compare different derivative methods in SINDy, Nathan suggested that I google other packages for the total variational derivative, as the results using pysindy/derivative's TV method weren't competitive with Kalman/Savitsky-Golay. Hence my surprise at finding that Nathan's also an author on the PyNumDiff paper.

Also, deprecation isn't really an official thing in python packages. As long as nobody specifically yanks releases, they remain on PyPI in perpetuity. Thus, anything that depends upon pynumdiff==0.5.3 would continue working in any deprecation scenario. Deprecation is just manually adding a note at the top of the README.md "This package has been deprecated. Continued support is maintained by this other package (GH link)"

I've never merged packages before, so I'm taking a stab, but here's my best guess:

  1. I update the pydindy.differntiation API in my PR over there (currently, any callable that returns derivatives works, but the issue being how to also return the smoothed coordinate values).
  2. I migrate the derivative wrapper from pysindy back to derivative and add similar wrappers here in PRs.
  3. I do some housekeeping PRs here to ensure dependency compatability with derivative (currently everything here is pinned. Also, looks like you're running Travis CI, but I don't see a link? Would you be OK switching to GitHub actions CI?)
  4. In the successor package, pull request changes that (a) import the other package (b) exposes everything imported from other package at the top package namespace (there don't appear to be name conflicts), and (c) copies all the tests from the other package, verifying they run.
  5. Build derivative-style and pynumdiff-style wrappers for each smoothing/method
  6. Copy the rest of the other package into the successor package, updating from pynumdiff import x to from .pynumdiff import x or from derivative import x to from .derivative import x. verify tests pass.
  7. Evaluate competing implementations with a benchmark test and update wrappers to point to better implementation, removing the lesser implementation.
  8. Mark the other package as deprecated in readme.md
    • caveat: I don't know how adding new methods works with the pareto optimization of in your paper (pynumdiff.optimize) to new methods, e.g. the Kernel smoothing that some folks in the Kutz/Brunton orbit are doing that I'm trying to add to derivative. That seems a neat & useful tuning method.

At a certain point, we'd have to decide, if a single package is the desire, which should be the successor package and which should be the deprecated pacakge, and all the admin that entails. We could also go for a system where derivative imports pynumdiff and we stop short of copying the code. Really, numbers 1,2, & 3 are all that I'd need to do for my research; the rest is more about improving the package ecosystem around SINDy and numerical differentiation.

-Jake

florisvb commented 1 year ago

Sorry for the delayed reply. I've stretched a little thing right now in terms fo time.

If I understand correctly you're suggesting to make pysindy compatible with both derivative and pynumdiff? I think that would be good in terms of making either package an option.

I don't have anything against using GitHub actions CI -- I am not super well versed in that aspect of the software/package management (luckstarufo implemented that part for us).

The pynumdiff.optimize is separate from the individual methods, so it is definitely possible to add methods to pynumdiff without adding them to optimize. To add the optimize option for a new method takes a little effort, but there are enough examples from the other methods that it shouldn't be too hard to implement that.

In terms of pysindy stuff, I think getting input/opinions from Nathan and Steve may be helpful?

I would love it if it made sense to use pynumdiff in pysindy, and would be happy to help with making changes, or integrating pull requests, to make that easier. I am also not opposed to deprecating certain methods to simplify the dependencies (I don't see a reason to keep the cheby method and its pychebfun around, for example, but it was useful to test it out).

On Mon, Apr 17, 2023 at 2:43 PM Jacob Stevens-Haas @.***> wrote:

Floris, thanks for that excellent paper and context! It does look like your implementations and algorithm coverage is as good/better than the ones at derivative.

For my PhD research, I proposed combining the derivative/smoothing optimization problem into the SINDy optimization problem as a single multi-objective optimization (sort of along the lines of the recommended work in your conclusion!). I started down that track by doing a lot of refactoring to improve how the derivatives are used in pysindy. When I began running experiments to compare different derivative methods in SINDy, Nathan suggested that I google other packages for the total variational derivative, as the results using pysindy/derivative's TV method weren't competitive with Kalman/Savitsky-Golay. Hence my surprise at finding that Nathan's also an author on the PyNumDiff paper.

Also, deprecation isn't really an official thing in python packages https://discuss.python.org/t/adding-a-mechanism-to-deprecate-a-published-project/13937. As long as nobody specifically yanks releases, they remain on PyPI in perpetuity. Thus, anything that depends upon pynumdiff==0.5.3 would continue working in any deprecation scenario. Deprecation is just manually adding a note at the top of the README.md "This package has been deprecated. Continued support is maintained by this other package (GH link)"

I've never merged packages before, so I'm taking a stab, but here's my best guess:

  1. I update the pydindy.differntiation API in my PR over there (currently, any callable that returns derivatives works, but the issue being how to also return the smoothed coordinate values).
  2. I migrate the derivative wrapper from pysindy back to derivative and add similar wrappers here in PRs.
  3. I do some housekeeping PRs here to ensure dependency compatability with derivative (currently everything here is pinned. Also, looks like you're running Travis CI, but I don't see a link? Would you be OK switching to GitHub actions CI?)
  4. In the successor package, pull request changes that (a) import the other package (b) exposes everything imported from other package at the top package namespace (there don't appear to be name conflicts), and (c) copies all the tests from the other package, verifying they run.
  5. Build derivative-style and pynumdiff-style wrappers for each smoothing/method
  6. Copy the rest of the other package into the successor package, updating from pynumdiff import x to from .pynumdiff import x or from derivative import x to from .derivative import x. verify tests pass.
  7. Evaluate competing implementations with a benchmark test and update wrappers to point to better implementation, removing the lesser implementation.
  8. Mark the other package as deprecated in readme.md

    • caveat: I don't know how adding new methods works with the pareto optimization of in your paper (pynumdiff.optimize https://github.com/florisvb/PyNumDiff/tree/master/pynumdiff/optimize) to new methods, e.g. the Kernel smoothing that some folks in the Kutz/Brunton orbit are doing that I'm trying to add to derivative. That seems a neat & useful tuning method.

At a certain point, we'd have to decide, if a single package is the desire, which should be the successor package and which should be the deprecated pacakge, and all the admin that entails. We could also go for a system where derivative imports pynumdiff and we stop short of copying the code. Really, numbers 1,2, & 3 are all that I'd need to do for my research; the rest is more about improving the package ecosystem around SINDy.

— Reply to this email directly, view it on GitHub https://github.com/florisvb/PyNumDiff/issues/48#issuecomment-1512121410, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB4EPG56ZJ4YUVUBKNWJXDXBW2PJANCNFSM6AAAAAAW5V6CVM . You are receiving this because you were mentioned.Message ID: @.***>

-- Floris van Breugel | http://www.florisvanbreugel.com Assistant Professor of Mechanical Engineering & Graduate Program for Neuroscience University of Nevada, Reno

Wildlife and Landscape Photography Galleries: http://www.ArtInNaturePhotography.com/ Blog: http://www.ArtInNaturePhotography.com/wordpress/

Jacob-Stevens-Haas commented 1 year ago

I agree on getting Nathan/Steve's input. I'm going to give a talk at the Brunton/Kutz group on May 8th around a variety of pysindy changes, including a discussion on what aspects of the API would be breaking changes. In the interim I'll try to PR some of item 3 above (CI and dependencies).

-Jake

florisvb commented 1 year ago

Updates:

I worked on the requirements.txt and some of the test functions. I am now able to smoothly install everything locally using the requirements.txt, and the CI actions are passing smoothly as well.

If you start digging into other stuff, I recommend pulling in these updates!

On Mon, Apr 24, 2023 at 2:19 PM Jacob Stevens-Haas @.***> wrote:

Agree on getting Nathan/Steve's input. I'm going to give a talk at the Brunton/Kutz group on May 8th around a variety of pysindy changes, including a discussion on what aspects of the API would be breaking changes. In the interim I'll try to PR some of item 3 above (CI and dependencies)

— Reply to this email directly, view it on GitHub https://github.com/florisvb/PyNumDiff/issues/48#issuecomment-1520840594, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB4EPCUSBGDX5N46YYSPOTXC3U63ANCNFSM6AAAAAAW5V6CVM . You are receiving this because you were mentioned.Message ID: @.***>

-- Floris van Breugel | http://www.florisvanbreugel.com Assistant Professor of Mechanical Engineering & Graduate Program for Neuroscience University of Nevada, Reno

Wildlife and Landscape Photography Galleries: http://www.ArtInNaturePhotography.com/ Blog: http://www.ArtInNaturePhotography.com/wordpress/

Jacob-Stevens-Haas commented 1 year ago

Awesome, thanks Floris! Next step I'm working on is to refactor the package metadata and requirements into a pyproject.toml. setup.py was deprecated a few years ago. The replacement method also allows you to set version information in release via git tag, so all package uploads to pypi automatically contain correct version number without any manual step, and any in-between versions show git metadata in the version string.