Cov Cor and Eig classes

echristi commented 9 years ago

Classes for covariance matrix, correlation coefficient matrix, and eigenvectors/values.

Only added methods to Cor for now.

See Cor Example.ipynb

echristi commented 9 years ago

I added an eigproc method to the Eig class for this pull request. It does the same thing as the eigproc PEST utility. Also added an example notebook showing how it can be used interactively.
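For readers unfamiliar with the underlying operations, here is a minimal numpy sketch of converting a covariance matrix to a correlation coefficient matrix and taking its eigendecomposition. The function names `cov_to_cor` and `eig_decompose` and the 3x3 matrix are made up for illustration; this is not the code in the Cor or Eig classes, and it does not reproduce the PEST utility's file handling.

```python
import numpy as np

def cov_to_cor(cov):
    """Convert a covariance matrix to a correlation coefficient matrix."""
    std = np.sqrt(np.diag(cov))          # standard deviations from the diagonal
    return cov / np.outer(std, std)      # cor[i, j] = cov[i, j] / (std[i] * std[j])

def eig_decompose(cov):
    """Eigenvalues (descending) and matching eigenvectors (columns) of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]       # reorder so the largest eigenvalue comes first
    return vals[order], vecs[:, order]

# Made-up 3-parameter covariance matrix, for illustration only
cov = np.array([[4.0, 1.2, 0.0],
                [1.2, 1.0, 0.3],
                [0.0, 0.3, 2.25]])

cor = cov_to_cor(cov)
eigvals, eigvecs = eig_decompose(cov)
```

Wrapping `cor` or `eigvecs` in a pandas DataFrame indexed by parameter name would be one way to use the results interactively.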

I'm almost done with a jco class and an identpar class, which does everything the IDENTPAR utility does without actually having to run it. It uses numpy svd etc. (not sure if identpar should be a class or a method).
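As a rough illustration of the SVD-based calculation, the sketch below computes parameter identifiability as the sum of squared components of the first few right singular vectors of the weighted Jacobian. The function name, the argument names, and the choice to scale Jacobian rows directly by the observation weights are assumptions made for this example; it is not the identpar class from the pull request.

```python
import numpy as np

def identifiability(jco, weights, n_sing):
    """Identifiability of each parameter from the first n_sing right singular vectors.

    jco     : (n_obs, n_par) Jacobian (sensitivity) matrix
    weights : (n_obs,) observation weights (assumed to multiply residuals)
    n_sing  : number of singular values spanning the solution space
    """
    weighted = weights[:, None] * jco                    # scale each observation row
    _, _, vt = np.linalg.svd(weighted, full_matrices=False)
    v1 = vt[:n_sing, :]                                  # rows are right singular vectors
    return np.sum(v1 ** 2, axis=0)                       # one value per parameter, in [0, 1]

# Made-up Jacobian: 4 observations, 3 parameters
rng = np.random.default_rng(0)
jco = rng.normal(size=(4, 3))
weights = np.array([1.0, 1.0, 2.0, 0.5])

ident = identifiability(jco, weights, n_sing=2)
```

Because the right singular vectors are orthonormal, the values sum to `n_sing`, and every parameter reaches 1.0 when all singular vectors are retained. A plain function is enough for this calculation, which bears on the class-or-method question above.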

jtwhite79 commented 9 years ago

Man, it looks like we are doing a lot of the same things - I've got a repo named pyemu (for python environmental modeling uncertainty). It has some generic matrix and pst classes, but it's more focused on linear uncertainty analyses. Check it out (there are some notebooks) and see if you think we can combine efforts? Hate to have duplicate python packages that do just about the same thing.

aleaf commented 9 years ago

Evan, sorry I am just getting to looking at this now. Everything in the Cor Example notebook worked for me, but I had trouble with the eigproc_example. The eigproc_df that was returned was empty. Might be an issue with me trying to fetch from your branch "parameter_cor" and merge it into my local copy of your repository. Maybe it would be best to merge that branch to your master first? (so the pull request would be merging your master with the main repository)

I don't have any other thoughts or comments on the new classes at this point- I'm inclined to just merge for the time being to keep things going.

Which points to Jeremy's comment - any ideas on how we might integrate the two projects going forward? As Jeremy mentioned, it looks like the biggest overlaps are with the pest control file class and the matrices, including operations such as identifiability. It seems like we could leverage Jeremy's matrix classes and probably some of the pst control file methods.

Not sure about the best way to combine- we could maintain two separate repositories and then have dependencies (i.e., if we wanted to use the classes in mat_handler.py, we would import them into pestools from pyemu). This may become a pain though, if we want to modify any code from pyemu, and it would also require users and any future collaborators to work with two packages.

We could also combine the packages into one, but the potential downside there is if our goals interfere. One of the initial goals for me with pestools was the modest objective of streamlining/formalizing visualization and data munging routines, to make things simple and reproducible, and remove the headache of dealing with a bunch of ad hoc scripts. But maybe Jeremy doesn't want this stuff cluttering up pyemu. It wouldn't be the worst thing, though, to have a python toolkit for PEST where the user could do everything from basic visualization and processing to linear prediction uncertainty.

Thoughts? Jeremy, do you have any ideas on how we might combine?

jtwhite79 commented 9 years ago

I'm completely open to collaboration. To me, I think it would be great to have one integrated package for dealing with all things pest. Our combined efforts would be greater than the sum of our individual efforts for sure.

But, if we are going to combine efforts, then we all probably have a little work to do. I'll take some time to look at the pest_tools matrix and pst classes; can you guys do the same with the pyemu classes? There are a few notebooks in there that demo some of the functionality. Would you guys be open to using the pyemu pst and matrix (and derived covariance) classes? I would prefer to keep the pyemu matrix and pst classes because I've verified most of the code against other legacy codes and I'm close to releasing pyemu as a USGS thing (I'm contractually bound, but it will remain a github open source project). I'm planning to have a pub (like a groundwater note) to go along with it - I'm more than happy to have you guys as coauthors if we can create an integrated package. Are you guys planning to have pest_tools be a public repo?

Let me know how you guys want to proceed - I'm hopeful we can work together to make a really cool package.

In reply to aleaf's comment of Wed, Jan 14, 2015 at 4:34 PM.

echristi commented 9 years ago

I haven't had a chance to look at Jeremy's stuff much but hope to soon. However, here are my thoughts and some answers to questions.

I think pyemu and pestools have different goals but are so related it probably doesn't make sense to be separate. I added an abstract we submitted for MODFLOW and More to the readme of this repo. It gives an idea of what we are trying to do. This will be a public repo at some point prior to MODFLOW and More.

I didn't have any intention of implementing any of the more sophisticated uncertainty stuff Jeremy is working on mostly because of my limited coding skills and not having a full grasp of the math. My intent was getting common PEST related info into python containers to help with parsing and plotting. With Jeremy's help and expertise I don't see any reason to not take it further.

I think there are several paths. Having not looked at pyemu in detail yet I'm not sure what is the best way forward. I'm thinking it might be best to work to have a common foundation and combine after Jeremy's release of pyemu, after MODFLOW and More, or some other time in the near future.

mnfienen commented 9 years ago

Hey Y'all

I haven't had a chance to dig into either codebase as much as I would like. But....based on the conversation here, I suggest that Jeremy goes ahead with publication of pyemu as standalone and focused on uncertainty analysis. Meanwhile, Evan and Andy can go ahead toward the MODFLOW and More presentation. Following that effort, though, it really does seem to make sense to merge efforts -- I think it will benefit all of us and the community in many ways. It sounds like the matrix objects Jeremy is using could be folded into pest_tools. One thing I've not played with is dependencies among Git repos. It might be worth collaborating on the lowest level core stuff (classes for matrices, PST files, and a few others) and having both codebases depend on it. We could all work on the common base (call it pypest_core or something) and commit to keeping it clean and robust. Then, pest_tools and pyemu (and others -- keyPEST maybe?) could grow from there.

That's my thoughts anyway....

jtwhite79 commented 9 years ago

Mike makes some good points (as always). The only reason I would hate to wait any longer to combine efforts is that it will make the integration that much more difficult if pest_tools continues developing at a rapid pace. I've been looking around in pest_tools and it seems like combining efforts at this point would be pretty easy. But I can understand if you guys want to hold until after MF&M.

In reply to mnfienen's comment of Thu, Jan 15, 2015 at 8:01 AM.

mnfienen commented 9 years ago

DOH! I didn't really mean to hold off on integration -- just on public integration. I still think publishing pyemu as is and doing the MF+More paper for pest_tools makes sense, and after that can make it all more public. But....I think the consolidation can happen sooner.

Sorry - I wasn't as clear as I meant to be --- I was still on my first cup of coffee :)

aleaf commented 9 years ago

I totally agree. Jeremy, how solidified are your pst_handler and mat_handler modules? Maybe for the time being we can start importing those (from a fork of pyemu) and inheriting from them?

jtwhite79 commented 9 years ago

I'm solid on the methods and attributes - I don't foresee anything on my end that will cause a major shift. I'm hopeful integration will be minimally invasive, but I'm also willing to accommodate whatever you guys need to make it work on your end. I've got a couple of demo notebooks for those classes - check them out and let me know what you think.

In reply to aleaf's comment of Thu, Jan 15, 2015 at 4:10 PM.

echristi commented 9 years ago

I looked at Jeremy's stuff tonight. I think it will work fine. I think it would be best for Andy and me to start off implementing most of the pst stuff but just small parts of the matrix class, to get comfortable with everything that is there, and pull in more as we go. For example, hold off on all the overloading stuff for a while. The only change that jumps to mind right now is that the to_dataframe method should be decorated, or there should be a second, identical decorated method called "df". All our plotting stuff uses dataframes (typically an attribute called "df"), so they need to be an attribute of the matrix class. I'm sure there are other similar adjustments, but I can't think of them right now.
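One way to read the "decorated" suggestion is a read-only property that wraps to_dataframe, so plotting code can keep using `obj.df` as an attribute. The sketch below assumes that reading. The `Matrix` class here is a hypothetical stand-in with invented attribute names (`x`, `row_names`, `col_names`), not pyemu's actual class, and `Cor` is an illustrative subclass only.

```python
import numpy as np
import pandas as pd

class Matrix:
    """Hypothetical stand-in for a labeled matrix class (not pyemu's actual code)."""

    def __init__(self, x, row_names, col_names):
        self.x = np.asarray(x, dtype=float)
        self.row_names = list(row_names)
        self.col_names = list(col_names)

    def to_dataframe(self):
        """Return the matrix as a DataFrame labeled by row and column names."""
        return pd.DataFrame(self.x, index=self.row_names, columns=self.col_names)

    @property
    def df(self):
        """Attribute-style access to the same DataFrame, for plotting code."""
        return self.to_dataframe()

class Cor(Matrix):
    """Illustrative subclass that inherits the df property unchanged."""

names = ["k1", "k2"]
cor = Cor([[1.0, 0.6], [0.6, 1.0]], row_names=names, col_names=names)
frame = cor.df   # no call parentheses needed
```

Note that this property builds a new DataFrame on every access. Caching it would be a separate design decision, since the cached copy could go stale if the underlying array changes.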

echristi commented 9 years ago

Looking at Jeremy's stuff again I was wrong about holding off on the overloading stuff. I was thinking that was in just to speed things up a little but it's a core component.

Jeremy - I'm curious. Did you compare how you implemented everything to doing all the operations as DataFrames with row_names as the DataFrame index? I'm wondering if there is an advantage or speed gains.

jtwhite79 commented 9 years ago

I did benchmark the linear algebra using just dataframes and it was significantly slower since most covariance matrices are diagonal and there is a lot of overhead associated with each dataframe. I use some trickery with diagonal matrices to get monster speeds compared to storing and operating full dataframes.
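The diagonal shortcut mentioned above can be illustrated as follows: when a covariance matrix is diagonal, only the vector of variances needs to be stored, and products and inverses reduce to elementwise operations. This is a generic sketch with made-up function names and data, not the implementation in pyemu, and it makes no claim about the actual speedups observed.

```python
import numpy as np

def diag_times_matrix(variances, mat):
    """Compute diag(variances) @ mat without forming the full diagonal matrix."""
    return variances[:, None] * mat      # scale row i of mat by variances[i]

def diag_inverse(variances):
    """Inverse of a diagonal matrix, returned as the vector of its diagonal."""
    return 1.0 / variances

# Made-up variances for 4 observations and a made-up 4 x 3 matrix
variances = np.array([0.5, 2.0, 1.0, 4.0])
rng = np.random.default_rng(1)
mat = rng.normal(size=(4, 3))

fast = diag_times_matrix(variances, mat)
full = np.diag(variances) @ mat          # same result via the dense matrix
```

The elementwise form does work proportional to the number of entries in `mat`, whereas the dense product also multiplies by all the off-diagonal zeros.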

In reply to echristi's comment of Fri, Jan 16, 2015 at 9:25 AM.

PESTools / pestools

Cov Cor and Eig classes #22