PESTools / pestools

additions to Res and Rei files #16

Closed aleaf closed 9 years ago

aleaf commented 9 years ago

One thing I forgot to mention is dataframe column names. I changed some in the Res method to replace the spaces with underscores, so they can be accessed as attributes on the dataframe, e.g. df.Absolute_Residual, which also allows for tab completion (unless there's a way to do these things with spaces?). But maybe we don't have to follow this for all columns. One potential disadvantage is that if the column names are used for plot labels, the underscores look uglier.
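A minimal sketch of the trade-off described above, using a made-up residuals table (the data and the `pretty_label` helper are hypothetical, not part of PESTools). Columns with spaces remain reachable through bracket indexing but not as attributes, and the underscores can be mapped back to spaces when a label is needed for a plot.

```python
import pandas as pd

# Hypothetical residuals table; names and values are illustrative only.
df = pd.DataFrame({
    "Name": ["obs1", "obs2"],
    "Absolute Residual": [0.5, 1.2],
})

# With spaces, a column is only reachable through bracket indexing.
with_spaces = df["Absolute Residual"]

# Replace spaces with underscores so columns work as attributes
# (and show up in tab completion).
df.columns = [c.replace(" ", "_") for c in df.columns]
as_attribute = df.Absolute_Residual


def pretty_label(column_name):
    """Hypothetical helper: turn a column name back into a plot label."""
    return column_name.replace("_", " ")


label = pretty_label("Absolute_Residual")
```

Keeping the underscore names internally and converting only at plot time would avoid the uglier labels without giving up attribute access.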

echristi commented 9 years ago

rei.phi_by_type didn't work when I tried the notebook. Please check that.

I was confused about what res.phi_m is.

I'm OK with the changes to Pest base class. We might want to move the method for parsing the rec file at some point if we end up with a Rec class. The method you added for getting regularization weights made me realize the regularization weights for ParSen are getting pulled from the .pst file and really should be pulled from the rec or rei file associated with the jco being used.

I think we should build on the obs_info_file stuff and really think about all the ways it can be used. It would be nice to have one clean file that provides all the support data of this type. If it is too much to add in now, we can open an issue and get to it later. Additional items I'm thinking of: 1.) Transient data. Can we add info to indicate which observations are part of a single time series so we can plot them? 2.) Alternate or more descriptive names for observations. 3.) Maybe a way to go beyond observation type. Can you think of a way to have subgroups that is extremely flexible?

If we get a nice format we can do a similar thing for parameters. For example, in ParSen we have ways to provide a dictionary for alternate parameter names but we could make it easier on the user with a standard parameter_info_file.

aleaf commented 9 years ago

Ok, phi_by_type should work now. It was a problem with switching obstypes to _obstypes.

res.phi_m is just a view on res.phi that includes only the observation groups (no regularisation). It may or may not be useful as we continue to work on Res. I flagged it with a comment where it is created.

I wrote that Rec method and then realized that the residuals for the regularisation in the rei files already include the reg. weight factor (simply summing them with the observations results in the correct value for total phi). But I left it in as a placeholder in case we want to do anything with rec.
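As a rough illustration of the two points above (not the actual Res code), phi can be accumulated per group from squared weighted residuals, and a measurement-only view can be taken by dropping the regularisation groups. The table, its column names, and the rule that regularisation groups are those whose names start with "regul" are all assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical residuals table; columns, group names and values are made up.
res = pd.DataFrame({
    "Group": ["heads", "heads", "flux", "regul_k"],
    "Residual": [1.0, -2.0, 0.5, 3.0],
    "Weight": [1.0, 0.5, 2.0, 0.1],
})

# Contribution of each row to phi: the squared weighted residual.
res["Weighted_Sq_Residual"] = (res.Weight * res.Residual) ** 2

# Phi per group, and total phi as the plain sum over every group
# (no extra regularisation weight factor is applied here).
phi = res.groupby("Group")["Weighted_Sq_Residual"].sum()
phi_total = phi.sum()

# Measurement-only phi: drop groups whose names start with "regul"
# (assumed here to mark the regularisation groups).
is_regul = phi.index.str.lower().str.startswith("regul")
phi_m = phi[~is_regul]
```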

If it would be useful for other classes (such as ParSen) to have information on individual observations for each iteration, it would be pretty straightforward to add a method to Rei() to return a set of dataframes (or maybe a panel?) containing all of the information from all rei files.
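One possible shape for such a method, sketched with stand-in dataframes rather than real rei files (the `rei_by_iteration` dictionary and its columns are hypothetical). It stacks the per-iteration frames into a single dataframe with a two-level index, which also sidesteps the panel question, since `Panel` has since been removed from pandas.

```python
import pandas as pd

# Stand-ins for parsed rei files, keyed by iteration number. In practice each
# frame would come from whatever parser reads a single rei file.
rei_by_iteration = {
    0: pd.DataFrame({"Name": ["obs1", "obs2"], "Residual": [2.0, -1.0]}),
    1: pd.DataFrame({"Name": ["obs1", "obs2"], "Residual": [0.5, -0.2]}),
}

# Stack everything into one frame indexed by (Iteration, Name).
all_rei = pd.concat(
    {it: df.set_index("Name") for it, df in rei_by_iteration.items()},
    names=["Iteration", "Name"],
)

# Residual history of a single observation across iterations.
obs1_history = all_rei.xs("obs1", level="Name")["Residual"]

# All observations for a single iteration.
iteration_1 = all_rei.loc[1]
```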

Totally agree about the obs_info_file and also a corresponding parameters file. The nice thing about these files is that they provide a central repository for all of the information that PEST doesn't care about (such as names, like you mentioned). A datetime column would be pretty useful as well. It would just have to be in a consistent format so that pandas could parse it (the user could choose any format, and supply a corresponding format string). To distinguish between multiple observations we could use locations, or some kind of base name. It could also be useful for steady-state datasets, as it would allow for quick analysis of how the observations are distributed over time.
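A sketch of the datetime idea, assuming a comma-separated observation info file with hypothetical `Name`, `Type`, `Location` and `Date` columns. The `read_obs_info` function is made up for illustration; the user-supplied format string is passed straight to `pd.to_datetime`.

```python
import io

import pandas as pd

# Hypothetical observation info file; the layout and names are assumptions.
obs_info_text = """Name,Type,Location,Date
obA_1,head,well_A,03/15/2010
obA_2,head,well_A,06/15/2010
obB_1,head,well_B,03/20/2010
"""


def read_obs_info(file_like, date_column="Date", date_format="%m/%d/%Y"):
    """Read an observation info table, parsing dates with a user-supplied format."""
    info = pd.read_csv(file_like, index_col="Name")
    info[date_column] = pd.to_datetime(info[date_column], format=date_format)
    return info


obs_info = read_obs_info(io.StringIO(obs_info_text))

# How the observations are distributed over time, e.g. counts per month.
counts_per_month = obs_info.groupby(obs_info.Date.dt.to_period("M")).size()
```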

For observation subgroups, are you thinking of another level (i.e. subgroups within each type), or an independent way to classify observations (i.e. by type or by subgroup)? It seems like the latter would be pretty easy; we could just have a general method to slice whatever dataframe by information in a specified column of the observation info file.
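The general slicing method could look something like the following sketch. The `slice_by` function, both tables, and the `Type` and `Region` columns are hypothetical; the only assumption that matters is that the results dataframe and the observation info file share observation names as their index.

```python
import pandas as pd

# Hypothetical results and observation info tables, both indexed by name.
res = pd.DataFrame({
    "Name": ["obs1", "obs2", "obs3"],
    "Residual": [1.0, -2.0, 0.5],
}).set_index("Name")

obs_info = pd.DataFrame({
    "Name": ["obs1", "obs2", "obs3"],
    "Type": ["head", "head", "flux"],
    "Region": ["north", "south", "north"],
}).set_index("Name")


def slice_by(df, info, column, value):
    """Return the rows of df whose entry in info[column] equals value."""
    names = info.index[info[column] == value]
    return df[df.index.isin(names)]


north = slice_by(res, obs_info, "Region", "north")

# The same info column can also drive a groupby, aligned on observation name.
mean_abs_by_type = res.Residual.abs().groupby(obs_info.Type).mean()
```

Because the classification lives in the info file rather than in the results, any column added there becomes available for slicing without changes to the classes that hold the data.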

echristi commented 9 years ago

Here is what I'm thinking re subgroups.

You have one column called subgroup (or similar). Values of that column for each observation are a string or list that contains all the subgroups that observation can belong to. Something like: 'head region 1, head aquifer A, good data'.

Then, using the power of pandas, you can query or groupby (I'm not sure which would work) any number of different subgroups. This would take some fancy pandas work but I'm sure it can be done.
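One way this could be done, as a sketch rather than a settled design: split the comma-separated string into individual tags and expand to one row per (observation, subgroup) pair, after which both querying and groupby work. The table and the `Subgroup` column name are hypothetical, and `Series.explode` requires pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical observation info with a free-form, comma-separated subgroup column.
obs_info = pd.DataFrame({
    "Name": ["obs1", "obs2", "obs3"],
    "Subgroup": [
        "head region 1, head aquifer A, good data",
        "head region 1, head aquifer B",
        "head aquifer A, good data",
    ],
}).set_index("Name")

# Split each string into a list, then expand to one row per (observation, tag).
tags = obs_info.Subgroup.str.split(",").explode().str.strip()

# Query: every observation that carries a given tag.
good_data = list(tags.index[tags == "good data"])

# Groupby: the members of every subgroup at once.
long_form = tags.rename("Tag").reset_index()
members = long_form.groupby("Tag")["Name"].apply(list)
```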

You could also get at the time series this way by having one of the subgroups be the observation you want the time series for. For example, in PST you may have the obnames for time series data as obA_1, obA_2, obA_3, etc., but you could have one of the subgroups for those observations be 'obA' and use the datetime to get the time series.
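A sketch of the time series idea under simplifying assumptions: each observation carries a single subgroup here for brevity, and the `time_series` function, the tables, and the `Measured` and `Modelled` column names are hypothetical.

```python
import pandas as pd

# Hypothetical observation info: obA_1..obA_3 share the subgroup 'obA'.
obs_info = pd.DataFrame({
    "Name": ["obA_1", "obA_2", "obA_3", "obB_1"],
    "Subgroup": ["obA", "obA", "obA", "obB"],
    "Date": pd.to_datetime(
        ["2010-01-01", "2010-02-01", "2010-03-01", "2010-01-15"]
    ),
}).set_index("Name")

# Hypothetical results table indexed by the same observation names.
res = pd.DataFrame({
    "Name": ["obA_1", "obA_2", "obA_3", "obB_1"],
    "Measured": [10.0, 10.5, 11.0, 7.0],
    "Modelled": [9.8, 10.9, 11.1, 7.4],
}).set_index("Name")


def time_series(df, info, subgroup):
    """Return the rows of df for one subgroup, indexed and sorted by date."""
    members = info[info.Subgroup == subgroup]
    series = df.loc[members.index].assign(Date=members.Date)
    return series.set_index("Date").sort_index()


obA = time_series(res, obs_info, "obA")
```

With a datetime index in place, `obA.plot()` would draw measured and modelled values against time (matplotlib required).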

Sorry for not providing better examples, just thinking conceptually.