gmdsi / GMDSI_notebooks

python-based predictive groundwater modeling workflow examples
GNU General Public License v3.0

structure and planning #9

Closed rhugman closed 2 years ago

rhugman commented 2 years ago

Thought it would be useful to have a common overview. This has grown somewhat beyond what I originally intended. Please feel free to suggest any changes. Nothing is set in stone and I am happy to take direction.

I have outlined the main parts of what I propose here: https://github.com/rhugman/GMDSI_notebooks/blob/main/tutorials/notebook_order.md

I have broken the "course" into 2 parts. The first follows the outline of MF's GW1876 repo. The second is closer to JW's "decision support notebooks" repo. The first part aims to address more of the theory and concepts, gently introducing the use of pyemu. The second part is intended as full-on pyemu/scripted workflows, more of a "how to" example. Part 2 may revisit some of the concepts discussed in Part 1, but only superficially.

jtwhite79 commented 2 years ago

@rhugman I like the concept. A few questions that might help set the path for us (I don't know the answers to these...):

rhugman commented 2 years ago

Thanks @jtwhite79 ; see replies inline.

  • So are you thinking that the part 1 notebooks will be stand-alone (in that you don't have to pre-run any other notebooks first)?

My current approach is to have backups of each tutorial folder. If a subsequent tutorial requires folders from a previous one, the notebook starts by running a function to pull them in. The costs: (1) the repo may get large, and (2) we need to run a script that executes all the notebooks and prepares the backup folders, which can take some time (this is what herebedragons.py does).
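For anyone following along, the "pull them in" step is roughly the following. Note that the function and folder names here are illustrative stand-ins, not the actual herebedragons.py code:

```python
import shutil
from pathlib import Path

def prep_tutorial(tutorial_name, backup_root="..", work_root="."):
    """Copy the pre-run backup of a previous tutorial into the working area."""
    src = Path(backup_root) / "models" / tutorial_name
    dst = Path(work_root) / tutorial_name
    if dst.exists():
        shutil.rmtree(dst)       # start from a clean slate
    shutil.copytree(src, dst)    # pull in the pre-run files
    return dst
```

A notebook that depends on, say, the pstfrom tutorial would just call this once in its first cell before doing anything else.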

  • Related: will all of the notebooks rely on the PstFrom-style interface (part 1 is mostly what I'm asking about)? I'm wondering if the part 1 notebooks should do something more intuitive and entry-level than PstFrom, just to not distract at the beginning? This is how the original class that @mnfienen, Randy and I taught. But it required multiple sets of model+pest files (one for zones, one for zones+recharge, one for pilot points, etc.), so it became hard to update and maintain. I'm not sure what the best balance here is...

No. Only the notebooks in Part 2 use the PstFrom-style interface. Part 1 uses an entry-level "manual" PEST setup. I followed @mnfienen's class setup as much as possible. Same philosophy as above: herebedragons.py prepares backup folders and runs everything in sequence. So far I have only set up the "single parameter" version.

Whilst working through it, I did think that using MF6 Freyberg for both Part 1 and 2 might not be ideal. Part 1 may benefit from an even simpler model (Moore&Doherty 2005 particle tracking?). But I didn't want to reinvent the wheel.

  • Do we still want to show people a single-hk-parameter model? Or a zones-only model? More recent class offerings skipped those instructive (but somewhat antiquated) parameterization schemes just to make more time for (and focus more on) the higher-dimensionality parameterization (>= pilot points) stuff... Maybe since this is self-guided, these are fine to include, but (related to the first point above) these will require either static model+pest files for each case or something clever behind the scenes to build the interface on the fly...

Good point. I do think it is worth going through the steps with at least a "single" parameter model. (How often do you find people who still don't get it??) However, we could skip the "zones" portion and go straight to pilot points.

  • Are you thinking that all the notebooks will use the modflow6 freyberg model as "the model"? If so, we will need to update all of the part 1 and part 2 notebooks (no biggie - just wanted to confirm)

Everything is being updated to MF6 Freyberg. I am using the freyberg_mf6 model that is in the pyemu repo; so, 3 layers, 25 spd's, etc. It differs from the original paper but, if I understand correctly, is the same version described in the PEST++ documentation?

jtwhite79 commented 2 years ago

All good @rhugman . The thing that burned us a few times was that we also had a master script to run all notebooks and save results, etc, but if we had some specific language describing a result in a notebook like "see how XXX has YYY" and then something changed, the notebook might still run, but "see how XXX has YYY" might not jive with the results being shown -facepalm! But I think the way you are doing it is the most conducive to learning...

jtwhite79 commented 2 years ago

@rhugman just looking at what we have now and thinking more about this... I wonder if we should have the setup-pstfrom notebook do just enough to get us ready for prior Monte Carlo. Then we can have a separate notebook that sets obs vals and (initial) weights. Currently we are setting obsvals and weights in the pstfrom notebook, which (a) makes it even longer and (b) kinda buries a very important step in the workflow. What do you think about this?

Also, on the concept of having "existing" or "backup" results stored, I think we probably want to not have those results stored in the same place that the user's results will go (this also makes it hard for us during dev, because every time we run a notebook those existing dirs get overwritten, which git sees as new files). Maybe we can have a script that will programmatically run all the notebooks and make a backup dir somewhere for safe keeping (or at least save down a PDF of the executed notebook). Then we only need to run this script every once in a while. Thoughts on this?

rhugman commented 2 years ago

> @rhugman just looking at what we have now and thinking more about this... I wonder if we should have the setup-pstfrom notebook do just enough to get us ready for prior Monte Carlo. Then we can have a separate notebook that sets obs vals and (initial) weights. Currently we are setting obsvals and weights in the pstfrom notebook, which (a) makes it even longer and (b) kinda buries a very important step in the workflow. What do you think about this?

OK. So, I will tidy up the pstfrom notebook with your updates for the "add obsvals later".

Then I'll rework the obs&weights notebook to deal with adding obsvals, as well as the obs cov mat section (currently at the end of the pstfrom notebook).

> Also, on the concept of having "existing" or "backup" results stored, I think we probably want to not have those results stored in the same place that the user's results will go (this also makes it hard for us during dev, because every time we run a notebook those existing dirs get overwritten, which git sees as new files). Maybe we can have a script that will programmatically run all the notebooks and make a backup dir somewhere for safe keeping (or at least save down a PDF of the executed notebook). Then we only need to run this script every once in a while. Thoughts on this?

Kind of where I was going with HBD (just not finished yet). Notebooks which are required for subsequent tutorials are run, then the backup folder is copied to the '..\models\' folder. At the moment only the pstfrom and obs&weights notebooks are accounted for. I only intend to do this for notebooks which create starting points for others.

Or do you mean something else?

I also intend to store all notebooks as HTMLs to make them easy to pop onto the website (on the todo list).
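The run-then-stash flow described above can be sketched roughly like this. `run_notebook` and `backup_results` are illustrative stand-ins, not the actual HBD functions:

```python
import shutil
import subprocess
from pathlib import Path

def run_notebook(nb_path, workdir):
    """Execute a notebook in place via nbconvert (requires jupyter on PATH)."""
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute",
         "--inplace", str(nb_path)],
        cwd=workdir, check=True,
    )

def backup_results(tutorial_dir, backup_root="../models"):
    """Copy a finished tutorial folder out of the working tree for safe keeping."""
    dst = Path(backup_root) / Path(tutorial_dir).name
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(tutorial_dir, dst)
    return dst
```

The same nbconvert call with `--to html` would also cover exporting executed notebooks as HTML for the website.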

jtwhite79 commented 2 years ago

That is what I meant re backups, and sounds good. I just noticed that some of the notebook folders have working files committed to the repo, so I thought maybe you were using those as the backups. I've got a script that shows how to execute the notebooks if you wanna see a convoluted way to do it...

rhugman commented 2 years ago

Those commits were likely caused by me running a notebook explicitly.

OK, sure. I am using the run_notebook() function in HBD.

jtwhite79 commented 2 years ago

Kewl. I also added a script to clear the notebooks - running it before committing can save a lot of git headaches!
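For reference, clearing outputs doesn't strictly need extra tooling, since .ipynb files are plain JSON. A minimal stdlib-only sketch follows; the actual script in the repo may use nbconvert or nbstripout instead:

```python
import json
from pathlib import Path

def clear_outputs(nb_path):
    """Strip cell outputs and execution counts from a .ipynb (plain JSON)."""
    nb = json.loads(Path(nb_path).read_text())
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop any rendered output
            cell["execution_count"] = None  # reset the [n] counter
    Path(nb_path).write_text(json.dumps(nb, indent=1))
```

From the command line, `jupyter nbconvert --clear-output --inplace *.ipynb` does the same thing.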

rhugman commented 2 years ago

Aha! OK, that explains that. I am a bit of a git n00b (too long working solo...) - any tips&tricks are welcome!

rhugman commented 2 years ago

Regarding "obs&weights" notebook. I went back through your original repo - did you use a model with more shorter SPDs to generate the "measured" data? And if so, do you thin kit is worth doing here? I assume the value is to get readers to think about the meaning of converting field data to model values.

jtwhite79 commented 2 years ago

There exists a "high res" freyberg model that has more rows/cols and is daily. I think we used a prior MC on that model to get us a truth, and I think I also corrupted those truth obs with "human error", just to make the problem (a) harder and (b) more realistic. But we don't have to go that far with these notebooks... or we could have "choose your own adventure"? haha

rhugman commented 2 years ago

OK; just so you are aware - I have added some fake high-res data for demo purposes. It gets created when making the truth model. I'm currently mid-way through tailoring the obs&weights notebook accordingly.

jtwhite79 commented 2 years ago

Looks like you might need to git add those fake high-res data. Do you want to stay with this high-res truth? It does provide some nice (but pretty simple) obs data processing to mimic what (kinda) happens in real cases. If so, should we add the truth generation process to the repo somewhere so we can repro the truth gen?

rhugman commented 2 years ago

Committed.

Yeah, might as well. The truth generation process is now included in make_truth() in HBD. I just added Gaussian noise to each measured time series and then stored it in tidy format in obs_data.csv.
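In spirit, the noise/tidy-CSV step amounts to something like this. The function signature and column names are assumptions for illustration, not the actual make_truth() code:

```python
import csv
import random
from pathlib import Path

def make_noisy_obs(truth, noise_std=0.1, csv_path="obs_data.csv", seed=42):
    """truth: {site: [(time, value), ...]} -> tidy CSV of noise-corrupted obs."""
    rng = random.Random(seed)  # seeded so the "measured" data is reproducible
    rows = []
    for site, series in truth.items():
        for t, val in series:
            rows.append({"site": site, "time": t,
                         "obsval": val + rng.gauss(0.0, noise_std)})
    with open(csv_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["site", "time", "obsval"])
        w.writeheader()
        w.writerows(rows)  # one row per site/time: tidy (long) format
    return rows
```

Each site/time pair becomes one row, so downstream notebooks can filter or pivot the "measured" data however they like.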

jtwhite79 commented 2 years ago

OK, I was thinking about adding the high-res model+pest files for the truth generator. Is that OK? Or do you wanna stay with what's in HBD? I'm fine to stay with it as is if that's your pref...

rhugman commented 2 years ago

Ah! Sure man, whatever you prefer - I am easy.

jtwhite79 commented 2 years ago

Sweet. I'll get on that tomorrow.

jtwhite79 commented 2 years ago

Sorry for the delay on this @rhugman - I'm planning to get the truth stuff setup asap.