USGS-R / river-dl

Deep learning model for predicting environmental variables on river systems
Creative Commons Zero v1.0 Universal
21 stars 15 forks source link

Adding GW loss function #105

Closed janetrbarclay closed 3 years ago

janetrbarclay commented 3 years ago

This PR includes the current code for running the groundwater loss function (turned off / on with the loss_type parameter in the config file).

The Snakefile has an additional preprocessing and 2 additional postprocessing rules related to the GW metrics, and there are additional parameters in the config file that are passed to the training rules (which were modified to accept the additional inputs).

Currently the code for pre-/post- processing of temperature signals are in a separate script (gw_utils.py) but these could be moved into the existing pre-/post- processing scripts if that's better. The loss function code itself was added to loss_functions.py.

I'm using two packages that aren't currently part of the rgcn conda environment - seaborn and statsmodels. Neither is essential to the loss function itself, but are used in the preprocessing (statsmodels allows easy calculation of confidence intervals for regression coefficients) and postprocessing (seaborn).

I'd be happy to talk through the additions if that would be helpful.

aappling-usgs commented 3 years ago

Great thoughts, @jzwart . @janetrbarclay , let me know if you'd like any help working through the git conflicts noted above - sometimes they're straightforward, other times not so much.

janetrbarclay commented 3 years ago

Thanks. I should have mentioned this, but I paused on working through the conflicts to incorporate the recent updates to river-dl and to get my presentation ready for this afternoon. It's on my list for after that and I'll reach out if I have questions.

Thanks!

Janet


Janet Barclay U.S. Geological Survey New England Water Science Center Connecticut Office 101 Pitkin St. East Hartford, CT 06108

Phone (office) 860 291-6763 Fax 860 291-6799 Email @.**@*.**@*.***> https://www.usgs.gov/staff-profiles/janet-barclay


From: Alison Appling @.> Sent: Tuesday, June 15, 2021 11:10 AM To: USGS-R/river-dl @.> Cc: Barclay, Janet R @.>; Mention @.> Subject: [EXTERNAL] Re: [USGS-R/river-dl] Adding GW loss function (#105)

This email has been received from outside of DOI - Use caution before clicking on links, opening attachments, or responding.

Great thoughts, @jzwarthttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fjzwart&data=04%7C01%7Cjbarclay%40usgs.gov%7C500be1dac6a749131cf908d9300fbe62%7C0693b5ba4b184d7b9341f32f400a5494%7C0%7C0%7C637593666444769863%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=6qNak8%2FQTXLmnqYuteKAv7Mmbvr24N3%2F6dg%2FbxTNQMc%3D&reserved=0 . @janetrbarclayhttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fjanetrbarclay&data=04%7C01%7Cjbarclay%40usgs.gov%7C500be1dac6a749131cf908d9300fbe62%7C0693b5ba4b184d7b9341f32f400a5494%7C0%7C0%7C637593666444779818%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=LqOpdHdMp%2Bp0nYO9SgcybqO8MzMgzZiAL2peoqPg6ZE%3D&reserved=0 , let me know if you'd like any help working through the git conflicts noted above - sometimes they're straightforward, other times not so much.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FUSGS-R%2Friver-dl%2Fpull%2F105%23issuecomment-861583105&data=04%7C01%7Cjbarclay%40usgs.gov%7C500be1dac6a749131cf908d9300fbe62%7C0693b5ba4b184d7b9341f32f400a5494%7C0%7C0%7C637593666444779818%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ejosSX4LE3muzDGWFJxYFqiSGKJb5THTAADuNslGN3M%3D&reserved=0, or unsubscribehttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAA5H7UC4WCKHWALC4OWL3BLTS5UNXANCNFSM4562N5RQ&data=04%7C01%7Cjbarclay%40usgs.gov%7C500be1dac6a749131cf908d9300fbe62%7C0693b5ba4b184d7b9341f32f400a5494%7C0%7C0%7C637593666444789777%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=TU6pKsltG9UHEYY%2BEiHxmGqpU8E6RMSKmAfVJxE96Rk%3D&reserved=0.

janetrbarclay commented 3 years ago

@jzwart Could you take a look at this again? I think I've addressed most of your comments. One that I didn't touch was keeping the Snakefile and config files out of the repo. That makes sense. How do I remove those files from this PR?

jzwart commented 3 years ago

@janetrbarclay sorry I'm just getting to this now - I was out on vacation and then buried in project work for a forecasting project. I'll look through all the new changes

jzwart commented 3 years ago

One that I didn't touch was keeping the Snakefile and config files out of the repo. That makes sense. How do I remove those files from this PR?

To be clear, I think there should still be some sort of Snakefile and config file in the river-dl repo, I just don't think it should be project-specific. If someone using river-dl did want to see how to construct a Snakefile with gw loss functions, it might be nice to have an example in river-dl, but I'm not sure it should be the default. So I think there might be a few options:

  1. Remove your gw-specific Snakefile and config file from your local repo, add back the original Snakefile & config file to your local repo, commit & push (that should remove the modified Snakefile and config file from this PR)
  2. Comment out the lines that are specific to gw losses in the Snakefile / config file and try to get as close as possible to the original default Snakefile / config file
  3. Save the gw-specific Snakefile and config file as Snakefile_gw or something similar, and add back in the original Snakefile to your local repo. This way, the original Snakefile would still be in river-dl and people will have an example of what including gw losses would look like.

I think I'd prefer option 3, and then 1, and then 2. But it'd be good to hear what others think

janetrbarclay commented 3 years ago

It would be pretty easy to do option 3 and that makes sense to me. Having alternative snakefiles and config files could be a good strategy in general for demonstrating use of the less routine (or project-specific) features of river-dl. (Could be a way to demonstrate use of the command line training rule as well.)

janetrbarclay commented 3 years ago

@jzwart I've added back the existing Snakefile and config.yml and tagged the gw ones at Snakefile_gw and config_gw.yml. Let me know if there's anything else I missed for this PR.

jzwart commented 3 years ago

Thanks @janetrbarclay , I think it looks good to me!

janetrbarclay commented 3 years ago

@jzwart So is it good to merge? If so, how do we do that? I don't see a link for it so either I'm missing something or I don't have access to do it.

jzwart commented 3 years ago

Yeah I think it's good to merge. @jsadler2 , is @janetrbarclay a collaborator on the repo and has merge privileges? I can merge otherwise

jzwart commented 3 years ago

pinging @jsadler2 on this again. You think it's alright to merge? and also curious why Janet can't merge

janetrbarclay commented 3 years ago

@jsadler2 I think I responded to all your comments. Let me know if there is anything I missed. My response about why the loss function isn't defined in the Snakefile isn't showing up on the Conversation tab, but it does show up on the "Files Changed" tab.

jsadler2 commented 3 years ago

@janetrbarclay - thanks for making those changes. I think it's good to merge, now. I will do that, since I think you can't for some reason.