USGS-R / regional-hydrologic-forcings-ml

Repo for machine learning models for regional prediction of hydrologic forcing functions. Includes probabilistic seasonal high flow regions for CONUS, and prediction of high flow metrics for selected regions.
Creative Commons Zero v1.0 Universal

Extend EDA for attributes to CONUS reaches #151

Closed: jds485 closed this issue 1 year ago

jds485 commented 1 year ago

Depends on #142.

Create target(s) that create maps and violin plots of the reach attributes across CONUS.

slevin75 commented 1 year ago

@jds485 This isn't ready for a PR yet, but I have a couple of questions. So far, I have just done comparisons of the GAGES-II features and the NHD CONUS features. I haven't looked into the area of applicability (aoa) yet; I thought it would be good to just look at the raw CONUS data for a sense of how different the distributions are. I can look into the aoa and try to add those as a 3rd violin plot on these. These are for the high aggregated quantiles. I'll add another target to do the mid quantiles.

Right now I just sort of eyeballed the plots and made a list of ones that should be log transformed, things like drainage area. A lot of the plots need to be transformed but have zeros in them. So - I could either just add a constant and transform just for visualization purposes. Or the scales package has a 'pseudo log transform', which I have never used but might be ok for our purposes?

Here are some examples:

Definitely a difference in drainage area distributions:
[Figure: ACC_BASIN_AREA]

Should be transformed but has zeros:
[Figure: TOT_OLSON_PERM]

[Figure: TOT_ELEV_MAX]

slevin75 commented 1 year ago

Also, the function makes these in a loop and it's pretty slow. It might go faster if it were parallelized but I'm not sure I know how to do that. I'll try to look at some of the other functions that are parallelized and see if I can figure it out.

jds485 commented 1 year ago

> I can look into the aoa and try to add those as a 3rd violin plot on these

Just so I'm understanding what would be in that 3rd violin plot: are you thinking that you would apply the aoa method, select the regions within the aoa, and then the 3rd violin plot would be for only those retained reaches? Also, the aoa method would be applied to only the attributes that are used in the models, so some of these attributes would not have a 3rd violin.

> I'll add another target to do the mid quantiles.

Sounds good. Those targets can be added after the high quantile results are working.

> So - I could either just add a constant and transform just for visualization purposes.

That's fine by me. The idea of a pseudo log transform makes sense, though. I've not applied it, but for visualization purposes it might be better than adding an arbitrary constant.
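For reference, the two options can be compared with a small sketch. It is written in Python for illustration only and is not code from this repo. It assumes the pseudo log transform takes the form asinh(x / (2 * sigma)) / ln(base), which is my understanding of what `scales::pseudo_log_trans` computes; the function names, the `sigma` and `c` defaults, and the sample values are all made up.

```python
import math


def pseudo_log(x, sigma=1.0, base=10.0):
    """Pseudo-log transform (assumed form): linear near zero, log-like for large |x|."""
    return math.asinh(x / (2.0 * sigma)) / math.log(base)


def pseudo_log_inverse(y, sigma=1.0, base=10.0):
    """Inverse of pseudo_log, e.g. for placing axis labels in original units."""
    return 2.0 * sigma * math.sinh(y * math.log(base))


def log_plus_constant(x, c=1.0, base=10.0):
    """The alternative discussed above: shift by an arbitrary constant, then take the log."""
    return math.log(x + c, base)


# Made-up attribute values that include a zero, like the attributes with zeros above.
values = [0.0, 0.5, 10.0, 1000.0]
for v in values:
    print(v, round(pseudo_log(v), 3), round(log_plus_constant(v), 3))
```

Both transforms map zero to zero with these defaults. The pseudo log version is also defined for negative values and approaches log(x / sigma) for large x, so `sigma` sets the scale at which the axis switches from linear to logarithmic rather than acting as an arbitrary offset.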

> It might go faster if it were parallelized

We can discuss in the meeting later if you can't figure it out.
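As a starting point for that discussion, here is a minimal sketch of the general pattern: replace the per-attribute loop with a mapped function so each attribute is handled by a separate worker. It is written in Python for illustration and is not the plotting function from this repo. `summarize_attribute` is a hypothetical stand-in for the per-attribute plotting step, and the attribute values are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_attribute(name, values):
    """Hypothetical stand-in for the per-attribute plotting step."""
    ordered = sorted(values)
    # Return (name, min, upper-middle element, max) as a cheap placeholder result.
    return name, ordered[0], ordered[len(ordered) // 2], ordered[-1]


def run_per_attribute(attributes, worker=summarize_attribute,
                      max_workers=4, executor_cls=ThreadPoolExecutor):
    """Apply `worker` to each attribute independently instead of looping serially."""
    names = list(attributes)
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so results line up with `names`.
        results = pool.map(worker, names, (attributes[n] for n in names))
        return {r[0]: r[1:] for r in results}


if __name__ == "__main__":
    # Attribute names are taken from the thread; the values are made up.
    demo = {
        "ACC_BASIN_AREA": [12.0, 3.5, 150.0, 0.8],
        "TOT_OLSON_PERM": [0.0, 0.0, 2.1, 4.4],
        "TOT_ELEV_MAX": [310.0, 1250.0, 95.0],
    }
    print(run_per_attribute(demo))
```

Threads are used here only to keep the sketch simple. For CPU-bound plotting in Python, passing `executor_cls=ProcessPoolExecutor` would be the more likely choice. The pattern only works if each attribute's plot is independent of the others.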

slevin75 commented 1 year ago

@jds485 OK, I have updated these so that they are log transformed and only selected the ~46 features from the model. I forget what we decided to do about the aoa. Should I look into that? Were you going to look into it? Should I just PR this as is and we can add the aoa later if we want to?

jds485 commented 1 year ago

Thanks! PR this as-is and we can add aoa as a different PR. I think that method will help with our flow metrics paper as well.