USGS-R / regional-hydrologic-forcings-ml

Repo for machine learning models for regional prediction of hydrologic forcing functions. Includes probabilistic seasonal high flow regions for CONUS, and prediction of high flow metrics for selected regions.
Creative Commons Zero v1.0 Universal

feature attributes for conus #171

Closed cstillwellusgs closed 1 year ago

cstillwellusgs commented 1 year ago

closes #142 closes #153

cstillwellusgs commented 1 year ago

@jds485 p1_feature_vars_g2 and p1_feature_vars_conus do not have an identical number of columns. p1_feature_vars_g2 contains some random columns from gages2.1 (npdes, strg, etc.) and also the weighted averages for time-varying attributes (land cover, weather, dams). Other than those differences, the remaining columns all match (but they are not in the same order). Do any of your downstream functions rely on specific positioning and/or dimensions of the p1_feature_vars_g2 data frame (and thus the specific positioning and/or dimensions of the p1_feature_vars_conus data frame)? If I remember correctly, the function that down-selects correlated variables relies on the order.

Down the road I intend to clean up the prep_feature_vars_g2() function, and at that point I can fix the ordering and inconsistency of variables across p1_feature_vars_g2 and p1_feature_vars_conus. For the sake of time, I'm assuming the issue I described above is okay for now. Let me know if I'm assuming incorrectly.
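
For reference, the kind of column audit described above can be sketched as follows. This is an illustrative Python sketch, not code from the repo (the pipeline itself is R). The function name `compare_columns` and all column names except npdes and strg are hypothetical placeholders.

```python
def compare_columns(cols_a, cols_b):
    """Compare two lists of column names.

    Returns (only_in_a, only_in_b, same_order), where same_order says
    whether the shared columns appear in the same relative order.
    """
    set_a, set_b = set(cols_a), set(cols_b)
    only_in_a = [c for c in cols_a if c not in set_b]
    only_in_b = [c for c in cols_b if c not in set_a]
    shared_a = [c for c in cols_a if c in set_b]
    shared_b = [c for c in cols_b if c in set_a]
    return only_in_a, only_in_b, shared_a == shared_b


# Hypothetical column names standing in for the two feature tables.
g2_cols = ["COMID", "npdes", "strg", "elev", "slope"]
conus_cols = ["COMID", "slope", "elev"]

only_g2, only_conus, same_order = compare_columns(g2_cols, conus_cols)
```

With these placeholder names, `only_g2` lists the extra g2 columns and `same_order` is false because the shared columns are ordered differently.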

cstillwellusgs commented 1 year ago

It also appears that I have a merge conflict, but the "Resolve Conflict" button is greyed out. Maybe we can look at that together in the morning, if you don't mind.

jds485 commented 1 year ago

> Do any of your downstream functions rely on specific positioning and/or dimensions of the p1_feature_vars_g2 data frame?

Yes for p1_feature_vars_g2, but no for p1_feature_vars_conus, because we can select the CONUS columns by name to match those retained after processing p1_feature_vars_g2. So I don't see column order as a problem.
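
A minimal sketch of that idea, in Python for illustration only (the repo is R): select the CONUS columns by name, in the order retained from the g2 processing, so the original column positions stop mattering. The helper `select_columns`, the column names, and the values are all hypothetical.

```python
def select_columns(table, keep):
    """Return a new column-oriented table with exactly `keep`, in that order.

    `table` maps column name -> list of values. Raises KeyError if any
    requested column is absent, so a mismatch is caught early.
    """
    missing = [c for c in keep if c not in table]
    if missing:
        raise KeyError(f"columns missing from table: {missing}")
    return {c: table[c] for c in keep}


# Hypothetical: columns kept after down-selecting the g2 features.
selected_g2_cols = ["elev", "slope"]

# Hypothetical CONUS table with extra columns in a different order.
conus = {"slope": [0.01, 0.02], "COMID": [1, 2], "elev": [100.0, 250.0]}

conus_aligned = select_columns(conus, selected_g2_cols)
```

Selecting by name rather than by position also fails loudly when a column the model expects is absent from the CONUS table.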

cstillwellusgs commented 1 year ago

All comments addressed. Did you see Andrew's response about the nhdPlusTools URL? Other than that, it should be good to go.

jds485 commented 1 year ago

The edits look good to me. I'll wait to approve until after our discussion tomorrow.

jds485 commented 1 year ago

I decided to try making the CONUS predictions tonight because I thought it might take a while to complete. Something to note is that the column names for the weighted and long-term average calculations are different for the NHD reaches (longterm_avg) and the g2 data (weighted_avg). I was able to rename them for use in the prediction function, but it would help to keep the names the same from the start. That doesn't have to be addressed in this PR. It can be handled with the renaming features issue.
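
The renaming workaround can be sketched like this, in Python for illustration only (the repo is R). It assumes that longterm_avg and weighted_avg appear as substrings of the column names and that the NHD names are mapped onto the g2 names. Both are assumptions, as are the helper `harmonize_names` and the example columns.

```python
def harmonize_names(columns, old="longterm_avg", new="weighted_avg"):
    """Replace the `old` label with `new` in every column name."""
    return [c.replace(old, new) for c in columns]


# Hypothetical NHD-reach column names.
nhd_cols = ["COMID", "precip_longterm_avg", "elev"]

renamed = harmonize_names(nhd_cols)
```

Columns that do not contain the old label pass through unchanged, so the helper is safe to apply to the whole table.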