DOI-USGS / lake-temperature-lstm-static

Predict lake temperatures at depth using static lake attributes

Plot obs counts #53

Closed AndyMcAliley closed 2 years ago

AndyMcAliley commented 2 years ago

Per Jeremy's suggestion in #48, histograms showing observation data density provide important context when evaluating model performance. This PR adds those histograms to the pipeline. Histograms of number of observations can be plotted for the variables elevation, area, day of year, and depth. Additionally, a hexbin plot with day of year on the horizontal axis and depth on the vertical axis can be plotted.
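For readers unfamiliar with hexbin plots, here is a minimal sketch of the kind of figure described above, using matplotlib's `Axes.hexbin`. It is not the code from this PR: the data are synthetic, and the variable names (`doy`, `depth`) and the `gridsize` value are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data; the real observations are not reproduced here.
rng = np.random.default_rng(0)
doy = rng.integers(1, 366, size=5000)    # day of year, 1..365
depth = rng.exponential(5.0, size=5000)  # depth in meters (assumed unit)

fig, ax = plt.subplots()
# Each hexagon is colored by the number of observations that fall inside it;
# mincnt=1 leaves empty hexagons blank.
hb = ax.hexbin(doy, depth, gridsize=40, mincnt=1)
ax.set_xlabel("Day of year")
ax.set_ylabel("Depth (m)")
fig.colorbar(hb, ax=ax, label="Number of observations")
```

The one-dimensional histograms follow the same pattern with `ax.hist` in place of `ax.hexbin`.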

How to run the code

This code has been run locally with success.

Results

Number of observations by depth in validation set

[Figure: obs-count-by-depth-over-valid]

Number of observations by day of year in validation set

[Figure: obs-count-by-doy-over-valid]

Number of observations by elevation in validation set

[Figure: obs-count-by-elevation-over-valid]

Number of observations by area in validation set

[Figure: obs-count-by-area-over-valid]

Number of observations by day of year and depth in validation set

[Figure: obs-count-by-doy_depth-over-valid]

How to review this PR

Issues that will be addressed in upcoming PRs (so don't worry about them yet)

Closes #49

jdiaz4302 commented 2 years ago

Potentially fun fact question regarding the hexbin: is that narrow, dimmer stripe a less densely observed day in the middle of summer, July 4? 😂 That's the only thing that makes sense to me (lack of any/most human-gathered samples), and it looks to fall in the right area (day 185).

AndyMcAliley commented 2 years ago

> One trivial thing that could save on documented function lines would be to move the following lines into the plotting function, having it automatically save the plot it generates.

That would definitely condense the code! The reason I separated the plotting and the saving into two functions was to reuse the plotting function in other contexts, such as making a multi-plot figure with both a histogram and a plot of RMSE.

Now I'm realizing that it would actually be better to pass a matplotlib Axes into the function rather than create a new figure inside the function. So, I changed the code to allow for that kind of function reuse.
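A minimal sketch of that pattern, assuming hypothetical function names (`plot_obs_count`, `save_plot`) and synthetic data rather than the PR's actual code: the plotting function draws on whatever Axes it is given and only creates a figure when none is passed, while saving stays in a separate function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_obs_count(values, ax=None, bins=20, xlabel="value"):
    """Draw a histogram of observation counts on `ax` (hypothetical helper)."""
    if ax is None:
        # No Axes supplied: fall back to creating a standalone figure.
        _, ax = plt.subplots()
    ax.hist(values, bins=bins)
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Number of observations")
    return ax


def save_plot(ax, path):
    """Save the figure that owns `ax` (hypothetical helper, kept separate from plotting)."""
    ax.figure.savefig(path)


# Synthetic placeholder data, not results from the model.
rng = np.random.default_rng(0)
depths = rng.exponential(5.0, size=1000)
rmse = rng.uniform(1.0, 3.0, size=20)

# Reuse the same plotting function inside a multi-panel figure.
fig, (ax_hist, ax_rmse) = plt.subplots(2, 1)
plot_obs_count(depths, ax=ax_hist, bins=20, xlabel="Depth (m)")
ax_rmse.plot(np.linspace(0, depths.max(), 20), rmse)
ax_rmse.set_xlabel("Depth (m)")
ax_rmse.set_ylabel("RMSE")
```

Because the function returns the Axes it drew on, a caller can still do `save_plot(plot_obs_count(depths), "hist.png")` for the single-plot case.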

> Potentially fun fact question regarding the hexbin: is that narrow, dimmer stripe a less densely observed day in the middle of summer, July 4? 😂 That's the only thing that makes sense to me (lack of any/most human-gathered samples), and it looks to fall in the right area (day 185).

That must be it!

day    counts
182    2706
183    2128
184    1684
185     673
186    1578
187    3102
188    2856
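As a quick check of the day-185 reading, the sketch below converts a day of year to a calendar date with the standard library and finds the minimum in the counts listed above. The helper name `doy_to_date` and the example years are assumptions for illustration; the thread does not say how day of year was computed for leap years.

```python
from datetime import date, timedelta

# Counts copied from the table above (day of year -> number of observations).
counts = {182: 2706, 183: 2128, 184: 1684, 185: 673, 186: 1578, 187: 3102, 188: 2856}


def doy_to_date(doy, year):
    """Convert a 1-based day of year to a calendar date (hypothetical helper)."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


min_day = min(counts, key=counts.get)
print(min_day)                     # 185
print(doy_to_date(min_day, 2021))  # 2021-07-04 (non-leap year)
print(doy_to_date(min_day, 2020))  # 2020-07-03 (leap year shifts the date by one day)
```

Day 185 is July 4 in non-leap years and July 3 in leap years, so observations pooled across many years would mostly place the holiday on day 185.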