DOI-USGS / lake-temperature-lstm-static

Predict lake temperatures at depth using static lake attributes

Plot metrics #48

Closed AndyMcAliley closed 2 years ago

AndyMcAliley commented 2 years ago

This PR adds the 5_visualize phase. It allows RMS error and bias to be plotted by lake, by depth, or by day of year.

The plots rely on the interpolated predictions csv. Each row of that csv has both predictions and observations, which is convenient for computing residuals and plotting.
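As a rough illustration of the idea (not the code in this PR), the sketch below groups rows by a key and computes bias and RMS error from the residuals. The column names `site_id`, `predicted_temperature`, and `observed_temperature` and the helper `metrics_by_group` are hypothetical; the real csv may use different names. Residuals are taken as prediction minus observation, which is also an assumption.

```python
import math
from collections import defaultdict


def metrics_by_group(rows, key,
                     pred_col="predicted_temperature",  # hypothetical column name
                     obs_col="observed_temperature"):   # hypothetical column name
    """Return {group: (bias, rmse, n)}, with residual = prediction - observation."""
    residuals = defaultdict(list)
    for row in rows:
        residuals[row[key]].append(float(row[pred_col]) - float(row[obs_col]))

    metrics = {}
    for group, res in residuals.items():
        n = len(res)
        bias = sum(res) / n                            # mean residual
        rmse = math.sqrt(sum(r * r for r in res) / n)  # root mean squared residual
        metrics[group] = (bias, rmse, n)
    return metrics


# Made-up rows standing in for the interpolated predictions csv
# (csv.DictReader would yield dicts of the same shape).
rows = [
    {"site_id": "lake_a", "depth": 0.0, "predicted_temperature": 21.0, "observed_temperature": 20.0},
    {"site_id": "lake_a", "depth": 5.0, "predicted_temperature": 15.0, "observed_temperature": 16.0},
    {"site_id": "lake_b", "depth": 0.0, "predicted_temperature": 18.0, "observed_temperature": 16.0},
]
by_lake = metrics_by_group(rows, "site_id")
by_depth = metrics_by_group(rows, "depth")
```

Passing a different `key` (lake, depth, or a day-of-year column) gives the other groupings without changing the metric code.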

How to run the code

NOTE: The pipeline commands above require a compiled dataset and a trained model, each of which takes several hours to obtain, plus a set of observed temperatures and metadata. To skip those steps, copy the following files from Tallgrass, preserving their directory structure relative to the repo root directory:

The prediction/interpolation takes 20-30 minutes. To skip that as well, you should only need the following files:

How to review this PR

Issues that will be addressed in upcoming PRs (so don't worry about them yet)

jdiaz4302 commented 2 years ago

No obvious errors stood out to me. The use and flow of operations looked good and the final plots do not look suspicious.

The bias/RMSE plot by interpolated depth has an obvious spike at around 90 meters that calls for context. I assume a single lake is solely responsible for the observations at that depth? It may be worth adding something like a corresponding histogram or decile ticks (i.e., the 10th, 20th, 30th, etc. percentiles) along the x-axis to give a better idea of data density/sparsity.
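A minimal sketch of what I mean, using only the standard library and made-up depths (the `depths` list and the 10 m bin width are assumptions for illustration, not values from this dataset). It computes the nine decile cut points and a coarse count per depth bin; either could be drawn under the error curve.

```python
import statistics
from collections import Counter

# Made-up observation depths in meters: many shallow values plus one deep outlier.
depths = [0.5 * i for i in range(40)] + [90.0]

# Nine cut points: the 10th, 20th, ..., 90th percentiles of observation depth.
deciles = statistics.quantiles(depths, n=10)

# Coarse histogram: number of observations per 10 m depth bin,
# keyed by the lower edge of each bin.
bin_width = 10
counts = Counter(int(d // bin_width) * bin_width for d in depths)

for lower_edge in sorted(counts):
    print(f"{lower_edge:>3}-{lower_edge + bin_width} m: {counts[lower_edge]} observations")
```

With matplotlib, the decile values could be passed to `ax.set_xticks(deciles, minor=True)` so they appear as minor ticks on the depth axis. If all nine deciles fall well short of the spike, that would suggest the metric there rests on very few observations.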

Some other potentially interesting dimensions to view error/bias along might include elevation and lake area.