Open Graham-EGI opened 12 months ago
Thanks, @Graham-EGI for bringing this up. On my first look at the issue, my guess is similar to yours in that the binning of the Te is causing the behavior you are seeing.
@ryancoe can you comment on if this is a known behavior for the modified PCA approach we take here?
Referring back to Aubrey's paper on the PCA contours, I suspect it may have to do with the binning that's done to fit the second principal component/dimension (see Section 3.2), which looks like it is set by an optional argument. Try playing around with this.
Eckert-Gallup, Aubrey C., et al. "Application of principal component analysis (PCA) and improved joint probability distributions to the inverse first-order reliability method (I-FORM) for predicting extreme sea states." Ocean Engineering 112 (2016): 307-319.
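To make the binning idea concrete, here is a minimal sketch of one way it could look: sort the points by the first component, cut them into consecutive bins holding a fixed number of points, and summarize the second component within each bin. The function name `bin_stats`, the synthetic data, and the choice to drop a trailing partial bin are all assumptions for illustration. This is not MHKiT's implementation.

```python
import random
import statistics


def bin_stats(c1, c2, bin_size):
    """Sort by component 1, split into consecutive bins of `bin_size` points,
    and return (mean of c1, mean of c2, std of c2) for each full bin."""
    pairs = sorted(zip(c1, c2))
    out = []
    # Any trailing partial bin is dropped in this sketch (an assumption).
    for start in range(0, len(pairs) - bin_size + 1, bin_size):
        chunk = pairs[start:start + bin_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        out.append((statistics.fmean(xs), statistics.fmean(ys), statistics.stdev(ys)))
    return out


# Synthetic stand-in for the two principal components (hypothetical data).
random.seed(0)
c1 = [random.uniform(0.0, 10.0) for _ in range(2000)]
c2 = [0.5 * x + random.gauss(0.0, 1.0) for x in c1]

small_bins = bin_stats(c1, c2, 50)    # many bins, few points in each
large_bins = bin_stats(c1, c2, 250)   # few bins, many points in each
```

With larger bins, each per-bin mean and standard deviation is estimated from more points, but fewer bins are left to fit the trend of the second component against the first.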
Wanted to take a quick look at whether this was something that could be tuned on my end when calling contours.environmental_contours(). Here are the results:
Nice Graham. The y-shape is improved for larger bin sizes. My guess would be that the next issue may be that the PCA assumes equal bin size. Perhaps you could try setting independent bin sizes here: https://github.com/MHKiT-Software/MHKiT-Python/blob/e5916004ff0413f32f9559790582c95193acdd24/mhkit/wave/contours.py#L461C5-L461C27
Well, looking at it more, your analysis didn't show any changes in the x-axis for varying bin sizes. But adjusting the fit just below the linked line could potentially give better results. Very interested in any findings you have.
Yeah... I'm honestly not sure if the bin size is really the fundamental problem here... I think we need to look a bit more closely before we go down a rabbit hole on this. I know they're quite busy with other things these days, but perhaps @aubreyeckert or even @nevmart have some idea as to what's going on here?
Got it, thanks guys!
Hey @ryancoe and all. Just chiming in that we're hitting this bug as well. In the last week we've been looking into using this tool in one of our projects. It would be awesome to get to the bottom of it.
@jtgrasb in #261 detailed the same issue.
In my opinion this seems to be more of a shortcoming of the provided methods than a bug, but I do not claim to know the details of how these methods should work.
Is this a bug or a shortcoming of the provided methods?
@mattEhall @jtgrasb @mbruggs (in place of Graham) @ryancoe
I have been poking at this issue today and yesterday. I have not identified exactly where the issue is occurring in the method but wanted to test the waters on the level of interest for this issue.
One of the first things I checked was the histogram of the data. As shown in the first 2 plots below, it is clear that Hm0 and Te are not perfectly Gaussian. Each shows skewness (they do fail the shapiro_wilk_test included in the code below). However, the contour method works with these datasets.
Looking at the third figure, the Tp Q-Q plot does not seem particularly worse and shows less skewness, but the histogram has far more than a single peak. As this is the data causing the issue, my current thought is that the multiple peaks in this non-Gaussian distribution lead to the poor contours observed in the original issue.
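One possible source of the multiple peaks, offered as an assumption rather than a confirmed diagnosis: if Tp is taken as 1/f at the highest spectral ordinate, it can only land on the reciprocals of the spectrum's frequency bins, while Te is a ratio of spectral moments and varies continuously. The sketch below uses a made-up frequency grid and made-up Gaussian-shaped spectra (not real NDBC data), and the helper names are hypothetical.

```python
import math


def peak_period(freqs, spectrum):
    """Tp = 1 / f at the spectral peak, so it can only equal some 1/f_i."""
    i = max(range(len(freqs)), key=lambda k: spectrum[k])
    return 1.0 / freqs[i]


def energy_period(freqs, spectrum):
    """Te = m_-1 / m_0, with moments from trapezoidal integration over frequency."""
    def moment(n):
        total = 0.0
        for k in range(len(freqs) - 1):
            a = spectrum[k] * freqs[k] ** n
            b = spectrum[k + 1] * freqs[k + 1] ** n
            total += 0.5 * (a + b) * (freqs[k + 1] - freqs[k])
        return total
    return moment(-1) / moment(0)


# Hypothetical coarse frequency grid in Hz (an assumption, not the NDBC bins).
freqs = [0.05 + 0.01 * k for k in range(36)]

tp_values, te_values = [], []
for j in range(200):
    fc = 0.08 + 0.0005 * j  # "true" peak frequency, varied almost continuously
    spectrum = [math.exp(-0.5 * ((f - fc) / 0.02) ** 2) for f in freqs]
    tp_values.append(peak_period(freqs, spectrum))
    te_values.append(energy_period(freqs, spectrum))

n_tp = len(set(tp_values))                       # a handful of discrete bands
n_te = len(set(round(t, 6) for t in te_values))  # many distinct values
```

If this is what is happening, a histogram of Tp would show one spike per frequency bin, which would look like "far more than a single peak" even when the underlying sea states vary smoothly.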
This matters because the PCA method assumes that the data, when transformed into the PCA space, can be modeled with a fitted parametric distribution. This is evident in the use of stats.invgauss.ppf, the inverse cumulative distribution function (CDF) of SciPy's inverse Gaussian distribution, to calculate the component 1 values. If the transformed Tp data do not follow the assumed distribution, the results will not reflect the actual distribution of the data. A linear relationship is then assumed between the two components to convert the second component into the PCA space. If the relationship between Tp and the first component is not linear, this step will not capture the true dependency, leading to incorrect contouring.
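For reference, SciPy's stats.invgauss is the inverse Gaussian (Wald) distribution, a right-skewed family on positive values, and not the inverse of a normal CDF. The stdlib-only sketch below writes out its CDF and inverts it by bisection to show the skew. The parameter names `mean` and `shape` are my own and do not match SciPy's parameterization, and the bisection is a simple stand-in for a library quantile function.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def invgauss_cdf(x, mean, shape):
    """CDF of the inverse Gaussian (Wald) distribution with the given mean
    and shape (lambda) parameters."""
    if x <= 0.0:
        return 0.0
    r = math.sqrt(shape / x)
    return (_PHI(r * (x / mean - 1.0))
            + math.exp(2.0 * shape / mean) * _PHI(-r * (x / mean + 1.0)))


def invgauss_ppf(q, mean, shape, hi=1e6):
    """Inverse CDF by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if invgauss_cdf(mid, mean, shape) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mean, shape = 1.0, 1.0  # illustrative values only
q05 = invgauss_ppf(0.05, mean, shape)
median = invgauss_ppf(0.50, mean, shape)
q95 = invgauss_ppf(0.95, mean, shape)
```

The median sits below the mean and the upper tail is longer than the lower one, so a single-peaked, skewed sample can be accommodated by this family. A sample with many separate peaks cannot.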
As stated above I have started following the changes through the method when using Te vs Tp but I have not identified exactly where things break down yet.
I am happy to keep poking at this if there is still interest. Let me know if you guys have thoughts on this initial look at the data, or whether you think it is a good expenditure of time to identify exactly where this binned Tp data causes issues vs simply using Te.
Hi @ssolson, thanks for your message and the informative observations. This isn't a topic I know a lot about or have the chance to get into in more detail, so I don't follow everything (but I see what you mean about the skewness and that it might not be fitting the assumptions of the method well). For what it's worth, I am still very interested in using this functionality. We haven't found another way to create these contours.
Hi @ssolson, thanks for the update. I am also still interested in this functionality. It's not a pressing issue for the project I was using it for as we were able to use the KDE method with a different dataset and get okay results, but it would be nice to have a workflow for such data in the future.
Describe the bug:
PCA environmental contours returning odd results when Tp is used as the period value (rather than Te)
To Reproduce:
This is just a slight modification to the code in the example notebook in examples.environmental_contours_example.ipynb to calculate Tp from the spectral data returned from the NDBC swden parameter. Not all buoys show this behavior; the example uses one that does.
Expected behavior:
PCA contours having similar shapes between Tp and Te as the period value.
Screenshots:
Bad contour with Tp as period value:
Good contour with Te as period value:
Additional context:
There are quite a few of these NDBC swden data sets that produce this contour shape when using Tp calculated from MHKiT. Appears to be related to the banding, and the density of the data:
Created with (where the Tp and Hm0 variables are from the example above):