NREL / OpenOA

This library provides a framework for assessing wind plant performance using operational assessment (OA) methodologies that consume time series data from wind plants. The goal of the project is to provide an open source implementation of common data structures, analysis methods, and utility functions relevant to wind plant OA.

https://openoa.readthedocs.io/

BSD 3-Clause "New" or "Revised" License

194 stars 63 forks source link

Enhancement/issue 48 #168

Closed RHammond2 closed 2 years ago

RHammond2 commented 3 years ago

Closes #48

This PR updates the QC methods to be bifurcated into a QualityControlDiagnosticSuite and WindToolKitQualityControlDiagnosticSuite to allow for a place to keep general QC methods and a way to extend the base method with case-specific QC methods. This PR also adds a more robust Daylight Savings Time handling through the use of UTC offsets instead of hard-coded date cutoffs.

@ejsimley or @jordanperr the QC examples notebook seems to be breaking with the WTK data, do either of you have any insight on what's going on with this? I've only used WTK a couple of times and am unfamiliar with the specifics.

CLAassistant commented 3 years ago

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: RHammond2
:x: Rob

Rob seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

ejsimley commented 3 years ago

the QC examples notebook seems to be breaking with the WTK data, do either of you have any insight on what's going on with this? I've only used WTK a couple of times and am unfamiliar with the specifics.

@RHammond2 I'm getting a WTK-related error running the example notebook too. I'm almost positive the error is caused by asking for WTK data for coordinates outside of the U.S. The WTK dataset we're using only covers the U.S. (or possibly North America?) but the example wind farm is in Europe. This is why check_tz was set to False in the example notebook previously.

RHammond2 commented 3 years ago

@ejsimley I've updated the WIND Toolkit QC method to have both a check_tz parameter and to fail gracefully by passing a logging message that the coordinates are out of bounds. I've also added the pytz to the requirements per a discussion with Jordan, though pandas already has this as one of their own requirements.

I was unable to ammend the old commits in a way that would realign them with my GH account, so I think we can just ignore the CLA bit for that this time.

RHammond2 commented 3 years ago

@ejsimley and @jordanperr this is now ready for review! I've updated the documentation so that the two examples are labeled as Part A and Part B and fixed all the merge conflicts.

codecov-commenter commented 3 years ago

Codecov Report

Merging #168 (c64348d) into develop (3bdb9d3) will decrease coverage by 3.02%. The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           develop     #168      +/-   ##
===========================================
- Coverage    70.77%   67.74%   -3.03%     
===========================================
  Files           23       23              
  Lines         1591     1662      +71     
===========================================
  Hits          1126     1126              
- Misses         465      536      +71

Impacted Files	Coverage Δ
...ional_analysis/methods/quality_check_automation.py	`0.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3bdb9d3...c64348d. Read the comment docs.

RHammond2 commented 3 years ago

I've updated the following based on the notebook commentary:

The _df index is now called UTC as it should have been
The datetime column/index comment in 1.2 now makes sense and aligns with reality.
1a.3.3 language has been updated for fall datetime duplications
Removed the confusing "Timezone-adjusted" language to simply be which ever timezone is being represented
Added commentary in 1a.3.4 regarding the duplicates that are still present in the spring data.
dup_time_identification and gap_time_identification now have a which keyword argument to allow customization for the timezone basis.
Updated to correct the wording that there is a time gap in the fall in 1b.3.3.
Found a bug in removing the timezones in the DST plots, so now the figures in notebooks 1A and 1B look highly similar, as expected.

@ejsimley, let me know if there are still any lingering issues, or if any questions still remain in the above comments from your review.

ejsimley commented 3 years ago

Rob, your latest changes definitely fix the confusion about the data frame index and the daylight savings time plots. I have a few last comments on the notebooks, which are listed here.

1a.1.2: "The datetime column/index comment in 1.2 now makes sense and aligns with reality." Agreed, thanks for fixing that. But I think it's worth mentioning in the notebook that the datetime_utc column is calculated assuming that the original datetime is in local time, in case there are any questions about where the UTC value comes from when the datetime is tz-unaware.
1a.1.3.4: "Added commentary in 1a.3.4 regarding the duplicates that are still present in the spring data." I still think it's worth explaining why the spring plots don't show the duplicate data that was identified in 1.3.3. For example, duplicates were dropped before plotting the data.
1b.1.3.3: I'm not quite sure why example 1b (tz-aware example) identifies gaps in the fall, but the tz-unaware example 1a identifies gaps in the spring. Is this the intended behavior, or should this be fixed? If it is intended, then some discussion should be added explaining why the identified gaps are different in the two examples.
1b.1.3.4: Minor point, but if it's not too much effort, it would be helpful to explicitly show the time gaps in the spring for the original timestamps (by plotting NaN values), instead of having the plot interpolate the gaps.

RHammond2 commented 3 years ago

@ejsimley this last commit should take care of the remaining questions for both code changes and commentary. It took a bit longer than I thought it would, but I think it's a lot better for it once again, thanks! If this clears it up for you, feel free to squash and merge, and delete the branch.

ejsimley commented 3 years ago

Hi Rob, I like the scatter plot of the duplicate timestamps! Just a couple last points.

Regarding my comment on "1a.1.2" above, the issue isn’t necessarily that local time could also be UTC (although that is worth pointing out too like you did), it is that someone might not know what time zone the tz-unaware data are in. So, for example, even if the local_tz is set to "America/Denver" for a wind farm in Colorado, the tz-unaware data might be in local time or might be in UTC (or any other time zone). So the datetime_utc column is calculated assuming the data are in local time, but that could be completely wrong. That's the issue I think we should comment on in the notebook.
And then I may have missed it, but I don't think you addressed the comment on "1b.1.3.3" above. Do you mind explaining that?

jordanperr commented 3 years ago

Triggering CI/CD pipeline

RHammond2 commented 2 years ago

@ejsimley, the last commit should address the remaining discrepancies in the commentary. For 1b.1.3.3, there should not be a difference in the plots as of the last update other than a shift in the DST transition due to having the data accurately encoded. Let me know if this helps or not.

ejsimley commented 2 years ago

Hey Rob, the commentary on 1a.1.2 looks good. And sorry to keep bugging you about this, but for 1b.1.3.3, the plots look great. What I don't understand is why qc._time_gaps is showing time gaps in the Spring in example 1a but time gaps in the Fall for example 1b. I think this is somehow caused by how the data frame index is calculated for the tzaware vs. tzunaware cases but haven't fully figured it out. Is this the behavior you expect to see for the two cases? If so, I would suggest commenting on why different gaps are identified in the Fall for the two notebooks (yet the same duplicates are found). Or if it's not what you intended to happen, then maybe there is a bug somewhere?