Closed WilliamsTravis closed 2 years ago
@WilliamsTravis, I don't fully follow the "bug" but it sounds like it is really an issue with the WTK meta data which Evan would need to fix and then Ethan Young or Grant would need to update in the .h5 files...
A handful of county names are wrong in the WTK meta, so that's not a reV problem. But, the mismatch between county and states is not present in the WTK. For instance, there is no Hartley, New Mexico in the WTK. Hartley County is in Texas.
I still don't follow how this is a reV bug? Is this an artifact of aggregation?
Yes, sorry, this is an issue in our supply curve tables so it happens in aggregation. Some how counties or states are, in rare cases, being assigned to their neighbor. I.E. there is no Hartley, New Mexico.
Well maybe there should be! Lol so are you suggesting that the state and the county might be getting aggregated differently? That really doesn't appear to be the case so I'm not sure how this would happen. Maybe its an artifact of the techmap NN mapping? but once again i dont see how the state and county would be different there.
The only thing I can think of is that because these SC points are on the edge of the domain the number of points in TX is < in NM. The easy fix would be for country, state, county, and time zone to just assign the value of the WTK point closet to the SC point centroid
@WilliamsTravis I mayyyy have found out why this happens. We take the mode of the states and counties independently, so if there is a tie, you can end up with mismatched state+county combinations. The scipy mode function appears to take the alphabetically sorted first value although I'm not 100% confident in that. In either case, this is now breaking some tests (possibly due to a new scipy version?) so i'll try and fix it so that in the case of a tie you get a sensible state+county combo.
In [1]: from scipy import stats
In [6]: stats.mode(['xhartley', 'hartley'])
Out[6]: ModeResult(mode=array(['hartley'], dtype='<U8'), count=array([1]))
In [7]: stats.mode(['nm', 'tx'])[0]
Out[7]: array(['nm'], dtype='<U2')
Ah, yes that would make sense. Sounds great!
FYI you can see the fix in PR #381 here: https://github.com/NREL/reV/pull/381/commits/f05ea91c9e429b2ef5714f76a4a2aebcabe3c416. Basically the county mode is done first and then the state and other meta data are made to align with the chosen county.
Excellent! This has a been a minor, but pesky, little bug for some time now.
Totally, i cant guarantee that this will fix everything but i saw basically the same bug pop up on the testing end and this should fix it
Gotcha, I'll keep an eye out for it then...if everything isn't fixed, we can always rely on our modal county fips characterization, so it's not urgent.
Closing as per PR #381 and release https://github.com/NREL/reV/releases/tag/v0.6.6
Sorry, thought the bug was back, but wasn't
omg haha noooo. @WilliamsTravis can you provide some resources to debug this? Code/config to reproduce and a charge code?
Charge code... don't miss that...
Bug Description We have a small border issue. In the first image there are 269 points whose county FIPS code could not be inferred by a county-state lookup. Most of these points are right along the border and either the county or state name is wrong.
For the clusters of points, this is because the WTK meta data either used a two word county name when its actually a one word name (i.e. De Kalb vs DeKalb) or vice versa (i.e. Debaca vs De Baca). This can be fixed with a point-in-polygon lookup and isn't a reV issue.
Expected behavior County and state pairs are assigned correctly. I'm not sure if we are assigning them according to the centroid of the full supply-curve cell or that of the un-excluded area, but either way they should match up.
System (please complete the following information):