Closed espg closed 3 weeks ago
apparently happening in label/cluster 74 (of 90):
for label in np.unique(c):
    print(label)
    #nodes.append(nx.Graph())
    # Select Cluster
    points_ = np.where(b[label])[0]
    # Flag centroid point
    remove = np.where(points_ == a[label])[0]
    points_ = points_.tolist()
    # remove centroid point so it's not repeated
    points_.pop(remove[0])
    # add central point to beginning so it's the central connection point
    points_.insert(0, a[label])
    #nx.add_star(nodes[label], points)
73
74
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[68], line 10
8 points_ = points_.tolist()
9 # remove centroid point so it's not repeated
---> 10 points_.pop(remove[0])
11 # add central point to beginning so it's the central connection point
12 points_.insert(0, a[label])
IndexError: index 0 is out of bounds for axis 0 with size 0
very strange behavior, likely occurring from very strange geometry... here's where the error happens for cluster 74:
# retrieve point indices for cluster 74
points_ = np.where(b[74])[0]
points_
array([ 5, 7, 147, 155, 206, 221, 229, 301, 469, 542, 585, 586, 718])
# determine which of the above point indices is the central point
# (so it can be moved to the beginning of a list and used as the center of the plot)
remove = np.where(points_ == a[74])[0]
remove
array([], dtype=int64)
The above is not supposed to be empty; because it is, indexing remove[0] raises the IndexError shown in the traceback. We can see the point we're looking for:
a[74]
721
... and 721 isn't in the array with values array([ 5, 7, 147, 155, 206, 221, 229, 301, 469, 542, 585, 586, 718])
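To check whether cluster 74 is the only one affected, something like the following could be run over all labels. This is a minimal sketch, assuming b behaves like a 2-D boolean membership table (one row per cluster, one column per station) and a holds one central-point index per cluster, as in the snippets above. labels_with_missing_center is a hypothetical helper name, not part of pgamit, and the toy data is made up.

```python
def labels_with_missing_center(membership, central_points):
    """Return the labels whose central point is not a member of its own cluster.

    membership: 2-D boolean table, membership[label][station] (assumed layout)
    central_points: one station index per cluster label
    """
    return [
        label
        for label, center in enumerate(central_points)
        if not membership[label][center]
    ]


# Toy data: cluster 0 contains its central point (station 1),
# cluster 1 does not (its central point, station 3, is not a member).
membership = [
    [True, True, False, False],
    [False, False, True, False],
]
central_points = [1, 3]

bad_labels = labels_with_missing_center(membership, central_points)
print(bad_labels)
```

The same indexing works on NumPy arrays, so b and a could be passed in directly.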
Here's the plot of what's going on with these stations:
... BisectingQMeans. Our "Qmeans" routine inherits from the "Kmeans" sklearn base-class, and centroid calculation is handled within the base-class.
... select_central_point routine inside of pgamit. The reason that we need this routine is because the red square above isn't actually a station, it's the centroid of the cluster-- we use select_central_point to snap that centroid to the closest station so that we reference a real GPS station as the central point in the plots, and not just something floating in space.
... select_central_point then?
No, not really. The code for that function is quite terse, so there isn't much room for things to go wrong:
The red square doesn't look like a centroid to me...
Me neither. Keep in mind that the purple squares show the cluster after it's been expanded, so the mass of the centroid from Kmeans/Qmeans was initially likely to be much further south. (The station in India is from expansion off of the station north of Madagascar... similar for the stations on the Arabian peninsula and in southern Europe-- all those stations are from cluster expansion, which occurs after centroid assignment in Kmeans/Qmeans). There are also multiple stations in South Africa that are close together, which pulled the weighting down towards them when centroid assignment did happen.
It's the closest station using Euclidean distance in ECEF coordinates. One of the purple stations to the south might be closer if we use lat/lon coordinates and calculate great-circle distance (which we can do). There's no guarantee that it won't happen again, though.
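For reference, the great-circle variant could look roughly like this. It's a minimal sketch, not the pgamit code: it assumes a spherical Earth with a mean radius of 6371 km, takes the centre and the stations as (lat, lon) pairs in degrees, and uses hypothetical helper names (great_circle_km, closest_station) and made-up coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def closest_station(center, stations):
    """Return the index of the station nearest to center by great-circle distance."""
    return min(
        range(len(stations)),
        key=lambda i: great_circle_km(center[0], center[1], stations[i][0], stations[i][1]),
    )


# Made-up coordinates: the centre sits in southern Africa, so the third station wins.
stations = [(0.0, 0.0), (10.0, 10.0), (-30.0, 25.0)]
center = (-25.0, 28.0)
nearest = closest_station(center, stations)
```

Restricting stations to the cluster's own members before calling closest_station would also guarantee that the chosen central point belongs to the cluster.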
We have no control over the centroid selection from kmeans, and overwriting it is almost certainly more work than it's worth.
The lazy option is to just wrap this in a try/except block, and then pick a random station within the cluster membership to act as the central point. The central point is only used in the plots, not the numerics, and this is an edge case-- we expect it to happen only rarely, and the fallback will fix the error and let us get on with processing.
A more (human) time intensive fix would be to do the same try/except logic, but recalculate the central point using the cluster coordinates in lat/lon space whenever we hit the exception.
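The lazy version could be sketched like this. It is only an illustration of the try/except idea on plain Python lists, with a hypothetical helper name (center_first); it assumes the cluster has at least one member. Note that list.remove raises ValueError when the value is missing, whereas the original snippet hits an IndexError from indexing an empty NumPy array.

```python
import random


def center_first(points, center, rng=random):
    """Return the cluster's points with the central point moved to the front.

    If center is not a member of points, fall back to a randomly chosen
    member so that the star plot still has a valid hub.
    """
    points = list(points)
    try:
        # Remove the central point so it's not repeated in the list
        points.remove(center)
    except ValueError:
        # Edge case: the central point is outside the cluster membership
        center = points.pop(rng.randrange(len(points)))
    points.insert(0, center)
    return points


# Normal case: 7 is a member, so it simply moves to the front.
print(center_first([5, 7, 147], 7))

# Edge case from this issue: 721 is not a member, so a random member is promoted.
print(center_first([5, 7, 147, 155], 721, rng=random.Random(0)))
```

The more time-intensive fix would replace the random choice in the except branch with a recalculated central point.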
The centroid of the cluster has one important use: adding stations to the processing after the network has been created. Sometimes stations become available after we've done the processing. To avoid reprocessing the entire day, we add the station to the cluster with the closest centroid to the station we want to add and reprocess just that subnet. Thus, maybe it would be better if you recompute the centroid after expanding the subnet so that it better reflects the geometry.
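To make the suggestion concrete, here is a rough sketch of recomputing centroids from the expanded membership and then assigning a late station to the nearest one. It assumes ECEF (x, y, z) tuples and Euclidean distance; centroid and closest_cluster are hypothetical names, the coordinates are made up, and none of this is taken from the network module.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def closest_cluster(station_xyz, centroids):
    """Return the index of the centroid nearest to the station (Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: math.dist(station_xyz, centroids[k]),
    )


# Made-up clusters, listed after expansion, so the centroids reflect the final geometry.
expanded_clusters = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)],
]
centroids = [centroid(members) for members in expanded_clusters]

# A station that became available later is added to the nearest cluster.
late_station = (9.0, 0.0, 0.0)
target = closest_cluster(late_station, centroids)
```

Only the subnet at index target would then need to be reprocessed.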
Pushed updates to the PR will overwrite the central_points with an appropriate centroid that corresponds to a central station within the cluster if over_cluster selects a 'blue' rather than 'purple' station above.
There's likely some redundant code in add_missing_station that can get pruned at some point, but #133 will fix our error and work with the current network module code.
closing this for now-- can reopen if further testing shows additional errors inside of plot_global_network
Describe the bug
Posting here to document the minimal reproducible example.
Data file:
data-1729709491370_4_shane.csv
Steps/Code to Reproduce
Expected Results
Plots, with no errors...
Actual Results
Versions