Open espg opened 1 month ago
Added the new BisectingQMeans method; also did some additional module cleanup, although some deeper cleaning is needed for flake8 compliance. I'm not sure how the database literals can be split up without breaking things, so I'm leaving them as is for now.
Overview
Replaces the clustering method used to split up GAMIT runs when the station count is too high for single-pass processing. This PR removes about 500 lines of code and simplifies the module wherever possible.
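For readers unfamiliar with the general idea of bisecting a station list until every group is small enough for one processing pass, here is a minimal sketch. It is not the BisectingQMeans algorithm from overcluster; it is a plain median split on flat coordinates, and the function name, the (x, y) tuples, and the max_size parameter are all made up for illustration.

```python
def bisect_until_small(points, max_size):
    """Recursively halve `points` until every group has at most `max_size` members.

    Illustrative stand-in only: splits at the median of the coordinate axis
    with the larger spread, treating coordinates as a flat plane.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(points) <= max_size:
        return [list(points)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Pick the axis along which the points are most spread out.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (bisect_until_small(ordered[:mid], max_size)
            + bisect_until_small(ordered[mid:], max_size))


# Ten made-up points on a line, with at most three stations per group.
demo_points = [(float(i), 0.0) for i in range(10)]
demo_groups = bisect_until_small(demo_points, max_size=3)
```

Every point ends up in exactly one group here; a real clustering for GAMIT would also need shared stations between groups, which this sketch does not attempt.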
Installation
Note that for modularity, the overcluster functions live in a different repo and are imported; they can be installed from PyPI using
pip install overcluster
Details
This PR leaves the if conditional from lines 83 to 137 unchanged, including all functions that touch that block, i.e., recover_subnets(), add_missing_station(), and backbone_delauney(). Everything in the else conditional on lines 137-175 is heavily rewritten. This includes completely removing subnets_delauney() and tie_subnetworks(); the function make_clusters() is kept as a function name, but shares no code with the prior version of the function. Both global_sel() and save_clusters() are no longer called after removing the prior functions, and have been deleted.
This should match the prior data structure API of data variables that get passed to the GAMIT.session; the clusters variable is still set up as lists of arrays that can be indexed by cluster number, as in the final lines of the create_gamit_sessions() loop. The ties data variable is also maintained... for now (see below).
Testing
I haven't run this in the GAMIT environment, but am fairly confident that it will run without error for larger station runs. However, I'm less clear on the 'regional' cases involving few enough stations that we have a single cluster / no parallelization. I think that things will still work with make_clusters() generating an empty list for ties and single-member lists for stations and labels... however, I wouldn't be surprised if this needs some debugging, since I haven't tested that case locally.
Additional Things that could/should be addressed
clusters['labels'] variable; it doesn't seem to hold any information and should probably be cut.
ties arrays are pretty long, around 50% of the clusters['station'] arrays on average. I don't know if we want to keep them, especially if they aren't used for processing.
ties variable is being passed... I'm assuming it's station IDs?
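To make the shape questions above concrete, here is a rough sketch of the two cases described in this PR: several clusters with ties, and a single cluster with an empty ties list. The helper names and station codes are hypothetical, and the layout (one list of station IDs per cluster, indexed by cluster number) is my reading of the description rather than something checked against the module.

```python
def ties_for(ties, cluster_index):
    """Return the tie stations for one cluster, or [] if no ties were generated."""
    if len(ties) == 0:  # single-cluster case: ties is an empty list
        return []
    return ties[cluster_index]


def mean_tie_fraction(stations, ties):
    """Average ratio of tie-array length to station-array length across clusters."""
    ratios = [
        len(ties_for(ties, i)) / len(members)
        for i, members in enumerate(stations)
        if len(members) > 0
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Two clusters, each tied to the other through two stations (made-up codes).
multi_stations = [["aaaa", "bbbb", "cccc", "dddd"], ["eeee", "ffff", "gggg", "hhhh"]]
multi_ties = [["eeee", "ffff"], ["aaaa", "bbbb"]]
multi_fraction = mean_tie_fraction(multi_stations, multi_ties)

# 'Regional' case: one cluster and no ties at all.
single_fraction = mean_tie_fraction([["aaaa", "bbbb"]], [])
```

If any downstream code indexes ties by cluster number directly, the empty list in the single-cluster case would raise an IndexError, which is the kind of thing the guard in ties_for() is meant to show.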