demiangomez / Parallel.GAMIT

Python wrapper to parallelize GAMIT executions
GNU General Public License v3.0

Cluster rewrite and integration #56

Open espg opened 1 month ago

espg commented 1 month ago

Overview

Replaces the clustering method used to split up GAMIT runs when the station count is too high for single-pass processing. This PR removes about 500 lines of code and simplifies the module wherever possible.

Installation

Note that for modularity, the overcluster functions live in a separate repo and are imported; they can be installed from PyPI using pip install overcluster.

Details

This PR leaves the if conditional on lines 83-137 unchanged, including all functions that touch that block, i.e., recover_subnets(), add_missing_station(), and backbone_delauney(). Everything in the else conditional on lines 137-175 is heavily rewritten. This includes completely removing subnets_delauney() and tie_subnetworks(); make_clusters() is kept as a function name, but shares no code with the prior version of the function. Both global_sel() and save_clusters() are no longer called once the prior functions are removed, and have been deleted.

This should match the prior data structure API for the variables that get passed to GAMIT.session: the clusters variable is still set up as lists of arrays that can be indexed by cluster number, as in the final lines of the create_gamit_sessions() loop. The ties variable is also maintained... for now (see below).
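To make the structure described above concrete, here is a minimal sketch of what "lists of arrays indexed by cluster number" could look like. The station codes, the index-array layout, and the per-cluster shape of ties are all assumptions for illustration; they are not taken from the PR's code, and session_members() is a hypothetical helper.

```python
import numpy as np

# Hypothetical station codes; a real run would read these from the database.
stations = np.array(["alfa", "brav", "chrl", "delt", "echo", "foxt"])

# Assumed layout: one integer index array per cluster, so clusters[i]
# holds the members of cluster i.
clusters = [np.array([0, 1, 2]), np.array([3, 4, 5])]

# Assumed layout for ties: for each cluster, the indices of stations
# borrowed from neighbouring clusters to tie the subnetworks together.
ties = [np.array([3]), np.array([2])]


def session_members(cluster_id):
    """Return station codes for one session: cluster members, then ties."""
    idx = np.concatenate([clusters[cluster_id], ties[cluster_id]])
    return stations[idx].tolist()


# Loop over cluster numbers, roughly as a session-creation loop might.
for cluster_id in range(len(clusters)):
    print(cluster_id, session_members(cluster_id))
```

Keeping both variables indexable by the same cluster number means a downstream loop only needs the cluster count, which is presumably why the PR preserves that shape.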

Testing

I haven't run this in the GAMIT environment, but I'm fairly confident that it will run without error for larger station runs. However, I'm less sure about the 'regional' cases, where there are few enough stations that we have a single cluster and no parallelization. I think things will still work, with make_clusters() generating an empty list for ties and single-member lists for stations and labels... however, I wouldn't be surprised if this needs some debugging, since I haven't tested that case locally.
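One way to exercise the untested single-cluster case without the GAMIT environment is a small sanity check on the clustering output. The sketch below is only an illustration: check_cluster_output() is a hypothetical helper, and it assumes clusters is a list of integer index arrays and ties is a (possibly empty) list of index arrays.

```python
import numpy as np


def check_cluster_output(clusters, ties, n_stations):
    """Return a list of problems found in clustering output (empty if none).

    Assumes `clusters` is a list of integer index arrays and `ties` is a
    list of index arrays that may be empty (hypothetical layout).
    """
    problems = []
    if len(clusters) == 0:
        problems.append("no clusters returned")
        return problems

    # Every station index should belong to at least one cluster.
    members = np.unique(
        np.concatenate([np.asarray(c, dtype=int) for c in clusters])
    )
    missing = np.setdiff1d(np.arange(n_stations), members)
    if missing.size:
        problems.append(f"stations not assigned: {missing.tolist()}")

    # With a single cluster there is nothing to tie together.
    if len(clusters) == 1 and any(len(t) for t in ties):
        problems.append("single cluster should have no ties")
    return problems


# Regional case: five stations, one cluster, no ties.
print(check_cluster_output([np.arange(5)], [], n_stations=5))
```

A check like this could run in a local test before the output is handed to the session-creation loop, so that the regional case fails with a readable message rather than an indexing error.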

Additional Things that could/should be addressed

espg commented 3 weeks ago

Added the new BisectingQMeans method; also did some additional module cleanup, although some deeper cleaning is needed for flake8 compliance... I'm not sure how the database literals can be split up without breaking things, so I'm leaving them as is for now.