This PR extracts the generation of the atomic movement similarity matrix (AMSM) into a separate step, so that we can (maybe) reuse it in other steps and, more practically, swap out our clustering algorithms and rerun them very cheaply.
We could also add an evaluation step that compares different clusterings to determine which are "best", e.g. in terms of variance reduction. Whether we get to that depends on finding the time, but that kind of iteration would not be feasible if we had to regenerate the AMSM every time.
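To illustrate why the separate step pays off, here is a minimal Python sketch under stated assumptions. `compute_amsm`, `load_or_compute_amsm`, `threshold_clusters` and the JSON cache file are all hypothetical and not part of this pipeline. The toy matrix stands in for the real AMSM, and the threshold clustering is a simple stand-in, not the GeoStaS method. The expensive matrix is computed once and cached, and then several clusterings reuse it.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the expensive AMSM computation. The toy matrix
# has two blocks of mutually similar atoms (first half, second half).
def compute_amsm(n_atoms):
    half = max(n_atoms // 2, 1)
    return [
        [1.0 if i == j else (0.8 if i // half == j // half else 0.1)
         for j in range(n_atoms)]
        for i in range(n_atoms)
    ]

def load_or_compute_amsm(path, n_atoms):
    """Return the cached matrix if the file exists, otherwise compute and save it."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    matrix = compute_amsm(n_atoms)
    with open(path, "w") as fh:
        json.dump(matrix, fh)
    return matrix

def threshold_clusters(matrix, threshold):
    """Label connected components of atoms linked by similarity >= threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over atoms reachable via sufficiently similar pairs.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "amsm_example.json")
        amsm = load_or_compute_amsm(path, n_atoms=6)  # computed once, then cached
        for t in (0.9, 0.5, 0.05):                    # cheap reruns on the cached matrix
            print(t, threshold_clusters(amsm, t))
```

In the pipeline itself, Snakemake plays the role of `load_or_compute_amsm`: it skips a rule whose output file already exists and is up to date, so only the clustering rules rerun.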
Once this PR is merged, the entire `02_intermediate/bio3d_geostas` directory should probably be removed, along with the segmentation TSV files in `03_output`, so that running `snakemake` will regenerate the missing files.