AndrewRadev / protein-runway

Integrated Bioinformatics Project

Cache amsm matrix #51

AndrewRadev closed this 6 days ago

AndrewRadev commented 1 week ago

This PR moves the generation of the atomic movement similarity matrix (AMSM) into a separate step. That way we can (maybe) reuse it in other steps and, more practically, swap out our clustering algorithms and rerun them very cheaply.
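To illustrate the "compute once, recluster cheaply" idea, here is a minimal plain-Python sketch. It assumes the matrix can be stored as a file and reloaded. `fake_amsm`, `amsm.pkl`, and the threshold clustering are hypothetical stand-ins, not the project's code: the real matrix comes from the bio3d_geostas step, and in the pipeline the caching is done by Snakemake's file targets rather than by a helper like this.

```python
import os
import pickle
import tempfile
from typing import Callable, List

Matrix = List[List[float]]


def load_or_compute_matrix(cache_path: str, compute: Callable[[], Matrix]) -> Matrix:
    """Return the matrix cached at cache_path; run the expensive compute() only if missing."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    matrix = compute()
    with open(cache_path, "wb") as fh:
        pickle.dump(matrix, fh)
    return matrix


def threshold_clusters(sim: Matrix, threshold: float) -> List[int]:
    """Toy clustering (hypothetical): connected components of atoms with similarity >= threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


calls = []  # records how often the "expensive" step actually ran


def fake_amsm() -> Matrix:
    # Hypothetical stand-in for the expensive AMSM computation (3 atoms).
    calls.append(1)
    return [[1.0, 0.9, 0.1],
            [0.9, 1.0, 0.2],
            [0.1, 0.2, 1.0]]


# Usage: the first call computes and caches; later runs with other settings reuse the cache.
with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "amsm.pkl")  # hypothetical cache file name
    first = load_or_compute_matrix(cache, fake_amsm)
    second = load_or_compute_matrix(cache, fake_amsm)
    loose = threshold_clusters(second, 0.5)    # [0, 0, 1]
    strict = threshold_clusters(second, 0.95)  # [0, 1, 2]
```

Only the first call pays for the matrix. Each new clustering setting costs just the clustering itself.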

We could also add an evaluation step that compares different clusterings to determine which are "best" in terms of variance reduction or a similar metric. That depends on whether we can find the time, but that kind of iteration won't be possible if we have to generate a new AMSM every time.
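One possible reading of "variance reduction" is the fraction of total variance explained by a clustering, 1 − (within-cluster sum of squares / total sum of squares). The sketch below scores candidate clusterings this way. It assumes a single per-atom value (for example, a hypothetical displacement magnitude) to keep things short. The metric choice, feature, and candidate names are all assumptions, not something the PR specifies.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def sum_sq(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def variance_reduction(values: Sequence[float], labels: Sequence[int]) -> float:
    """1 - within-cluster SS / total SS; higher means tighter clusters."""
    total = sum_sq(values)
    if total == 0:
        return 0.0  # no variance to reduce
    groups: Dict[int, List[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    within = sum(sum_sq(group) for group in groups.values())
    return 1.0 - within / total


# Hypothetical per-atom values and two candidate clusterings of the same 4 atoms.
values = [0.1, 0.2, 1.0, 1.1]
candidates = {
    "contiguous": [0, 0, 1, 1],
    "interleaved": [0, 1, 0, 1],
}
scores = {name: variance_reduction(values, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)  # "contiguous"
```

Because each score needs only the labels and the per-atom data, a step like this could run over many clusterings of one cached AMSM.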

Once this PR is merged, the entire 02_intermediate/bio3d_geostas directory should probably be removed, along with the segmentation TSV files in 03_output, so that the next snakemake run regenerates the missing files.
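A minimal sketch of that cleanup in Python. It assumes both directories sit directly under the project root and that the segmentation files match a hypothetical `*segmentation*.tsv` pattern; adjust both to the real layout. Snakemake should then rebuild the deleted targets on its next run.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def clean_stale_outputs(root: Path, tsv_pattern: str = "*segmentation*.tsv") -> List[Path]:
    """Delete the old bio3d_geostas intermediates and segmentation TSVs; return what was removed."""
    removed: List[Path] = []
    stale_dir = root / "02_intermediate" / "bio3d_geostas"
    if stale_dir.is_dir():
        shutil.rmtree(stale_dir)
        removed.append(stale_dir)
    output_dir = root / "03_output"
    if output_dir.is_dir():
        # tsv_pattern is an assumption about how the segmentation files are named.
        for tsv in sorted(output_dir.glob(tsv_pattern)):
            tsv.unlink()
            removed.append(tsv)
    return removed


# Usage on a throwaway directory tree with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    geostas = root / "02_intermediate" / "bio3d_geostas"
    geostas.mkdir(parents=True)
    (geostas / "old_matrix.csv").write_text("")
    (root / "03_output").mkdir()
    (root / "03_output" / "protein_segmentation.tsv").write_text("")
    (root / "03_output" / "summary.txt").write_text("")
    removed = clean_stale_outputs(root)
    geostas_gone = not geostas.exists()
    leftover = sorted(p.name for p in (root / "03_output").iterdir())
```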