The R script generates clusterings, and the slowest part is building the atomic similarity matrix. We could cache this matrix in a data file, but I wasn't up to fighting with R to figure out how to serialize the data in a readable way.
We should write it to disk as a CSV file in one R script and read it back in another. This would also let us parallelize the clusterings (one script runs with one K, so snakemake can run them in parallel), though the clustering step isn't slow anyway.
Note that we should not export it to an Rdata file, because then we can't do anything else with it. Serializing the data as text means it can be opened in Excel for viewing, or in Python for passing along to other algorithms (like for #35). Even if we don't use it for anything interesting ourselves, it can be used by whoever picks up the work.
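A minimal sketch of the CSV round trip in base R, assuming the matrix has atom labels as row and column names (the `sim` variable and `similarity.csv` file name are placeholders, not names from our scripts):

```r
# Stand-in for the real similarity matrix, with atom labels as dimnames.
sim <- matrix(runif(9), nrow = 3,
              dimnames = list(paste0("atom", 1:3), paste0("atom", 1:3)))

# Writer script: keep row names so the atom labels survive the round trip.
write.csv(sim, "similarity.csv", row.names = TRUE)

# Reader script: read.csv returns a data.frame; pull the labels back out
# of the first column and convert to a matrix.
sim2 <- as.matrix(read.csv("similarity.csv", row.names = 1))

# Values match up to the text precision write.csv uses.
stopifnot(isTRUE(all.equal(sim, sim2, tolerance = 1e-6,
                           check.attributes = FALSE)))
```

Because the file is plain CSV, the same matrix loads in Python with `pandas.read_csv("similarity.csv", index_col=0)`, which covers the #35 handoff without any R-specific format.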