Open drroe opened 1 year ago
Here https://github.com/Amber-MD/cpptraj/pull/1051#event-10450499445 it says "Calculate extended comparison similarity values for each trajectory frame." Is this the complementary similarity used to then find medoids and outliers in the trajectory?
> Is this the complementary similarity used to then find medoids and outliers in the trajectory?
Yes - it's equivalent to the gen_sim_dict routine from src/tools/esim_modules.py in MDANCE.
gen_sim_dict will take as an input a set of frames/conformations, and output a number (the extended similarity) for the whole set, not a number for every frame. To calculate the outliers and medoids, the function is calculate_comp_sim (in src/tools/bts.py). The complementary similarity does assign a number to every frame in a set, which can be used to rank the frames from high- to low-density.
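To illustrate the distinction above, here is a minimal NumPy sketch. It is not the MDANCE code: the function names `set_msd` and `complementary_msd` and the toy data are made up for illustration, and mean squared deviation (MSD) is assumed as the set-level measure, defined here as the mean squared difference over all ordered frame pairs. Because MSD measures spread rather than similarity, the frame whose removal leaves the largest value is taken as the medoid, and the frame whose removal leaves the smallest value as the outlier.

```python
import numpy as np


def set_msd(frames):
    """One number for the whole set (hypothetical helper, not the MDANCE API).

    frames has shape (n_frames, n_features). The value is the mean squared
    difference over all ordered frame pairs, computed from column sums only.
    """
    n = frames.shape[0]
    col_sum = frames.sum(axis=0)
    col_sq_sum = (frames ** 2).sum(axis=0)
    return float(2.0 * np.sum(col_sq_sum / n - (col_sum / n) ** 2))


def complementary_msd(frames):
    """One number per frame: the set MSD of all the *other* frames."""
    n = frames.shape[0]
    col_sum = frames.sum(axis=0)
    col_sq_sum = (frames ** 2).sum(axis=0)
    # Subtract each frame's contribution from the column sums (leave-one-out).
    rest_sum = col_sum - frames
    rest_sq_sum = col_sq_sum - frames ** 2
    m = n - 1
    return 2.0 * np.sum(rest_sq_sum / m - (rest_sum / m) ** 2, axis=1)


# Toy data: three nearby frames and one far-away frame.
frames = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])

whole_set_value = set_msd(frames)        # a single number for the set
comp = complementary_msd(frames)         # one number per frame

medoid = int(np.argmax(comp))    # removing it leaves the most spread-out set
outlier = int(np.argmin(comp))   # removing it leaves the most compact set
ranking = np.argsort(comp)[::-1] # frames ordered from high- to low-density
```

With a similarity index instead of MSD, the sense of the ranking flips: the medoid would be the frame with the lowest complementary similarity.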
> gen_sim_dict will take as an input a set of frames/conformations, and output a number (the extended similarity) for the whole set, not a number for every frame.
Yes, I understand that. Let me be more clear.
The ExtendedSimilarity::Comparison() function is most like gen_sim_dict. The ExtendedSimilarity::CalculateCompSim() function (which is what the extendedcomp command, i.e. the Exec_ExtendedComparison class, uses under the hood) is more like calculate_comp_sim. Let me know if you have any more questions.
Sounds great! The functionality in bts.py is a bit more general, because it accommodates extended indices and MSD in a more general way, but this is perfect.
In collaboration with @ramirandaq @lexin-chen, expand the cluster analysis capabilities of cpptraj by adding clustering via extended similarity metrics (and more).
Some background reading:
https://link.springer.com/article/10.1186/s13321-021-00505-3
https://link.springer.com/article/10.1007/s10822-022-00444-7