This revamps the clustering algorithm with the following changes:
Optionally define MaxClusterSize in the config file. The default is 64, corresponding to a maximum of 64 native-resolution elements in a single cluster.
Optionally define ClusteringThreshold, the total information content a cluster must have before it is added to the final state vector. Defaults to total_DOFS/desired_number_of_elements. This lets users enforce a minimum information content per cluster and so narrow the spread in size between the smallest clusters and the background elements (of size MaxClusterSize). This is useful when there are too few clusters to allocate optimally, so that many elements are aggregated into background elements whose very high information content causes large swings in emissions.
Simplifies the clustering algorithm by removing the pregenerated clustering pairs, which were based on the list of estimated sensitivities (but carried no spatial information). Instead, we iterate through each possible aggregation from highest resolution to lowest and add clusters whose total sensitivity is above ClusteringThreshold. This is algorithmically simpler and also generates a closer-to-optimal clustering. The downside is that it is slightly slower, but the difference is only a few minutes, which is minimal compared to the length of the full inversion.
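For reviewers who want a concrete picture of the fine-to-coarse pass, here is a minimal sketch. It is not the code in this PR. The function name `cluster_by_threshold`, the 2D sensitivity grid, the square power-of-two blocks, and the rule that leftover elements at the coarsest level become background clusters regardless of the threshold are all assumptions made for illustration.

```python
import math

import numpy as np


def cluster_by_threshold(sensitivities, threshold=None, n_elements=None,
                         max_cluster_size=64):
    """Label grid cells by aggregating from fine blocks to coarse blocks.

    Hypothetical helper for illustration only, not the PR's implementation.
    """
    sens = np.asarray(sensitivities, dtype=float)
    if threshold is None:
        if n_elements is None:
            raise ValueError("give either threshold or n_elements")
        # Default described above: total_DOFS / desired_number_of_elements.
        threshold = sens.sum() / n_elements

    # Assumed block sides: 1, 2, 4, ... up to sqrt(max_cluster_size).
    max_side = max(1, math.isqrt(max_cluster_size))
    sides = []
    side = 1
    while side <= max_side:
        sides.append(side)
        side *= 2

    labels = np.zeros(sens.shape, dtype=int)  # 0 means "not yet assigned"
    next_label = 1
    for side in sides:  # highest resolution first
        is_coarsest = side == sides[-1]
        for i in range(0, sens.shape[0], side):
            for j in range(0, sens.shape[1], side):
                block = (slice(i, i + side), slice(j, j + side))
                free = labels[block] == 0
                if not free.any():
                    continue
                total = sens[block][free].sum()
                # Accept the cluster once it carries enough information.
                # At the coarsest level, leftovers become background elements.
                if total >= threshold or is_coarsest:
                    labels[block][free] = next_label
                    next_label += 1
    return labels
```

In this sketch, raising `threshold` pushes more elements into coarser clusters, and lowering `max_cluster_size` caps how large a background element can grow.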
Name and Institution (Required)
Name: Lucas Estrada
Institution: Harvard ACMG