choosehappy / CohortFinder

Intelligent data partitioning using quality control metrics
BSD 3-Clause Clear License

Compute and store selection_priority_index. #9

Open jacksonjacobs1 opened 2 hours ago

jacksonjacobs1 commented 2 hours ago

Compute an additional column in the CohortFinder results.tsv called selection_priority_index.

Looking at the cluster centers, pick points such that the one closest to the center of mass is picked first, and then, sequentially, the one that is furthest away from those already picked.

This essentially finds an order of selection that maximizes diversity (i.e., two clusters next to each other are not picked sequentially).
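A minimal sketch of that ordering, to make the proposal concrete. It is not CohortFinder code: the function names and the plain list-of-coordinates input are hypothetical, and it assumes that "furthest away from those already picked" means the center whose distance to its nearest already-picked center is largest (greedy farthest-point ordering). Other readings, such as the largest summed distance, would give a different order.

```python
import math


def selection_priority_order(centers):
    """Return cluster ids in greedy farthest-point order (hypothetical helper).

    `centers` is a sequence of coordinate tuples, one per cluster.
    """
    n = len(centers)
    if n == 0:
        return []
    dim = len(centers[0])

    # Center of mass of all cluster centers.
    com = [sum(c[d] for c in centers) / n for d in range(dim)]

    # First pick: the cluster center closest to the center of mass.
    first = min(range(n), key=lambda i: math.dist(centers[i], com))
    order = [first]
    picked = {first}

    # min_dist[i]: distance from center i to its nearest already-picked center.
    min_dist = [math.dist(c, centers[first]) for c in centers]

    while len(order) < n:
        # Assumption: "furthest away" = largest distance to the nearest pick.
        nxt = max(
            (i for i in range(n) if i not in picked),
            key=lambda i: min_dist[i],
        )
        order.append(nxt)
        picked.add(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(centers[i], centers[nxt]))
    return order


def selection_priority_index(centers):
    """Map each cluster id to its rank in the order (0 = label first)."""
    index = [0] * len(centers)
    for rank, cluster in enumerate(selection_priority_order(centers)):
        index[cluster] = rank
    return index
```

Each row of results.tsv could then take the rank of the cluster it belongs to as its selection_priority_index value. How rows map to clusters depends on the existing output columns, which this sketch does not assume.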

If available, this index will be used by QuickAnnotator to suggest the order in which a dataset should be labeled.

jacksonjacobs1 commented 2 hours ago

Index should be computed directly after this line: https://github.com/choosehappy/CohortFinder/blob/bde0e5a2229a99028151704edb5acedc7b566ecd/cohortfinder/cohortfinder.py#L276