Previously, HDBSCAN assignments were split directly into new groups when exporting to a new project.
This could cause samples labeled as "noise" to be reassigned to the original class, even though the noise contains instances of the other sub-classes.
Solution:
In line with B-SOiD, we now train a random forest on the cluster assignments and use it to predict the sub-class identity before assigning new labels to the data set. As a result, each label of the selected class is replaced by the sub-class that the new model predicts.
During the save process, the predicted labels are exported for the selected subset of sub-classes of interest.
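For reviewers, here is a minimal sketch of the idea, not the code in this PR. It assumes scikit-learn's `RandomForestClassifier`, that noise is marked with `-1` (the HDBSCAN convention), and that noise samples are left out of training. The function names `predict_subclasses` and `select_for_export` and the toy data are hypothetical and only serve as illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def predict_subclasses(features, assignments, noise_label=-1, random_state=0):
    """Train a random forest on clustered samples and predict a sub-class for all samples."""
    features = np.asarray(features)
    assignments = np.asarray(assignments)
    # Assumption: noise samples are excluded from training.
    clustered = assignments != noise_label
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    clf.fit(features[clustered], assignments[clustered])
    # Predict for every sample, so noise samples also receive a sub-class.
    return clf.predict(features)


def select_for_export(predicted, subclasses_of_interest):
    """Boolean mask of samples whose predicted sub-class was selected for export."""
    return np.isin(predicted, subclasses_of_interest)


# Toy data: two well-separated groups, with a few samples marked as noise.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(5.0, 0.3, size=(50, 2)),
])
assignments = np.array([0] * 50 + [1] * 50)
assignments[[0, 1, 50, 51]] = -1  # pretend the clustering flagged these as noise

predicted = predict_subclasses(features, assignments)
export_mask = select_for_export(predicted, subclasses_of_interest=[1])
```

Because the classifier never sees the noise label during training, it cannot predict it, so every sample ends up in one of the sub-classes.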
Thoughts:
Although this is not entirely intuitive for users, it is the only way to export clusters reliably: we do not trust the cluster assignment of any individual sample, but instead use the clustering as an orientation.
@runninghsus Please review whether this is the intended behavior in B-SOiD.