Open Jieran-S opened 9 months ago
Could this be fixed with `pd.crosstab(dropna=False)`? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html
Also, isn't the intention to prevent over-clustering with a resolution optimization or similar, which would keep `no. of domain labels > no. of ground truth labels` from ever being true?
Yeah, agreed. The issue also arises from models that don't convert resolution to n_cluster. But in case we want to investigate the robustness of the clustering methods in the future, it would be good to have this option, imo.
https://github.com/SpatialHackathon/SpaceHack2023/blob/2e81727c99ddd5171d403915834d1474820384d2/metric/jaccard/jaccard.py#L60-L63
Some metrics (MCC, Jaccard) require matched labels. If the labels are not pre-matched, the script applies a matching algorithm (linked above). But when no. of domain labels > no. of ground truth labels, the resulting `domains` object has many `NaN` values, leading to downstream errors. Can any metric people look into it and propose a potential fix?
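One possible direction, as a minimal sketch only: assuming the `NaN` values come from surplus predicted clusters that end up with no ground-truth partner after matching, those clusters could be given fresh placeholder labels instead. The helper name `match_labels`, the `unmatched_` prefix, and the greedy overlap matching are all assumptions made for illustration. This is not the matching code in the linked script, and it uses plain lists rather than pandas objects to stay self-contained.

```python
from collections import Counter


def match_labels(truth, pred):
    """Relabel `pred` so its clusters line up with the labels in `truth`.

    Hypothetical helper: greedy matching by overlap count, not the
    algorithm used in the linked jaccard.py.
    """
    # Count how often each (predicted cluster, true label) pair co-occurs.
    overlap = Counter(zip(pred, truth))

    mapping = {}        # predicted cluster -> true label
    used_truth = set()  # true labels that already have a partner

    # Walk the pairs from largest to smallest overlap and keep a pair
    # only if neither side has been matched yet (one-to-one matching).
    for (p, t), _count in overlap.most_common():
        if p not in mapping and t not in used_truth:
            mapping[p] = t
            used_truth.add(t)

    # Surplus predicted clusters have no partner. Give them a fresh
    # placeholder label rather than leaving a missing value behind.
    return [mapping.get(p, f"unmatched_{p}") for p in pred]


truth = ["A", "A", "A", "B", "B", "B"]
pred = [0, 0, 1, 2, 2, 2]  # three predicted clusters, two true labels
matched = match_labels(truth, pred)
print(matched)
```

With a fallback like this, every spot keeps a label, so the metric sees the surplus cluster as a mismatch rather than a missing value. Whether surplus clusters should count as mismatches or be dropped is a design choice for the metric authors.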