SpatialHackathon / SpaceHack2023

MIT No Attribution

Matched label returns `NaN` in metric calculation #215

Open Jieran-S opened 9 months ago

Jieran-S commented 9 months ago

https://github.com/SpatialHackathon/SpaceHack2023/blob/2e81727c99ddd5171d403915834d1474820384d2/metric/jaccard/jaccard.py#L60-L63

Some metrics (MCC, Jaccard) require matched labels. If the labels are not pre-matched, the script runs a matching algorithm (linked above). But when the number of domain labels exceeds the number of ground-truth labels, the resulting `domains` object contains many `NaN` values, which leads to downstream errors.

Can any metric people look into it and propose a potential fix?
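
A minimal sketch of how the `NaN` values can arise, assuming the matching step builds a contingency table and solves a one-to-one assignment on it. The linked lines are not reproduced here, so `match_labels` and the `unmatched_*` fallback labels are hypothetical names for illustration, not the repository's code. With more predicted labels than ground-truth labels, at least one predicted label gets no partner and maps to `NaN`; one possible workaround is to give it a fresh label instead.

```python
import pandas as pd
from scipy.optimize import linear_sum_assignment


def match_labels(ground_truth, predicted):
    """Map predicted labels onto ground-truth labels by maximum overlap.

    Returns two Series: the naive mapping (unmatched labels become NaN)
    and a variant where unmatched labels receive a fresh placeholder label.
    """
    ground_truth = pd.Series(ground_truth).reset_index(drop=True)
    predicted = pd.Series(predicted).reset_index(drop=True)

    # Rows: predicted labels, columns: ground-truth labels.
    contingency = pd.crosstab(predicted, ground_truth)

    # Hungarian algorithm on the negated counts, i.e. maximise overlap.
    # For a rectangular table only min(n_rows, n_cols) pairs are returned.
    rows, cols = linear_sum_assignment(-contingency.to_numpy())
    mapping = dict(zip(contingency.index[rows], contingency.columns[cols]))

    # Predicted labels missing from `mapping` turn into NaN here.
    naive = predicted.map(mapping)

    # Hypothetical fallback: keep unmatched labels as distinct new labels.
    unmatched = [lab for lab in contingency.index if lab not in mapping]
    for i, lab in enumerate(unmatched):
        mapping[lab] = f"unmatched_{i}"
    safe = predicted.map(mapping)
    return naive, safe


# Three predicted domains, two ground-truth domains.
truth = [0, 0, 1, 1, 1]
pred = ["a", "a", "b", "b", "c"]
naive, safe = match_labels(truth, pred)
print(naive.isna().sum(), safe.isna().sum())
```

Whether unmatched domains should keep a placeholder label, be merged into their best-overlapping ground-truth label, or be dropped depends on what the metric is meant to penalise, so the fallback above is only one option.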

shdam commented 9 months ago

Could this be fixed with `pd.crosstab(dropna=False)`? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html

Also, isn't the intention to prevent over-clustering with a resolution optimization or similar, which would keep the number of domain labels from exceeding the number of ground-truth labels?

Jieran-S commented 9 months ago

Yeah, agreed... The issue also arises from models that don't convert resolution to n_cluster. But in case we want to investigate the robustness of the clustering methods in the future, it would be good to have this option imo.