Thanks for this implementation of the DBCV in Python. However, the results with this method don't match with the reference implementation in Matlab by Moulavi et al.
This is partly because your implementation treats outliers as a cluster, but even fixing this leads to completly different results. The first example dataset of the reference Implementation will give values of -0.2986 for your Implementation, 0.5074 for your implementation with the correct outlier processing and 0.6149 for the reference implementation.
I think these quite significant difference discourage from using this implementation in scientific contexts until this is fixed.
For anyone this might help, I came across another implementation. See this library that matches the Matlab implementation and an issue raised in that repo comparing their library to this one.
Hello,
Thanks for this implementation of the DBCV in Python. However, the results with this method don't match with the reference implementation in Matlab by Moulavi et al. This is partly because your implementation treats outliers as a cluster, but even fixing this leads to completly different results. The first example dataset of the reference Implementation will give values of -0.2986 for your Implementation, 0.5074 for your implementation with the correct outlier processing and 0.6149 for the reference implementation.
I think these quite significant difference discourage from using this implementation in scientific contexts until this is fixed.