algo-hhu / FLSpp

Implementation of the FLS++ algorithm for K-Means clustering.
https://www.algo.hhu.de/
MIT License

If this gets printed, the generated number was too big! #22

Closed: zoubohao closed this issue 2 months ago

zoubohao commented 2 months ago

Hi authors

I got this message while running the code, and the program then terminated. My data set contains many points. I am not sure whether something went wrong while the code was running.

Best

melanieschmidt commented 2 months ago

Dear zoubohao, many thanks for pointing out the error! We tried to reproduce it but could not make it occur. It should not depend on the number of points. Could you share your data set with us? Perhaps you can make it occur with a small subset? We are happy to fix the error, and we know which line of code prints the message, but we have no idea how to trigger it.

Many thanks,
Melanie
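To try the small-subset idea, one option is to sample rows at random and save them for sharing. The sketch below assumes the data can be turned into a NumPy array; `random_subset` is a hypothetical helper, not part of FLS++. It deliberately keeps the original dtype, so a subset should still trigger any dtype-related problem:

```python
import numpy as np

def random_subset(X, n_points, seed=0):
    """Return up to n_points rows of X, chosen uniformly at random without replacement."""
    # np.asarray keeps the original dtype (e.g. float32 stays float32)
    X = np.asarray(X)
    rng = np.random.default_rng(seed)
    n_points = min(n_points, len(X))
    idx = rng.choice(len(X), size=n_points, replace=False)
    return X[idx]
```

For example, `pickle.dump(random_subset(X, 1000), fh)` would write a 1,000-point subset to an open binary file handle `fh`.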

melanieschmidt commented 2 months ago

How big is the data set? Today we tested it with up to 1,000,000 points but could not reproduce the issue. Is there anything else special about the data that we could replicate?

zoubohao commented 2 months ago

Dear authors,

The following is my code:

```python
import pickle
from flspp import FLSpp

def readPickle(readPath: str) -> object:
    # Load and return the object stored in the pickle file
    with open(readPath, "rb") as rh:
        obj = pickle.load(rh)
    return obj

def foo(X):
    print("start cluster", len(X))
    # Fit FLS++ with 8 clusters and 15 local-search iterations
    cluster_model = FLSpp(8, local_search_iterations=15)
    cluster_model = cluster_model.fit(X)
    labels_ = cluster_model.labels_
    center_points = cluster_model.cluster_centers_

if __name__ == "__main__":
    file_path = "../flspp_x_8.pkl"
    X = readPickle(file_path)
    foo(X)
```

flspp_x_8.pkl is a pickle file containing a list of vectors. I have attached it as a zip archive.

Thank you for replying and for helping to solve this.

Attachment: flspp_x_8.zip
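Since a list of vectors and an array can carry different element types without it being obvious, a quick look at the loaded object's type, dtype and shape can help narrow down problems like this one. `describe_data` is a hypothetical helper written for illustration, not part of FLS++:

```python
import numpy as np

def describe_data(obj):
    """Summarize the container type, element dtype and shape of loaded data."""
    info = {"type": type(obj).__name__}
    if isinstance(obj, np.ndarray):
        info["dtype"] = str(obj.dtype)
        info["shape"] = obj.shape
    elif isinstance(obj, (list, tuple)) and obj:
        # Inspect the first vector as a representative of the whole list
        first = np.asarray(obj[0])
        info["length"] = len(obj)
        info["element_dtype"] = str(first.dtype)
        info["element_shape"] = first.shape
    return info
```

Calling `describe_data(readPickle("../flspp_x_8.pkl"))` would show at a glance whether the vectors are float32 or float64.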

schmidthhu commented 2 months ago

We did some debugging on our side. I could not reproduce your error message "If this gets printed, the generated number was too big!"; however, I did get a segmentation fault when I ran your code.

The issue is that your pickle file holds a float32 array, but we expect float64. Could you try again like this:

```python
import numpy as np  # needed for the float64 conversion below
...
if __name__ == "__main__":
    file_path = "../flspp_x_8.pkl"
    X = readPickle(file_path)
    X = X.astype(np.float64, copy=False)  # new: convert to float64
    foo(X)
```

and let us know whether that works? :)
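Why a float32 array can crash code that expects float64: the sketch below assumes the C++ extension receives the raw memory buffer and reads it as `double` without checking the dtype (this has not been verified against the FLS++ source). NumPy's `view` reproduces that misreading safely:

```python
import numpy as np

# Four float32 values occupy 16 bytes of memory.
x32 = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

# Reinterpreting the same 16 bytes as float64 gives only two numbers,
# and neither of them is one of the original values.
misread = x32.view(np.float64)

# An explicit conversion instead copies each value into an 8-byte double.
x64 = x32.astype(np.float64)
```

If a routine then reads `n_points * dim` doubles from a buffer that only holds that many floats, it reads twice as many bytes as exist, which is consistent with a segmentation fault.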

schmidthhu commented 2 months ago

I also pushed commit b9a6f4e, which should fix the float32 vs. float64 issue. Does your original code run now?
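It is not stated here what commit b9a6f4e changes internally. As an illustration only, a caller could guard against the same problem by converting the input before calling `fit`. `as_float64_matrix` is a hypothetical helper; unlike `X.astype(...)`, it also accepts a plain Python list of vectors, which is how the pickle was first described. Requiring C-contiguous memory is an assumption about what a C++ extension would want, since the thread only establishes the float64 requirement:

```python
import numpy as np

def as_float64_matrix(X):
    """Convert a list of vectors or an array into a C-contiguous 2-D float64 array."""
    # np.ascontiguousarray copies only when the dtype or memory layout must change.
    X = np.ascontiguousarray(X, dtype=np.float64)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D array of points, got shape {X.shape}")
    return X
```

With this helper, `cluster_model.fit(as_float64_matrix(X))` would behave the same whether `X` was pickled as a list or as a float32 array.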

zoubohao commented 2 months ago

Thank you very much. It worked!!!
