erdogant / clustimage

clustimage is a python package for unsupervised clustering of images.
https://erdogant.github.io/clustimage

Memory Error during import_data #18

Open ntokenl opened 1 year ago

ntokenl commented 1 year ago

Hi,

I am encountering a MemoryError during the import_data step. It is thrown after the images have been loaded. Is there any way I can figure out what the issue is?

Traceback (most recent call last):
  File "/home/stg/prod/combine_clustimage.py", line 56, in <module>
    results = cl.fit_transform(targetdir)
  File "/home/stg/.local/lib/python3.10/site-packages/clustimage/clustimage.py", line 352, in fit_transform
    _ = self.import_data(X, black_list=black_list)
  File "/home/stg/.local/lib/python3.10/site-packages/clustimage/clustimage.py", line 992, in import_data
    X = self.preprocessing(Xraw['pathnames'], grayscale=self.params['cv2_imread_colorscale'], dim=self.params['dim'], flatten=flatten)
  File "/home/stg/.local/lib/python3.10/site-packages/clustimage/clustimage.py", line 806, in preprocessing
    img, imgOK = zip(*imgs)
MemoryError

JoeyChallita commented 1 year ago

How many images are there in your directory?

ntokenl commented 1 year ago

I tried with 30000-50000 images.

It seems the issue is with SciPy's LAPACK. The specific error can be reproduced with the snippet below, as documented at https://github.com/scipy/scipy/issues/10337:

    a = np.ones((30000, 30000))
    u, s, vh = svd(a)
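To put the size of that reproduction in perspective, here is a rough sketch that estimates the memory a dense 30000 x 30000 float64 array needs without allocating it. The helper name `dense_nbytes` is made up for illustration, and the estimate ignores any workspace LAPACK allocates internally.

```python
import math

import numpy as np


def dense_nbytes(shape, dtype=np.float64):
    """Bytes needed by a dense array of this shape and dtype (nothing is allocated)."""
    return math.prod(shape) * np.dtype(dtype).itemsize


n = 30000
input_bytes = dense_nbytes((n, n))

# With the default full_matrices=True, scipy.linalg.svd returns u with shape
# (n, n) and vh with shape (n, n), so the outputs alone roughly double the
# footprint of the input again, twice over.
total_bytes = 3 * input_bytes

print(f"input matrix: {input_bytes / 1e9:.1f} GB")
print(f"input + u + vh: {total_bytes / 1e9:.1f} GB (excluding LAPACK workspace)")
```

So the input alone is about 7.2 GB, and a machine with less RAM than that can hit a MemoryError before the driver choice matters at all.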

I managed to find some solutions.

The workaround is to switch the LAPACK driver from 'gesdd' to 'gesvd' in the SciPy file "/usr/lib/python3/dist-packages/scipy/linalg/_decomp_svd.py".
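For code that calls scipy.linalg.svd directly, the driver can also be chosen per call through the `lapack_driver` argument, without editing the installed file. The sketch below uses a small random matrix as a stand-in. Whether clustimage or its dependencies expose this argument is not established in this thread, so treat it as an illustration of the SciPy option only.

```python
import numpy as np
from scipy.linalg import svd

# Small stand-in matrix; the 30000 x 30000 case from the report is far too
# large to use as a quick check.
rng = np.random.default_rng(0)
a = rng.standard_normal((200, 100))

# 'gesdd' (divide and conquer) is SciPy's default driver; 'gesvd' is the
# alternative that the workaround above switches to.
u, s, vh = svd(a, full_matrices=False, lapack_driver="gesvd")

# Sanity check: the factors should reproduce the input.
reconstructed = (u * s) @ vh
max_err = float(np.max(np.abs(a - reconstructed)))

# The singular values should agree with the default driver's result.
s_default = svd(a, compute_uv=False)

print(f"max reconstruction error: {max_err:.2e}")
print("singular values match default driver:", np.allclose(s, s_default))
```

Passing `full_matrices=False` also keeps `u` and `vh` at their reduced shapes, which lowers memory use for non-square inputs.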

Another approach I am trying is to install the intel-scipy Python modules via pip. The latest version seems to be incompatible with clustimage: it reports a version mismatch for 'numba'. Installing the release candidate of numba seems to make it work again.

I am not too sure about the accuracy or performance after these changes, but clustimage is no longer reporting the error.

erdogant commented 1 year ago

Great. Nice to hear that you found a solution. However, I am not sure why it reports a mismatch in the numba version. numba is not used directly in clustimage, but it may be imported by another package.