erdogant / clustimage

clustimage is a python package for unsupervised clustering of images.
https://erdogant.github.io/clustimage

Trying to retrieve original file path names in the results #4

Closed: Sid01123 closed this issue 2 years ago.

Sid01123 commented 2 years ago

Is there a way to get the original pathnames of the images after running fit_transform?

I am uploading images to Google Colab and reading them in by their file paths, e.g. "/content/name_of_image". After running the clustering, I would like to recover these "/content/name_of_image" paths.

I tried to extract the pathnames per label with the following code, but I seem to get the file paths of images created in a temporary directory:

Code:
Iloc = cl.results['labels'] == 0
cl.results['pathnames'][Iloc]

Output:
array(['/tmp/clustimage/8732cb41-c72d-4266-b164-ff453d68428a.png',
       '/tmp/clustimage/440fecd8-8a9c-49a0-b100-ccfb66107425.png',
       '/tmp/clustimage/3c9c38d8-4da9-4e4f-9130-d3836182b8c6.png',
       '/tmp/clustimage/85cc4848-1faf-44ea-ae4c-9d9d88bd6323.png',
       '/tmp/clustimage/6127e4fb-1c25-4ba9-8d68-56ef482e3db4.png',
       '/tmp/clustimage/abcf85e0-af1a-48f1-8861-122122b64e32.png',
       '/tmp/clustimage/275bbde0-394d-4ba4-b4d0-1c67da323c8b.png',
       '/tmp/clustimage/30b62285-2628-45c0-86b2-fea305cb8db3.png',
       '/tmp/clustimage/c47a6867-3c8f-480c-a7bd-b3e7ec4ba334.png',
       '/tmp/clustimage/da5c17fc-de2a-4375-b03c-066a0904428a.png'],
      dtype='<U56')

I would like the output to be the original filenames from my pathnames list.
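
Independently of what clustimage stores internally, if you keep your own list of original paths in the same order as the images you pass in, the cluster labels can be mapped back by position. A minimal sketch, assuming `labels[i]` belongs to `original_paths[i]` (this row-order correspondence is an assumption, not something confirmed in this thread); `paths_by_label` and the example paths are hypothetical:

```python
import numpy as np

def paths_by_label(original_paths, labels):
    """Hypothetical helper: group the caller's own path list by cluster label.

    Assumes original_paths[i] is the image that produced labels[i],
    i.e. both sequences are in the same row order.
    """
    paths = np.asarray(original_paths)
    labels = np.asarray(labels)
    if paths.shape[0] != labels.shape[0]:
        raise ValueError("original_paths and labels must have the same length")
    # One boolean mask per label, the same pattern as `cl.results['labels'] == 0`;
    # .item() turns the numpy scalar label into a plain Python value for the dict key.
    return {label.item(): paths[labels == label].tolist() for label in np.unique(labels)}

# Example data; in practice the labels would come from cl.results['labels']
original = ['/content/a.png', '/content/b.png', '/content/c.png', '/content/d.png']
labels = [0, 1, 0, 1]
groups = paths_by_label(original, labels)
print(groups[0])  # ['/content/a.png', '/content/c.png']
```

The length check guards against silently pairing the wrong paths with labels if some images were skipped or failed to load.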

erdogant commented 2 years ago

Can you show an example of how this occurs? When I try the flowers example, the filenames and paths are stored correctly. The unique identifiers are only used if a data matrix is given as input.

from clustimage import Clustimage
cl = Clustimage(method='pca', embedding='umap')
# Import data
Xlist = cl.import_example(data='flowers')
# Import data in a standardized manner
X = cl.import_data(Xlist)

X.keys()
# dict_keys(['img', 'feat', 'xycoord', 'pathnames', 'labels', 'url', 'filenames'])
print(X['filenames'][0:5])
# array(['0001.png', '0002.png', '0003.png', '0004.png', '0005.png'],
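
Why a raw data matrix ends up with names like '/tmp/clustimage/<uuid>.png': a matrix row carries no filename, so any name has to be generated, and a generated name has no link to the original file. The sketch below is a hypothetical illustration using Python's standard `uuid` and `tempfile` modules, not clustimage's actual code; `make_temp_pathnames` is an invented helper, and no files are written.

```python
import os
import tempfile
import uuid

def make_temp_pathnames(n, subdir='clustimage', ext='.png'):
    """Hypothetical illustration: one random, unique file path per matrix row.

    Only builds path strings; nothing is created on disk.
    """
    base = os.path.join(tempfile.gettempdir(), subdir)
    # uuid4() gives a random identifier, so the name says nothing about the source image
    return [os.path.join(base, str(uuid.uuid4()) + ext) for _ in range(n)]

names = make_temp_pathnames(3)
for name in names:
    print(name)
```

On Colab `tempfile.gettempdir()` is typically `/tmp`, which would match the shape of the paths in the question.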

For a data matrix, what I can do is use the index of a pandas DataFrame for naming. That way you can control the names as you wish.

erdogant commented 2 years ago

I added the functionality to read pandas DataFrames. Update with: pip install -U clustimage

Example:

from clustimage import Clustimage
import pandas as pd
import numpy as np

# Initialize
cl = Clustimage()

# Import data
Xraw = cl.import_example(data='mnist')

print(Xraw)
# array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
#        [ 0.,  0.,  0., ..., 10.,  0.,  0.],
#        [ 0.,  0.,  0., ..., 16.,  9.,  0.],
#        ...,
#        [ 0.,  0.,  1., ...,  6.,  0.,  0.],
#        [ 0.,  0.,  2., ..., 12.,  0.,  0.],
#        [ 0.,  0., 10., ..., 12.,  1.,  0.]])

filenames = list(map(lambda x: str(x) + '.png', np.arange(0, Xraw.shape[0])))
Xraw = pd.DataFrame(Xraw, index=filenames)

print(Xraw)
#            0    1     2     3     4     5   ...   58    59    60    61   62   63
# 0.png     0.0  0.0   5.0  13.0   9.0   1.0  ...  6.0  13.0  10.0   0.0  0.0  0.0
# 1.png     0.0  0.0   0.0  12.0  13.0   5.0  ...  0.0  11.0  16.0  10.0  0.0  0.0
# 2.png     0.0  0.0   0.0   4.0  15.0  12.0  ...  0.0   3.0  11.0  16.0  9.0  0.0
# 3.png     0.0  0.0   7.0  15.0  13.0   1.0  ...  7.0  13.0  13.0   9.0  0.0  0.0
# 4.png     0.0  0.0   0.0   1.0  11.0   0.0  ...  0.0   2.0  16.0   4.0  0.0  0.0
#       ...  ...   ...   ...   ...   ...  ...  ...   ...   ...   ...  ...  ...
# 1792.png  0.0  0.0   4.0  10.0  13.0   6.0  ...  2.0  14.0  15.0   9.0  0.0  0.0
# 1793.png  0.0  0.0   6.0  16.0  13.0  11.0  ...  6.0  16.0  14.0   6.0  0.0  0.0
# 1794.png  0.0  0.0   1.0  11.0  15.0   1.0  ...  2.0   9.0  13.0   6.0  0.0  0.0
# 1795.png  0.0  0.0   2.0  10.0   7.0   0.0  ...  5.0  12.0  16.0  12.0  0.0  0.0
# 1796.png  0.0  0.0  10.0  14.0   8.0   1.0  ...  8.0  12.0  14.0  12.0  1.0  0.0

# Fit and transform data
results = cl.fit_transform(Xraw)

print(results['filenames'])
# array(['0.png', '1.png', '2.png', ..., '1794.png', '1795.png', '1796.png'],
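
Applied to the Colab case from the question, the same idea would put the original "/content/..." paths in the DataFrame index. A minimal sketch with stated assumptions: the image arrays are synthetic stand-ins (real images would need to be read and flattened to equal length), the example above only shows short names like '0.png' so passing full paths through unchanged is an assumption, and the clustimage call is left commented out. To avoid depending on `results['filenames']`, the sketch maps labels back by row order, assuming `results['labels']` follows the row order of `Xraw`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: in practice each array would be an image read from
# its path and flattened to the same length (e.g. after resizing/grayscale).
images = {
    '/content/cat_1.png': np.zeros(16),
    '/content/cat_2.png': np.ones(16),
    '/content/dog_1.png': np.full(16, 2.0),
}

# Rows = images, columns = pixels; the index carries the original paths.
Xraw = pd.DataFrame(np.vstack(list(images.values())), index=list(images.keys()))

# results = cl.fit_transform(Xraw)   # requires clustimage; not run here
# Stand-in for results['labels'], assumed to follow the row order of Xraw:
labels = np.array([0, 0, 1])

# Map each original path to its cluster label, then list the paths in cluster 0.
label_of = pd.Series(labels, index=Xraw.index, name='label')
cluster0 = label_of[label_of == 0].index.tolist()
print(cluster0)  # ['/content/cat_1.png', '/content/cat_2.png']
```

Keeping the path-to-label mapping in a Series makes later lookups (e.g. `label_of['/content/dog_1.png']`) independent of how the library names its outputs.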

Sid01123 commented 2 years ago

Dear Mr. Taskesen,

Thank you so much for taking the time to add this update. I shall try it out and let you know!

Best, Sid

On Thu, Jun 9, 2022 at 10:25 AM Erdogan Taskesen wrote:

> I added the functionality to read pandas DataFrames. Update with: pip install -U clustimage

erdogant commented 2 years ago

I am closing this one. Re-open this issue if required.