Can you show with an example how this occurs? When I try the flowers example, it stores the filenames and paths correctly. The unique identifiers are only used if a data matrix is given as an input.
from clustimage import Clustimage
cl = Clustimage(method='pca', embedding='umap')
# Import data
Xlist = cl.import_example(data='flowers')
# Import data in a standardized manner
X = cl.import_data(Xlist)
X.keys()
dict_keys(['img', 'feat', 'xycoord', 'pathnames', 'labels', 'url', 'filenames'])
print(X['filenames'][0:5])
# array(['0001.png', '0002.png', '0003.png', '0004.png', '0005.png'],
What I can do for the data matrix is use the index names of a pandas DataFrame for naming. That way you can control the naming as you wish.
I added the functionality to read pandas DataFrames.
Update with: pip install -U clustimage
Example:
from clustimage import Clustimage
import pandas as pd
import numpy as np
# Initialize
cl = Clustimage()
# Import data
Xraw = cl.import_example(data='mnist')
print(Xraw)
# array([[ 0., 0., 5., ..., 0., 0., 0.],
# [ 0., 0., 0., ..., 10., 0., 0.],
# [ 0., 0., 0., ..., 16., 9., 0.],
# ...,
# [ 0., 0., 1., ..., 6., 0., 0.],
# [ 0., 0., 2., ..., 12., 0., 0.],
# [ 0., 0., 10., ..., 12., 1., 0.]])
filenames = list(map(lambda x: str(x) + '.png', np.arange(0, Xraw.shape[0])))
Xraw = pd.DataFrame(Xraw, index=filenames)
print(Xraw)
# 0 1 2 3 4 5 ... 58 59 60 61 62 63
# 0.png 0.0 0.0 5.0 13.0 9.0 1.0 ... 6.0 13.0 10.0 0.0 0.0 0.0
# 1.png 0.0 0.0 0.0 12.0 13.0 5.0 ... 0.0 11.0 16.0 10.0 0.0 0.0
# 2.png 0.0 0.0 0.0 4.0 15.0 12.0 ... 0.0 3.0 11.0 16.0 9.0 0.0
# 3.png 0.0 0.0 7.0 15.0 13.0 1.0 ... 7.0 13.0 13.0 9.0 0.0 0.0
# 4.png 0.0 0.0 0.0 1.0 11.0 0.0 ... 0.0 2.0 16.0 4.0 0.0 0.0
# ... ... ... ... ... ... ... ... ... ... ... ... ...
# 1792.png 0.0 0.0 4.0 10.0 13.0 6.0 ... 2.0 14.0 15.0 9.0 0.0 0.0
# 1793.png 0.0 0.0 6.0 16.0 13.0 11.0 ... 6.0 16.0 14.0 6.0 0.0 0.0
# 1794.png 0.0 0.0 1.0 11.0 15.0 1.0 ... 2.0 9.0 13.0 6.0 0.0 0.0
# 1795.png 0.0 0.0 2.0 10.0 7.0 0.0 ... 5.0 12.0 16.0 12.0 0.0 0.0
# 1796.png 0.0 0.0 10.0 14.0 8.0 1.0 ... 8.0 12.0 14.0 12.0 1.0 0.0
# Fit and transform data
results = cl.fit_transform(Xraw)
print(results['filenames'])
# array(['0.png', '1.png', '2.png', ..., '1794.png', '1795.png', '1796.png'],
Dear Mr. Taskesen,
Thank you so much for taking the time to add this update. I shall try it out and let you know!
Best, Sid
I am closing this one. Re-open this issue if required.
Is there a way to get the original pathnames of the images after running fit_transform?
I am uploading images to Google Colab and reading them in by their file paths as "/content/name_of_image", and I want to recover this "/content/name_of_image" after clustering.
I tried to extract the pathnames per label using the following code, but I seem to get the paths of images created in a temporary directory instead:
CODE
Iloc = cl.results['labels']==0
cl.results['pathnames'][Iloc]
OUTPUT
array(['/tmp/clustimage/8732cb41-c72d-4266-b164-ff453d68428a.png',
       '/tmp/clustimage/440fecd8-8a9c-49a0-b100-ccfb66107425.png',
       '/tmp/clustimage/3c9c38d8-4da9-4e4f-9130-d3836182b8c6.png',
       '/tmp/clustimage/85cc4848-1faf-44ea-ae4c-9d9d88bd6323.png',
       '/tmp/clustimage/6127e4fb-1c25-4ba9-8d68-56ef482e3db4.png',
       '/tmp/clustimage/abcf85e0-af1a-48f1-8861-122122b64e32.png',
       '/tmp/clustimage/275bbde0-394d-4ba4-b4d0-1c67da323c8b.png',
       '/tmp/clustimage/30b62285-2628-45c0-86b2-fea305cb8db3.png',
       '/tmp/clustimage/c47a6867-3c8f-480c-a7bd-b3e7ec4ba334.png',
       '/tmp/clustimage/da5c17fc-de2a-4375-b03c-066a0904428a.png'],
      dtype='<U56')
I wish to get the output as the original filenames that were in the pathnames list.
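A possible workaround, sketched here with plain NumPy rather than any clustimage call: keep your own array of the original paths and index it with the cluster labels. This rests on an assumption the thread does not confirm, namely that cl.results['labels'] has the same length and order as the list of paths you passed in. The paths, the labels array, and the helper name paths_for_label are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical example paths, standing in for the list you passed to clustimage.
original_paths = np.array([
    "/content/img_a.png",
    "/content/img_b.png",
    "/content/img_c.png",
    "/content/img_d.png",
])

# Stand-in for cl.results['labels'].
# ASSUMPTION: one label per input image, in the same order as original_paths.
labels = np.array([0, 1, 0, 2])


def paths_for_label(paths, labels, label):
    """Return the original paths whose cluster label equals `label`."""
    paths = np.asarray(paths)
    labels = np.asarray(labels)
    # If images were dropped or reordered during import, the lengths differ
    # and positional matching would be wrong, so fail loudly instead.
    if len(paths) != len(labels):
        raise ValueError("paths and labels differ in length; order cannot be assumed")
    return paths[labels == label]


cluster0 = paths_for_label(original_paths, labels, 0)
print(cluster0)
```

If the length check fails, positional matching is not safe and the mapping between temporary files and original paths would need to come from the library itself.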