MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License

AttributeError: 'ConceptModel' object has no attribute 'image_cluster_df' #22

Open aysedeniz09 opened 1 year ago

aysedeniz09 commented 1 year ago

AttributeError: 'ConceptModel' object has no attribute 'image_cluster_df'

I reinstalled sklearn at a pre-1.0 version, following this thread: https://github.com/MaartenGr/Concept/issues/19

I am still getting the error.

MaartenGr commented 1 year ago

Could you share your full code? Also which version do you have installed? I do not believe the model has any image_cluster_df attributes at all.

Lastly, it might be worthwhile to also check out BERTopic, since it has multi-modal topic modeling alongside a number of other features that were recently integrated.
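The two questions above (which version is installed, and whether the attribute exists at all) can be checked in a few lines. This is a minimal sketch and not part of the Concept package: `installed_version` and `missing_attributes` are hypothetical helper names, and the distribution name `concept` is an assumption.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_attributes(obj, names):
    """Return the attribute names from `names` that `obj` does not expose."""
    return [name for name in names if not hasattr(obj, name)]


# Assumption: the package is distributed under the name "concept".
print("concept version:", installed_version("concept"))

# With a fitted model, one could then check (hypothetical usage):
# print(missing_attributes(concept_model, ["image_cluster_df"]))
```

Posting the printed version together with the list of missing attributes makes it easier to tell a version mismatch apart from a typo in the attribute name.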

aysedeniz09 commented 1 year ago

Thank you, I will check out BERTopic. See the full code below. We ran it previously (January 2023) with image_cluster_df.

import os
from PIL import UnidentifiedImageError, Image, ImageFile
from tqdm import tqdm
import numpy as np
from concept import ConceptModel
import dill as pickle
import pandas as pd
from datetime import datetime

os.chdir('/image_cluster')
os.getcwd()

ImageFile.LOAD_TRUNCATED_IMAGES = True
frame_dir = "dataPENCE150k/"
timestamp = datetime.now().strftime("%Y%m_%d")
pkl_path = os.path.join("/concept-pickled-viztopic-model", "pkl_concept_modelrun"+timestamp+".pkl")
clustering_output_path = os.path.join("/image_cluster/concept-clustering-output", "clustering_outputrun"+timestamp+".csv")
print("Running image clustering run timestamp {}".format(timestamp))

# Create a list of frames from dir
print("Fetching frames from directory...")
frames = []
excluded_frames = []
files = os.listdir(frame_dir)
not_found_count = 0
for image in tqdm(files, total=len(files)):
    if image.endswith(".jpg"):
        try:
            Image.open(os.path.join(frame_dir, image))
            frames.append(os.path.join(frame_dir, image))
        except UnidentifiedImageError:
            print("Failed to open the following image: ", image)
            excluded_frames.append(image)
        except FileNotFoundError:
            not_found_count += 1

# Listing frame names
frame_names = [os.path.basename(frame) for frame in frames]

# Concept modeling
print("Building concept model...")
concept_model = ConceptModel()
concepts = concept_model.fit_transform(frames, image_names=frame_names)

concept_model.save("conceptsPENCE150k_v2")

# Write results
print("Output results...")
concept_model.image_cluster_df.to_csv(clustering_output_path, index=False)

# Pickle model
print("Pickling concept model...")
with open('concepts_v2', 'wb') as cm:
    # Dumping the concepts is disabled; pass keeps the block syntactically valid.
    # pickle.dump(concepts, cm)
    pass

# Pickle model
print("Pickling concept model again...")
pkl_file = open(pkl_path, 'wb')
pickle.dump(concept_model, pkl_file)
pkl_file.close()
print("Done pickling...")

# Create a directory to save cluster images
cluster_dir = "clusterimages" + timestamp
os.makedirs(cluster_dir, exist_ok=True)

# Iterate over each cluster
for cluster_id in concept_model.image_cluster_df['cluster_id'].unique():
    # Exclude the cluster with label -1
    if cluster_id == -1:
        continue

    # Create a subdirectory for each cluster
    cluster_subdir = os.path.join(cluster_dir, f"cluster_{cluster_id}")
    os.makedirs(cluster_subdir, exist_ok=True)

    # Get image names belonging to the cluster
    cluster_images = concept_model.image_cluster_df.loc[concept_model.image_cluster_df['cluster_id'] == cluster_id, 'image_name']

    # Save images to the cluster subdirectory
    for image_name in cluster_images:
        image_path = os.path.join(frame_dir, image_name)
        try:
            image = Image.open(image_path)
            save_path = os.path.join(cluster_subdir, image_name)
            image.save(save_path)
        except UnidentifiedImageError:
            print("Failed to save the image:", image_name)

# List of images that failed to open
print("=======================================")
print("Images that failed to load or were corrupted")
for frame in excluded_frames:
    print(frame)

print("")
print("Summary Report")
print("Total number of frames successfully clustered: {}".format(len(frames)))
print("Total number of excluded frames due error in file (corrupt file): {}".format(len(excluded_frames)))
print("Total number of frames with file not found (frames were removed from parler server): {}".format(not_found_count))
print("")
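If the installed version no longer exposes `image_cluster_df`, a similar image-to-cluster table might be rebuilt from the value returned by `fit_transform`. The sketch below assumes (this is not confirmed anywhere in the thread) that `concepts` holds one cluster label per input image, in the same order as `frame_names`. `build_cluster_rows` and `write_cluster_csv` are hypothetical helpers, the column names `image_name` and `cluster_id` are taken from the script, and the sample values are placeholders.

```python
import csv


def build_cluster_rows(image_names, labels):
    """Pair each image name with its cluster label (assumed to be in input order)."""
    if len(image_names) != len(labels):
        raise ValueError("image_names and labels must have the same length")
    return [
        {"image_name": name, "cluster_id": int(label)}
        for name, label in zip(image_names, labels)
    ]


def write_cluster_csv(rows, path):
    """Write the rows to a CSV file with the two columns used in the script."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["image_name", "cluster_id"])
        writer.writeheader()
        writer.writerows(rows)


# Placeholder data standing in for frame_names and concepts.
rows = build_cluster_rows(["a.jpg", "b.jpg", "c.jpg"], [0, -1, 0])

# Skip the cluster labelled -1, as the script does.
clustered = [row["image_name"] for row in rows if row["cluster_id"] != -1]
print(clustered)
```

In the script, `build_cluster_rows(frame_names, concepts)` would take the place of the missing attribute, provided the ordering assumption holds for the installed version.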



MaartenGr commented 1 year ago

If image_cluster_df was still working then, it would definitely be a version issue. I believe one of the previous versions still had that attribute, but it was removed later on. If you kept track of the package version you used then and install that same version now, it should work.
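One way to keep track of package versions is to write them to a small file next to each saved model and compare them before a later run. The following is a minimal sketch using only the standard library; `record_versions` and `version_mismatches` are hypothetical helper names, and the file name and package list in the usage note are placeholders.

```python
import json
from importlib import metadata


def _version_or_none(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def record_versions(dist_names, path):
    """Write the currently installed versions of `dist_names` to a JSON file."""
    versions = {name: _version_or_none(name) for name in dist_names}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(versions, fh, indent=2)
    return versions


def version_mismatches(path):
    """Compare recorded versions with the current environment.

    Returns {name: (recorded, current)} for every package that differs.
    """
    with open(path, encoding="utf-8") as fh:
        recorded = json.load(fh)
    mismatches = {}
    for name, old in recorded.items():
        current = _version_or_none(name)
        if current != old:
            mismatches[name] = (old, current)
    return mismatches
```

For example, `record_versions(["concept", "scikit-learn"], "versions.json")` could be called right after a successful run, and `version_mismatches("versions.json")` before the next one. Any package it reports can then be reinstalled at the recorded version with `pip install "name==version"`.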