MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License
187 stars 16 forks source link

Exemplar dict is not serializable #13

Open trifle opened 2 years ago

trifle commented 2 years ago

Hi, thanks for your awesome libraries.

Just a short question: In this line:

https://github.com/MaartenGr/Concept/blob/d270607d6ea4d789a42d54880ab4a0c977bb69ce/concept/_model.py#L304

you're casting the numpy int64s to integers, presumably so they can be used as indexes? In any case, the cluster keys remain np.int64. This means the whole dict cannot be serialized (as json doesn't know how to handle numpy data types).

My suggestion would be to int() the keys as well to make this a bit less perplexing. But I'm not sure if you rely on the indexes being np.int64 in some other place?

MaartenGr commented 2 years ago

Thank you for finding this! Did you encounter any issues with saving the model due to keeping the cluster keys as np.int64? Or was there another use case for which this is an issue?

trifle commented 2 years ago

Hi, I've simply tried to json.dump() the cluster keys in an attempt to store all the gritty details from the analysis to inspect and save for later. What surprised me was that even default=str failed to convince json to serialize the dict. That's when I found out that the default apparently (and somewhat sensibly) only applies to values, and having np.int64 as keys is not within the json spec.

MaartenGr commented 2 years ago

Ah, right, that makes sense! I am not entirely sure but I do not think it should be any problem to cast them as regular int in that specific function. It seems that cluster in representative_images[cluster] is derived from items in self.cluster_labels which I think can be regular int.