The flow chart of Multi-Sub: Multi-Sub obtains a desired clustering based on the subspace spanned by reference words provided by GPT-4 using users' high-level interest. |
The project is organized as follows:
.
├── clip/
├── dataset/ # Contains datasets for training and evaluation
│ ├── fruit/ # Dataset for Fruit (Please download the dataset via the link provided in the Datasets section, then extract it and place corresponding folders in the specified directory.)
│ │ ├── color/ # Sub-dataset for fruit color
│ │ ├── instance/ # Sub-dataset for fruit instances
│ │ └── species/ # Sub-dataset for fruit species
│ ├── cifar10/ # Dataset for CIFAR-10 (Please download the dataset via the link provided in the Datasets section, then extract it and place corresponding folders in the specified directory.)
│ │ ├── type/ # Sub-dataset for CIFAR-10 type clustering (e.g., transportation, animals)
│ │ └── environment/ # Sub-dataset for CIFAR-10 environment clustering (e.g., land, air, water)
├── gpt.py # Sends a prompt to OpenAI's GPT model and retrieves generated reference words
├── main.py # Main script to run training and evaluation
├── parse.py # Argument parsing for command-line execution
├── README.md # This is the README file
├── requirements.txt # Dependencies are required for running the project
└── setup.py # Installation setup
To run this project, ensure you have the following dependencies installed:
pip install -r requirements.txt
Please refer to Fruit and CIFAR-10 to download datasets, and create a dataset directory according to the folder structure and place the datasets in it.
Please change dataset_path_dict to adapt to different datasets
python main.py
Please cite our paper if you use this code in your own work:
@misc{yao2024customizedmultipleclusteringmultimodal,
title={Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning},
author={Jiawei Yao and Qi Qian and Juhua Hu},
year={2024},
eprint={2411.03978},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2411.03978},
}