requirements:
conda create -n Aesmamba python=3.8
conda activate Aesmamba
pip install --upgrade pip setuptools wheel
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
git clone https://github.com/state-spaces/mamba.git
cd mamba
MAMBA_FORCE_BUILD=TRUE pip install .
cd ../Aesmamba
pip install -r requirements.txt
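Before training, it can help to confirm that the active environment actually has the pinned PyTorch packages. The script below is a minimal sketch and is not part of this repository: the helper names (`base_version`, `check_versions`) are hypothetical, and the expected versions are simply copied from the install command above.

```python
from importlib import metadata

# Version pins copied from the pip install command in this README.
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}


def base_version(version):
    """Drop a local build tag such as '+cu117' from a version string."""
    return version.split("+", 1)[0]


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        ok = installed is not None and base_version(installed) == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        print(f"{name}: installed={installed} expected={EXPECTED[name]} ok={ok}")
```

Run it inside the activated Aesmamba environment. Any line with ok=False points at a package that is missing or was installed at a different version.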
cd AesMamba_v && python train_viaa.py
cd AesMamba_m && python train_miaa.py
cd AesMamba_f && python train_multi_attr_add_balce.py
cd AesMamba_p && python multi_attr_pred_model_add_human_attr.py
You can change the configuration of each task in its corresponding .py file. We plan to combine the four tasks in later work.
In our code, we assign each image to a class according to its score in each dataset. We have uploaded the csv files for some of the datasets. For the other datasets, we only provide the classification method because the csv files are large.
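For datasets without an uploaded csv file, a score-based class column could be produced roughly as follows. This is only an illustrative sketch under stated assumptions, not the exact method used in this repository: the column name `score`, the output column `class`, the function names, and the bin edges are all hypothetical and should be replaced with the values used for each dataset.

```python
import csv
from bisect import bisect_right


def score_to_class(score, edges):
    """Return the bin index for score; edges are ascending interior boundaries.

    A score equal to an edge falls into the upper bin.
    """
    return bisect_right(edges, score)


def classify_rows(rows, edges, score_key="score"):
    """Return copies of rows with an added integer 'class' field."""
    labeled_rows = []
    for row in rows:
        labeled = dict(row)
        labeled["class"] = score_to_class(float(row[score_key]), edges)
        labeled_rows.append(labeled)
    return labeled_rows


def classify_csv(src_path, dst_path, edges, score_key="score"):
    """Read a csv with a score column and write it back with a class column."""
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        rows = classify_rows(reader, edges, score_key)
        fieldnames = list(reader.fieldnames or []) + ["class"]
    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `classify_csv("scores.csv", "scores_with_class.csv", edges=[4.0, 6.0])` would split the images into three classes: below 4.0, from 4.0 up to but not including 6.0, and 6.0 or above. Both file names and the edges here are placeholders.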
Visual encoder: VMamba tiny. Text encoder: BERT base. We use an old version of VMamba; the checkpoint is here:
Link: https://pan.baidu.com/s/1REVTVD4w20G7lKnIM-Btjg Password: c1mk
For VMamba base and its conda environment, please refer to https://github.com/MzeroMiko/VMamba