ExplainableML / DataDream

[ECCV 2024] Official repository for "DataDream: Few-shot Guided Dataset Generation"

What do CLASS_IDX and SPLIT_IDX mean? #4

Open ha1ha2hahaha opened 3 weeks ago

ha1ha2hahaha commented 3 weeks ago

I don't know how to set CLASS_IDX and SPLIT_IDX. Can you give an example, please?

jabader97 commented 2 weeks ago

Hello,

The flag --target_class_idx is for parallelizing the work across different GPUs. Each class is processed separately when training the DataDream weights in the cls-wise version, and during generation in both the cls-wise and dset-wise versions. So if you want to split N classes among M GPUs, this flag is how you assign individual classes to GPUs.

e.g., to put class 0 on GPU 1, you could use: CUDA_VISIBLE_DEVICES=1 accelerate launch datadream.py --target_class_idx=0 ...

To generate the full dataset, you would need to run this command once per class, i.e. for target_class_idx = 0, 1, ..., N - 1.
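As a rough illustration of running once per class, here is a minimal dry-run sketch. It only prints the commands rather than launching them. NUM_CLASSES, NUM_GPUS, the round-robin GPU assignment, and the launch_cmd helper are assumptions made up for this example, not part of the repository, and any other flags that datadream.py needs are left out.

```shell
#!/bin/sh
# Dry-run sketch: prints one launch command per class instead of running it.
# NUM_CLASSES and NUM_GPUS are hypothetical placeholders, not repo settings.
NUM_CLASSES=4
NUM_GPUS=2

# Hypothetical helper: print the launch command for one class,
# assigning GPUs round-robin (class index modulo number of GPUs).
launch_cmd() {
    cls=$1
    gpu=$(( cls % NUM_GPUS ))
    # Any further flags required by datadream.py are omitted here.
    echo "CUDA_VISIBLE_DEVICES=$gpu accelerate launch datadream.py --target_class_idx=$cls"
}

# Walk over target_class_idx = 0, 1, ..., NUM_CLASSES - 1.
i=0
while [ "$i" -lt "$NUM_CLASSES" ]; do
    launch_cmd "$i"
    i=$(( i + 1 ))
done
```

Printing the commands first makes it easy to check the class-to-GPU assignment before submitting anything, for example as separate Slurm jobs.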

To use CLASS_IDX, you would specify each class individually (this could be convenient if you are using Slurm).

On the other hand, SPLIT_IDX provides a way to split the classes evenly among M available GPUs. In bash_run.sh, SET_SPLIT defines M (currently set to M = 5), so each call allocates one fifth of the jobs to a given GPU. e.g., if you have 100 classes and 5 GPUs, then bash bash_run.sh 2 0 would submit classes 0 - 19 (SPLIT_IDX = 0) to GPU 2. To generate the full dataset, you would need to run this for SPLIT_IDX = 0, 1, ..., 4, each with the desired GPU.
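To make the split arithmetic concrete, here is a small sketch of how a SPLIT_IDX could map to a range of classes. The split_range helper is hypothetical. It assumes contiguous, equally sized chunks, with any remainder going to the last split. The actual logic in bash_run.sh may differ.

```shell
#!/bin/sh
# Hypothetical helper: print the first and last class index of a split.
# Assumes contiguous, equally sized chunks; any remainder goes to the last split.
# Usage: split_range NUM_CLASSES NUM_SPLITS SPLIT_IDX
split_range() {
    num_classes=$1
    num_splits=$2
    split_idx=$3
    chunk=$(( num_classes / num_splits ))
    first=$(( split_idx * chunk ))
    if [ "$split_idx" -eq $(( num_splits - 1 )) ]; then
        last=$(( num_classes - 1 ))
    else
        last=$(( first + chunk - 1 ))
    fi
    echo "$first $last"
}

split_range 100 5 0   # prints "0 19"
split_range 100 5 4   # prints "80 99"
```

With 100 classes and 5 splits this reproduces the example above: SPLIT_IDX = 0 covers classes 0 - 19, and SPLIT_IDX = 4 covers classes 80 - 99.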

ha1ha2hahaha commented 2 weeks ago

Thank you so much for your reply and help. I would also like to ask which SD 2.1 version you used: stabilityai/stable-diffusion-2-1-base, stabilityai/stable-diffusion-2-1, or stabilityai/stable-diffusion-2-1-unclip? With stabilityai/stable-diffusion-2-1-base I only get an accuracy of 91.07 on Cars. I really look forward to learning the details of your implementation.