alhojel / visual_task_vectors


Corrupted visual results in the evaluation. #2

Closed: Pandint closed this issue 1 week ago.

Pandint commented 4 months ago

Hi! I followed the instructions in the README and tried to reproduce the segmentation results. However, the visual results look corrupted, like this:

[Image: 0_0_0_official]

Here are the steps I used to obtain the results, followed by the training log:

python collect_attention_heads.py --model mae_vit_large_patch16 --base_dir path/to/pascal-5i --output_dir ./output_dir_official --ckpt ./checkpoint-3400.pth --device cuda --num_collections 100

python reinforce_train.py --model mae_vit_large_patch16 --base_dir path/to/pascal-5i --output_dir ./output_dir_official --ckpt ./checkpoint-3400.pth --split 0 --device cuda --task 0

python reinforce_evaluate.py --model mae_vit_large_patch16 --base_dir path/to/pascal-5i --ckpt ./checkpoint-3400.pth --split 0 --device cuda --setup official --output_dir ./output_dir_official --task 0 --load_model path/to/bernoullis_0_1_0_0_10_0.1_-1.0_best.pkl --save_images 1

{'iter': 0, 'train_loss': -0.19377514719963074, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 0, 'eval_loss': 0.19443333597891793, 'eval_patch_count': 8319, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 10, 'train_loss': -0.2457774430513382, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 20, 'train_loss': -0.27752330899238586, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 30, 'train_loss': -0.2990878224372864, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 40, 'train_loss': -0.29916679859161377, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 50, 'train_loss': -0.3054044842720032, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 50, 'eval_loss': 0.31587401366203416, 'eval_patch_count': 10255, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 60, 'train_loss': -0.3159942030906677, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 70, 'train_loss': -0.32910600304603577, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 80, 'train_loss': -0.3447962999343872, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 90, 'train_loss': -0.3631593585014343, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 100, 'train_loss': -0.37782636284828186, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 100, 'eval_loss': 0.43920151398657364, 'eval_patch_count': 10972, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 110, 'train_loss': -0.3847891688346863, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 120, 'train_loss': -0.39654892683029175, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 130, 'train_loss': -0.4040229916572571, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 140, 'train_loss': -0.4100201725959778, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 150, 'train_loss': -0.42192402482032776, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 150, 'eval_loss': 0.4959391429968064, 'eval_patch_count': 10881, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 160, 'train_loss': -0.4282248914241791, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 170, 'train_loss': -0.42880016565322876, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 180, 'train_loss': -0.4342617392539978, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 190, 'train_loss': -0.44159871339797974, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 200, 'train_loss': -0.4437841475009918, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 200, 'eval_loss': 0.4878141303974375, 'eval_patch_count': 11434, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 210, 'train_loss': -0.44643640518188477, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 220, 'train_loss': -0.4497097432613373, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 230, 'train_loss': -0.44859352707862854, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 240, 'train_loss': -0.45353788137435913, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 250, 'train_loss': -0.4560166895389557, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 250, 'eval_loss': 0.5194110590489391, 'eval_patch_count': 11633, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 260, 'train_loss': -0.460170179605484, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 270, 'train_loss': -0.460139662027359, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 280, 'train_loss': -0.4652172923088074, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 290, 'train_loss': -0.45897823572158813, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 300, 'train_loss': -0.47232604026794434, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 300, 'eval_loss': 0.531175637637189, 'eval_patch_count': 11119, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 310, 'train_loss': -0.473876416683197, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 320, 'train_loss': -0.47269487380981445, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 330, 'train_loss': -0.4757246971130371, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 340, 'train_loss': -0.48187583684921265, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 350, 'train_loss': -0.48199811577796936, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 350, 'eval_loss': 0.5281211539076603, 'eval_patch_count': 11354, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 360, 'train_loss': -0.4863167405128479, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 370, 'train_loss': -0.4870050847530365, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 380, 'train_loss': -0.48828834295272827, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 390, 'train_loss': -0.49215221405029297, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 400, 'train_loss': -0.49300098419189453, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 400, 'eval_loss': 0.491961728184655, 'eval_patch_count': 11986, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 410, 'train_loss': -0.48984295129776, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 420, 'train_loss': -0.49729982018470764, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 430, 'train_loss': -0.49370089173316956, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 440, 'train_loss': -0.49439555406570435, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 450, 'train_loss': -0.502895712852478, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 450, 'eval_loss': 0.48731662322001174, 'eval_patch_count': 12338, 'lr': 0.1, 'init': -1.0, 'granularity': 1, 'batch_size': 320, 'reg_strength': 0, 'images_per_batch': 10, 'task': 0, 'split': 0}
{'iter': 460, 'train_loss': -0.4980536997318268, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 470, 'train_loss': -0.5013035535812378, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 480, 'train_loss': -0.5087719559669495, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
{'iter': 490, 'train_loss': -0.5076620578765869, 'lr': 0.1, 'init': -1.0, 'reg_strength': 0, 'restrict_area': 0, 'batch_size': 320, 'granularity': 1, 'total_train_images': 10, 'task': 0}
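Each line of the log above is a Python dict literal, so it can be parsed with `ast.literal_eval` to compare the train and eval curves. The sketch below is a minimal illustration, not part of the repository: `parse_log` and `summarize` are hypothetical helpers, and the sample lines are copied from the log but trimmed to a few fields. Because it is not clear from the log whether a higher or lower `eval_loss` is better in `reinforce_train.py`, the helper reports both extremes.

```python
import ast


def parse_log(lines):
    """Split dict-literal log lines into (iter, train_loss) and (iter, eval_loss) pairs."""
    train, evals = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a dict literal
        rec = ast.literal_eval(line)  # safe parsing of Python literals, no code execution
        if "train_loss" in rec:
            train.append((rec["iter"], rec["train_loss"]))
        elif "eval_loss" in rec:
            evals.append((rec["iter"], rec["eval_loss"]))
    return train, evals


def summarize(evals):
    """Return the (iter, eval_loss) pairs with the highest and the lowest eval_loss."""
    highest = max(evals, key=lambda p: p[1])
    lowest = min(evals, key=lambda p: p[1])
    return highest, lowest


if __name__ == "__main__":
    # A few eval lines from the log above, trimmed to the relevant fields.
    sample = [
        "{'iter': 0, 'eval_loss': 0.19443333597891793, 'eval_patch_count': 8319}",
        "{'iter': 300, 'eval_loss': 0.531175637637189, 'eval_patch_count': 11119}",
        "{'iter': 450, 'eval_loss': 0.48731662322001174, 'eval_patch_count': 12338}",
    ]
    _, evals = parse_log(sample)
    print(summarize(evals))
```

Saving the full log to a text file and passing its lines to `parse_log` gives the whole curve; in the run above, `eval_loss` rises from about 0.19 at iteration 0 to about 0.53 around iteration 300 and then settles near 0.49.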
alhojel commented 1 month ago

Do all of the results look corrupted, or only a handful? Because the approach relies on activation editing, some outputs can occasionally look corrupted. Try computing the mean activations from a larger number of images, and use more than 10 examples in the reinforce_train step.
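Why more images can help is easiest to see with a toy example. This is a minimal sketch under loudly stated assumptions, not the repository's implementation: it assumes, hypothetically, that head activations are stored as an array of shape `(num_images, num_heads, dim)`, that "activation editing" means overwriting the outputs of selected heads with their mean activation, and that per-image activations behave like independent noise around a true mean. The names `mean_head_activations`, `patch_heads`, and `mean_estimate_error` are invented for illustration. Under these assumptions, the error of the estimated mean shrinks roughly as 1/sqrt(N) with the number of images N, so means computed from only a few images are noisier.

```python
import numpy as np


def mean_head_activations(acts):
    """Average per-head activations over images.

    acts: array of shape (num_images, num_heads, dim) (hypothetical layout).
    Returns an array of shape (num_heads, dim).
    """
    return acts.mean(axis=0)


def patch_heads(head_outputs, means, mask):
    """Replace the outputs of selected heads with their mean activations.

    head_outputs: (num_heads, dim) activations for one query image.
    means: (num_heads, dim) from mean_head_activations.
    mask: boolean array of shape (num_heads,), True where a head is patched.
    """
    return np.where(mask[:, None], means, head_outputs)


def mean_estimate_error(num_images, num_heads=4, dim=8, trials=200, seed=0):
    """Average L2 distance between the sample mean and the true mean (synthetic data)."""
    rng = np.random.default_rng(seed)
    true_mean = np.ones((num_heads, dim))
    errs = []
    for _ in range(trials):
        # Synthetic activations: true mean plus unit Gaussian noise per image.
        acts = true_mean + rng.standard_normal((num_images, num_heads, dim))
        errs.append(np.linalg.norm(mean_head_activations(acts) - true_mean))
    return float(np.mean(errs))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(mean_estimate_error(n), 3))
```

Real attention-head activations are not independent Gaussian noise, so this only illustrates the general trend; it does not predict how many images the repository actually needs.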

Pandint commented 1 week ago

Thank you for your response! I increased the number of images and used more examples, and some of the results improved!