Hanzy1996 / CE-GZSL

Codes for the CVPR 2021 paper: Contrastive Embedding for Generalized Zero-Shot Learning
MIT License
91 stars · 22 forks

Can you provide the best parameters for each dataset? #2

Open mrzhu666 opened 3 years ago

mrzhu666 commented 3 years ago

Can you provide the best parameters for each dataset?

mrzhu666 commented 3 years ago

Thank you very much if you can

Hanzy1996 commented 3 years ago

Hi, @mrzhu666 ! Thanks for your interest in our work!

We are collecting the parameters for each dataset, and we will release the parameters as soon as possible.

Best wishes!

WilliamYi96 commented 3 years ago

@Hanzy1996 Do you have a plan for when to release the best hyperparameters that can reproduce the reported results?

Hanzy1996 commented 3 years ago

I am sorry for the delay in releasing the parameters. Let me first share some key parameters:

manualSeed: AWA1&2 (9182), FLO (806), SUN (4115)
nz: nz=attSize (AWA1/AWA2/CUB/SUN), nz=512 (FLO)
syn_num: AWA1 (1800), AWA2 (2400), FLO (600), SUN (100)
nhF: AWA1&2 (2048), FLO (1024), SUN (1024)
ins_weight: AWA1&2 (0.001), FLO (0.01), SUN (0.01)
cls_weight: AWA1&2 (0.001), FLO (0.01), SUN (0.01)
lr: AWA1&2 (1e-4), FLO (1e-4), SUN (5e-5)
lr_decay_epoch: AWA1 (50), AWA2 (10), FLO&SUN (100)
epochs: AWA1/AWA2 (about 130), CUB (about 450), FLO (about 750), SUN (about 1000)
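For reference, assuming the command-line interface shown later in this thread (script name CE_GZSL.py and flag names taken from the posted logs, AWA1-specific sizes from the posted Namespace), the AWA1 settings above would translate to roughly the following; the nepoch value is approximate since the author reports "about 130" epochs:

```shell
# Hypothetical AWA1 invocation assembled from the parameters in this thread;
# flag names follow the Namespace log posted below, not an official script.
python CE_GZSL.py --dataset AWA1 --class_embedding att --image_embedding res101 \
  --manualSeed 9182 --attSize 85 --nz 85 --syn_num 1800 --nhF 2048 \
  --ins_weight 0.001 --cls_weight 0.001 --lr 1e-4 --lr_decay_epoch 50 \
  --nepoch 130 --nclass_all 50 --nclass_seen 40
```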

webcsm commented 3 years ago

@Hanzy1996 does nhza correspond to --nhF?

In manualSeed (second entry) you mean --syn_num, right?

Hanzy1996 commented 3 years ago

@webcsm Sorry for the mistake. Yes, thank you for the reminder. I have corrected my comment.

webcsm commented 3 years ago

@Hanzy1996 you don't mention it in the paper, but in the code I see it's possible to run a preprocessing or standardization step. Did you use either of those to obtain the results?

Hanzy1996 commented 3 years ago

Hi, @webcsm. In the code, I only use the preprocessing step on all datasets and apply MinMax normalization to the visual features. Sorry for omitting this in the paper.
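For anyone following along, MinMax normalization of the visual features can be sketched as below. This is a hypothetical stand-in for the repo's preprocessing step, not its actual code; it scales each feature dimension of an (N, D) matrix to [0, 1]:

```python
import numpy as np

def minmax_normalize(features, eps=1e-12):
    """Scale each column of a (N, D) feature matrix into [0, 1].

    eps guards against division by zero for constant dimensions.
    """
    fmin = features.min(axis=0, keepdims=True)
    fmax = features.max(axis=0, keepdims=True)
    return (features - fmin) / np.maximum(fmax - fmin, eps)

X = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [2.0, 20.0]])
print(minmax_normalize(X))  # each column mapped to [0, 1]
```

In the benchmark setup the min/max statistics would typically be computed on the training split and reused for the test split, rather than recomputed per split.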

webcsm commented 3 years ago

Thank you, @Hanzy1996 . No worries. I'm running one experiment on AWA1 with the parameters you provided.

webcsm commented 3 years ago

Hi @Hanzy1996 do you have an idea of the number of epochs you used? My first experiment with AWA1 showed that 2k epochs (the default value) was not enough. I'm resuming training now with 4k epochs to see if the generator converges.

These are the parameters I used:

INFO:root:Namespace(attSize=85, batch_size=4096, beta1=0.5, class_embedding='att', classifier_lr=0.001, cls_temp=0.1, cls_weight=0.001, critic_iter=5, cuda=True, dataroot='xlsa17', dataset='AWA1', embedSize=2048, gpus='0', gzsl=False, image_embedding='res101', ins_temp=0.1, ins_weight=0.001, lambda1=10, lr=0.0001, lr_dec_rate=0.99, lr_decay_epoch=100, manualSeed=3483, matdataset=True, nclass_all=50, nclass_seen=40, ndh=4096, nepoch=2000, ngh=4096, nhF=2048, nz=85, outzSize=512, preprocessing=True, resSize=2048, standardization=False, syn_num=1800, validation=False, workers=2)
webcsm commented 3 years ago

I guess my question is: what was your model selection criterion for training? 1) Do you take the generator at convergence? 2) The generator with the minimum loss? 3) Or the generator with the highest accuracy on the test set?

Hanzy1996 commented 3 years ago

Hi, @webcsm , for AWA1, I take about 150 epochs to achieve the best results.

For these hyper-parameters, I tune them based on the performance on the validation set. For more details about splitting the validation set, please refer to [1]. With the tuned hyper-parameters, I re-train the model from scratch on the entire training set of seen classes and take the generator and the embedding function when the fake losses (fake_ins_contras_loss and cls_loss_fake) converge.
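The exact meaning of "converges" isn't specified in the thread; one common heuristic (an assumption on my part, not the author's code) is to declare convergence once the loss stops changing over a window of recent epochs:

```python
def has_converged(loss_history, window=20, tol=1e-3):
    """Heuristic convergence test: converged when the spread of the
    last `window` loss values falls below `tol`."""
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) < tol

# A flat loss history is flagged as converged; a steadily changing one is not.
print(has_converged([0.5] * 25))        # True
print(has_converged(list(range(25))))   # False
```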

[1] Yongqin Xian, Christoph H. Lampert, Bernt Schiele, and Zeynep Akata. Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly. TPAMI, 2018.

webcsm commented 3 years ago

Thank you @Hanzy1996 for your time and patience. Now I understand the whole flow, so I can reproduce the experiments. I'm still seeing the fake losses converge around epoch 350, not 150. What was the logic? Something like delta loss < threshold for a given number of epochs?

[screenshot: fake-loss curves during training]

This gives me worse results than the ones reported (63% for ZSL for example). Could you confirm the parameters for AWA1?

attSize=85 batch_size=4096 beta1=0.5 class_embedding='att' classifier_lr=0.001 cls_temp=0.1 cls_weight=0.001 critic_iter=5 cuda=True dataroot='xlsa17' dataset='AWA1' embedSize=2048 gpus='0' gzsl=False image_embedding='res101' ins_temp=0.1 ins_weight=0.001 lambda1=10 lr=0.0001 lr_dec_rate=0.99 lr_decay_epoch=100 manualSeed=3483 matdataset=True nclass_all=50 nclass_seen=40 ndh=4096 nepoch=2000 ngh=4096 nhF=2048 nz=85 outzSize=512 preprocessing=True resSize=2048 standardization=False syn_num=1800 validation=False workers=2

webcsm commented 3 years ago

Hi @Hanzy1996, I can only roughly reproduce the results if I take the minimum generator loss as my stopping criterion.
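A minimal sketch of that minimum-loss criterion, where model_step is a hypothetical helper standing in for one training epoch of the generator (it returns the epoch's loss and a state snapshot):

```python
import copy

def train_with_min_loss(model_step, n_epochs):
    """Run n_epochs of training and keep the model state with the
    lowest observed loss, rather than the final state."""
    best_loss, best_state = float("inf"), None
    for epoch in range(n_epochs):
        loss, state = model_step(epoch)
        if loss < best_loss:
            # deepcopy so later training updates don't mutate the snapshot
            best_loss, best_state = loss, copy.deepcopy(state)
    return best_loss, best_state
```

With PyTorch the snapshot would typically be the generator's state_dict saved to disk instead of an in-memory copy.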

zhihou7 commented 2 years ago

Could you provide the full parameters for all datasets? I find I cannot achieve the reported results with those parameters.

I am sorry for the delay in releasing the parameters. Let me first share some key parameters:

manualSeed: AWA1&2 (9182), FLO (806), SUN (4115)
nz: nz=attSize
syn_num: AWA1 (1800), AWA2 (2400), FLO (600), SUN (100)
nhF: AWA1&2 (2048), FLO (1024), SUN (1024)
ins_weight: AWA1&2 (0.001), FLO (0.01), SUN (0.01)
cls_weight: AWA1&2 (0.001), FLO (0.01), SUN (0.01)

Hanzy1996 commented 2 years ago

Hi, @zhihou7! I think these are all the required parameters; the other parameters are basically the same for each dataset. Could you please share the logs and all the parameters you used?

Miracle-Shen commented 2 years ago

Could you please provide the full best parameters for the FLO dataset? I can't reproduce the optimal result on this dataset. Thank you!

zhihou7 commented 2 years ago

For example, on the SUN dataset, I train the network as follows:

python CE_GZSL.py --dataset SUN --nepoch 300 --class_embedding att --syn_num 100 --batch_size 768 --attSize 102 --nz 102 \
  --embedSize 2048 --outzSize 512 --nhF 1024 --ins_weight 0.01 --cls_weight 0.01 --ins_temp 0.1 --cls_temp 0.1 \
  --manualSeed 4115 --nclass_all 717 --nclass_seen 645

I can only achieve 40.0%, though I have used an early stopping strategy.

But on the SUN dataset, I can achieve better performance than reported.

Miracle-Shen commented 2 years ago

Thank you very much for your reply. We are students following up on this paper. Have you tried the APY dataset on this framework? If so, would you mind sharing the hyperparameters?


Hanzy1996 commented 2 years ago

Hi, @zhihou7, the experiments on SUN may not need the early stopping strategy. I have updated the epochs each dataset needs here.