SDret / Pedestrian-Attribute-Recognition-as-Label-balanced-Multi-label-Learning

Official pytorch implementation of the ICML2024 main conference paper: Pedestrian Attribute Recognition as Label-balanced Multi-label Learning

Why reload the model weight at the training phase #3

Open rose-jinyang opened 1 month ago

rose-jinyang commented 1 month ago

Hello. How are you? Thanks for contributing to this project.

I found a strange part in your code.

(screenshots of the training code where the model weights are reloaded)

Why do you reload the model weights in the training phase?

SDret commented 1 month ago

Thanks for your question. In our GOAT method, we apply a periodic re-starting mechanism to keep the SGD updates progressing within a region of conservative confidence intervals, i.e., not straying away from the vicinity of the initial feature extractor. Therefore, after every 50 update steps, we restart the SGD updates from the converged feature extractor, i.e., the model reloaded by the `get_reload_weight` function.
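In pseudocode, the training loop behaves roughly as in the sketch below; the loop body, loader, and optimizer names are illustrative rather than the exact code in this repository, and `get_reload_weight` is assumed to return the model with the converged weights loaded:

```python
def train_with_restarts(model, train_loader, criterion, optimizer,
                        get_reload_weight, reload_period=50):
    """Illustrative sketch of GOAT's periodic re-starting mechanism."""
    for step, (imgs, labels) in enumerate(train_loader):
        if step % reload_period == 0:
            # restart SGD from the converged feature extractor so the
            # updates stay near the initial weights
            model = get_reload_weight(model)
        loss = criterion(model(imgs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```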

rose-jinyang commented 1 month ago

Thanks for your quick reply. I think the `get_reload_weight` function is not fully implemented.

(screenshot of the `get_reload_weight` function)

SDret commented 1 month ago

Yes, as we stated in the README, you should fill in the path to your own pre-trained feature extractor at the `YOUR PATH` placeholder above.
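For reference, a completed version could look roughly like the sketch below. This is only an assumption about the function's shape, not the exact code; the real signature in this repository may differ, and `YOUR PATH` must be replaced with your own checkpoint path:

```python
import torch

def get_reload_weight(model):
    # fill in your own checkpoint path in place of 'YOUR PATH'
    checkpoint = torch.load('YOUR PATH', map_location='cpu')
    # some checkpoints wrap the weights under a 'state_dict' key
    state_dict = checkpoint.get('state_dict', checkpoint)
    model.load_state_dict(state_dict)
    return model
```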

rose-jinyang commented 1 month ago

What do you mean by "pre-trained feature extractor"? Does it mean the pre-trained weights of the backbone, or an intermediate model that I train with this project?

SDret commented 1 month ago

Specifically, it is the model obtained after Stage 1 of FRDL (not just the backbone; please read our paper and README for details), and we recommend training this model with the baseline work named in the README.

rose-jinyang commented 1 month ago

It seems that the entire training consists of two stages. The first stage trains a baseline model without the FRDL and GOAT modules and without any weighted BCE loss. The second stage trains the final model starting from the model produced in the first stage. Correct? If so, could you explain in detail how to do the first-stage training?

SDret commented 1 month ago

Yes, your understanding of our method is correct. To train the baseline model, follow https://github.com/valencebond/Rethinking_of_PAR and produce the baseline model for the first-stage training with these changes:

1. Remove the weighted BCE loss.
2. Add the `classifier.separate` layer from our base_block.py into its base_block.py.
3. Use the convnext.py model we provide in its training pipeline.

Next, save the trained baseline model and fill in its path at the `YOUR PATH` placeholder in the `get_reload_weight` function. Finally, simply run the command in our README.txt, and all training and testing will be conducted automatically.

rose-jinyang commented 1 month ago

For the first-stage training, could you explain more clearly? A general user cannot know how to remove the weighted BCE loss or how to add the `classifier.separate` layer from your base_block.py into the baseline's base_block.py. For convenience, you could provide a sub-project for the first-stage training so that users do not have to modify code files directly.

SDret commented 1 month ago

Let me explain more about our work. One merit of our work is that it caters to almost all off-the-shelf models, i.e., you can use ANY pre-trained feature extractor from ANY existing work, not necessarily the baseline model we mentioned. Thus, executing the steps above is not a mandatory, default part of this work.

So you can basically use ANY pre-trained feature extractor with our FRDL and GOAT. The steps I highlighted above only serve to reproduce the benchmark results in the paper, which are not necessarily the optimal ones you can get.

Regarding "A general user cannot know how to remove the weighted BCE loss and add the `classifier.separate` layer within our base_block.py into its base_block.py": to remove the weighted BCE loss, you just set `label_mean` in train.py to None; for `classifier.separate`, simply compare our base_block.py with the one in the adopted baseline work, and you will see what I mean.
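In other words, the loss change is a one-line edit. Here is a hedged illustration: the `label_mean` name comes from this repository's train.py as mentioned above, while the plain-BCE fallback shown is only an assumption about what the baseline then uses:

```python
import torch

# Setting label_mean to None disables the weighted BCE, so Stage-1
# training falls back to a plain (unweighted) binary cross-entropy.
label_mean = None
criterion = torch.nn.BCEWithLogitsLoss()  # illustrative unweighted BCE
```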

Let me say it again: this work is a plug-and-play macro learning pipeline that depends on NO specific pre-trained baseline model. So you can apply it directly on top of ANY PAR model, which is why we do not provide the first-stage training code; it is not actually a part of this work.