OpenGVLab / UniHCP

Official PyTorch implementation of UniHCP
MIT License

Question about sample_weight in loss cfg for each task? #16

Closed · Bo396543018 closed this 3 months ago

Bo396543018 commented 4 months ago

For a new task, how should the sample_weight in the loss cfg be calculated?

orashi commented 3 months ago

Hi @Bo396543018, sorry for the late reply. Please refer to multitask_schedule.py in the repo and Table 17 in the paper's appendix for how we calculate the sample_weight.

Bo396543018 commented 3 months ago

@orashi Thank you for the reply. I saw the calculation method for the task weights mentioned above in the paper, but I don't know how the sample weights for each category in the loss configuration are determined. For example:

       loss_cfg:
          type: FocalDiceLoss_bce_cls_emb_sample_weight
          kwargs:
            cfg:
              deep_supervision: True
              no_object_weight: 0.1

              class_weight: 0.25
              dice_weight: 5.0
              mask_weight: 5.0
              redundant_queries: 1
              num_points: 12544

              dec_layers: 9

              oversample_ratio: 3.0
              importance_sample_ratio: 0.75
              sample_weight: [1.0, 0.97325, 0.96685, 0.9903500000000001, 0.97325, 0.96685, 0.9903500000000001, 0.9929, 0.9459,
                              0.89645, 0.9929, 0.9459, 0.89645, 0.981, 0.9997, 0.99265, 0.9997, 0.99265,
                              0.9995, 0.9999, 0.9999, 0.9758, 0.9256500000000001, 0.9758, 0.9256500000000001]

By the way, I would also like to ask whether the initial weight of each task is determined based on its respective loss scale, such as 2000 for keypoints and 5 for segmentation.

orashi commented 3 months ago

@Bo396543018 The sample_weight (denoted gamma_n in the paper) is the positive-example ratio for each attribute and is calculated from the dataset; its usage and the related references are given in the description of Eq. 10 of the paper.
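For a new dataset, a minimal sketch of that computation could look like the following (the file path and variable names are hypothetical placeholders, not the repo's actual pipeline; check multitask_schedule.py and Eq. 10 for the exact transform that ends up in the config):

    import numpy as np

    # labels: (num_samples, num_attributes) binary matrix, 1 = attribute present.
    # The .npy path is a placeholder, not a file shipped with the repo.
    labels = np.load("train_attribute_labels.npy")

    # Positive-example ratio per attribute; this (or a simple transform of it)
    # is what goes into loss_cfg.kwargs.cfg.sample_weight.
    sample_weight = labels.mean(axis=0)

    print([round(float(w), 5) for w in sample_weight])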

Regarding your second question about the choice of task weight (i.e. w_t in the paper): it is a set of empirical hyperparameters that we obtained through a simple greedy search, as described in Appendix E (section "Loss Weight w_D"). For your convenience, here is the relevant quote:

The loss weight is normalized so that it only controls the relative weight for each dataset. Samples belonging to the same task type are treated with equal importance. Since different task types have different loss functions, image input resolution, number of samples, and convergence pattern, their loss weight should be set differently. For a reasonable loss weight trade-off between tasks, we gradually add task types one at a time in a small 10k iteration joint training setup and sweep sample weights for the newly added task type.
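As an illustration only, the greedy sweep amounts to something like the sketch below; run_joint_training, the task list, and the candidate grid are hypothetical stand-ins, not the repo's tooling:

    # Greedy sweep over task weights: add one task type at a time, try a small
    # grid of weights in a short 10k-iteration joint run, keep the best weight,
    # freeze it, then move on to the next task type.
    def run_joint_training(task_weights, iterations):
        """Placeholder for a short joint-training run that returns a validation score."""
        raise NotImplementedError("hook this up to your actual training pipeline")

    task_types = ["parsing", "pose", "par", "reid"]       # added one at a time
    candidate_weights = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]   # hypothetical grid

    chosen = {}
    for task in task_types:
        best_w, best_score = None, float("-inf")
        for w in candidate_weights:
            score = run_joint_training({**chosen, task: w}, iterations=10_000)
            if score > best_score:
                best_w, best_score = w, score
        chosen[task] = best_w  # freeze before adding the next task type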

Bo396543018 commented 3 months ago

@orashi Thank you for the detailed explanation. I will go through the paper and code together with your tips~

orashi commented 3 months ago

Glad it helps! Just adding a side note: although the usage of gamma_n is only shown in the pedestrian attribute recognition (PAR) subsection, for consistency in task formulation the same loss L_par is also used as part of the loss formulation for human parsing and pose estimation, so the sample_weight is calculated and present for those tasks as well, as shown in our configuration files.
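As a final illustration of how such a positive-ratio weight can enter a per-attribute BCE-style loss, here is one common weighting scheme; the exact form UniHCP uses is the one given in Eq. 10 of the paper, so treat this only as an example, not a quote of the implementation:

    import torch
    import torch.nn.functional as F

    def ratio_weighted_bce(logits, targets, pos_ratio):
        # pos_ratio plays the role of the per-attribute sample_weight list.
        r = torch.as_tensor(pos_ratio, dtype=logits.dtype, device=logits.device)
        # One common choice: up-weight rare positives, down-weight frequent ones.
        w = torch.where(targets > 0.5, torch.exp(1.0 - r), torch.exp(r))
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        return (w * bce).mean()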