Hi @Bo396543018, sorry for the late reply. Please refer to the file `multitask_schedule.py` in the repo and to Table 17 in the paper's appendix for how we calculate the `sample_weight`.
@orashi Thank you for the reply. I saw the calculation method for the task weights mentioned above in the paper, but I don't understand how the per-category sample weights in the loss configuration are determined. For example:
```yaml
loss_cfg:
  type: FocalDiceLoss_bce_cls_emb_sample_weight
  kwargs:
    cfg:
      deep_supervision: True
      no_object_weight: 0.1
      class_weight: 0.25
      dice_weight: 5.0
      mask_weight: 5.0
      redundant_queries: 1
      num_points: 12544
      dec_layers: 9
      oversample_ratio: 3.0
      importance_sample_ratio: 0.75
      sample_weight: [1.0, 0.97325, 0.96685, 0.9903500000000001, 0.97325, 0.96685, 0.9903500000000001, 0.9929, 0.9459,
                      0.89645, 0.9929, 0.9459, 0.89645, 0.981, 0.9997, 0.99265, 0.9997, 0.99265,
                      0.9995, 0.9999, 0.9999, 0.9758, 0.9256500000000001, 0.9758, 0.9256500000000001]
```
By the way, I would also like to ask whether the initial weight of each task is determined based on its respective loss scale, e.g. 2000 for keypoints and 5 for segmentation.
@Bo396543018 The `sample_weight` (denoted gamma_n in the paper) is the positive example ratio of each attribute and is calculated from the dataset; its usage and the related references are given in the description of Eq. 10 of the paper.
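For concreteness, here is a minimal sketch of that computation (not the repo's actual code; the random `attr_labels` matrix is a purely hypothetical stand-in for real binary attribute annotations):

```python
import numpy as np

# Hypothetical stand-in for a dataset's binary attribute annotations:
# one row per training sample, one column per attribute/category
# (1 = attribute present, 0 = absent). In practice these would come
# from the actual training labels.
rng = np.random.default_rng(0)
attr_labels = rng.integers(0, 2, size=(10_000, 25))

# gamma_n: the positive example ratio of each attribute over the dataset.
gamma = attr_labels.mean(axis=0)

# These per-attribute ratios are what goes into the loss config as
# `sample_weight`, one entry per attribute/category.
print("sample_weight:", gamma.round(5).tolist())
```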
Regarding your second question about the choice of task weight (i.e. w_t in the paper): it is a set of empirical hyperparameters that we obtained through a simple greedy search, as described in Appendix E (section "Loss Weight w_D"). For your convenience, here is a relevant quote:
> The loss weight is normalized so that it only controls the relative weight for each dataset. Samples belonging to the same task type are treated with equal importance. Since different task types have different loss functions, image input resolution, number of samples, and convergence pattern, their loss weight should be set differently. For a reasonable loss weight trade-off between tasks, we gradually add task types one at a time in a small 10k iteration joint training setup and sweep sample weights for the newly added task type.
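For illustration, a rough sketch of that greedy sweep (not the authors' actual tooling; `run_joint_training`, the candidate grid, and the task order are all hypothetical placeholders):

```python
import random

# Candidate loss weights to sweep for each newly added task type
# (hypothetical grid).
candidate_weights = [0.5, 1.0, 2.0, 5.0, 10.0]

def run_joint_training(task_weights, iters=10_000):
    # Hypothetical stand-in for the small 10k-iteration joint training
    # setup; should return a validation score for the newly added task.
    return random.random()

# Add task types one at a time; for each new task, sweep its loss weight
# while keeping the weights chosen for earlier tasks fixed.
task_weights = {}
for task in ["parsing", "pose", "par", "reid"]:  # hypothetical order
    best_w, best_score = None, float("-inf")
    for w in candidate_weights:
        trial = {**task_weights, task: w}
        score = run_joint_training(trial)
        if score > best_score:
            best_w, best_score = w, score
    task_weights[task] = best_w

print(task_weights)
```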
@orashi Thank you for the detailed explanation; I will go through the paper and code together with your tips.
Glad it helps! A side note: although the usage of gamma_n is only shown in the pedestrian attribute recognition (PAR) subsection, for consistency in task formulation the same loss L_par is also used as part of the loss for human parsing and pose estimation, so the `sample_weight` is calculated and present for those tasks as well, as shown in our configuration files.
For a new task, how should the `sample_weight` in the loss cfg be calculated?