dtivger opened this issue 7 years ago
@dtivger The missing part of the gradient is computed in the previous layers, softmax and linear_regression.
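(For anyone else landing here, a quick numpy check of why that decomposition works; nothing below is code from the repo, and the values are made up. With `p = softmax(z)` and cross-entropy loss `-log(p[y])`, the gradient with respect to the logits `z` is exactly `p - onehot(y)`, so the `1/p` factor is cancelled inside the softmax backward.)

```python
import numpy as np

# Check numerically that d(-log p[y]) / dz == p - onehot(y)
# when p = softmax(z). Logits and class index are illustrative.
z = np.array([0.5, -1.2, 2.0])   # made-up logits
y = 2                            # made-up true class index
p = np.exp(z) / np.exp(z).sum()  # softmax probabilities

analytic = p - np.eye(3)[y]      # the claimed gradient: p - onehot(y)

def loss(zz):
    q = np.exp(zz) / np.exp(zz).sum()
    return -np.log(q[y])         # cross-entropy for class y

# Central finite differences as an independent reference.
eps = 1e-6
numeric = np.zeros(3)
for j in range(3):
    zp = z.copy(); zp[j] += eps
    zm = z.copy(); zm[j] -= eps
    numeric[j] = (loss(zp) - loss(zm)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```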
@Seanlinx I got it, thanks!
@Seanlinx @dtivger Could you explain why the gradient is normalized like this in `backward()`?

```python
cls_grad /= len(np.where(cls_keep == 1)[0])
bbox_grad /= len(np.where(bbox_keep == 1)[0])
```
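(A minimal numpy sketch of what those two lines do, with a made-up mask; `cls_keep` marks the samples kept by online hard example mining. Dividing by the kept count turns a summed per-sample gradient into a mean over the kept samples, so the gradient scale does not depend on how many examples survive mining.)

```python
import numpy as np

# 0/1 mask of samples kept by hard example mining (illustrative values).
cls_keep = np.array([1, 1, 0, 1, 0, 0, 1, 0])

# Per-sample gradient before normalization: 1 for kept samples, 0 otherwise.
cls_grad = cls_keep.astype(np.float64)

# Normalize by the number of kept samples -> mean over kept samples.
cls_grad /= len(np.where(cls_keep == 1)[0])   # same as cls_keep.sum()

print(cls_grad)  # each of the 4 kept samples contributes 0.25
```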
@Seanlinx Hi Seanlinx, I have some questions about your negativemining op. Theoretically, the CLS loss can be written as `1(x) * log(prob) * (-1/ohem_keep)`, where `x` is the pair of `cls_label` and the softmax op's output (`x = (label, prob)`) and `1(x)` is the indicator function `1{.}`. The bottom diff should therefore be `1(x) * (1/prob) * (-1/ohem_keep)`, but you only compute `1(x) * (-1/ohem_keep)`. Similarly, the BBOX loss can be written as `x^2 / valid_num`, so its diff is `2x / valid_num`, but you only compute `1 / valid_num`. Could you explain what I am missing?
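(The answer implied by the earlier reply, spelled out as a short derivation: the `1/prob` and `2x` factors are not missing, they live in the backward of the previous layers. A sketch of the chain rule for the CLS branch, assuming `p = softmax(z)`, one-hot labels `y`, and `N = ohem_keep` kept samples:)

```latex
\[
L_{\mathrm{cls}} = -\frac{1}{N}\sum_{i \in \mathrm{keep}} \log p_{i,y_i},
\qquad
\frac{\partial L_{\mathrm{cls}}}{\partial p_{i,y_i}} = -\frac{1}{N\,p_{i,y_i}},
\]
\[
\frac{\partial L_{\mathrm{cls}}}{\partial z_{i,j}}
  = -\frac{1}{N\,p_{i,y_i}}
    \cdot \underbrace{\frac{\partial p_{i,y_i}}{\partial z_{i,j}}}_{p_{i,y_i}\left(\mathbb{1}[j=y_i]-p_{i,j}\right)}
  = \frac{1}{N}\left(p_{i,j} - \mathbb{1}[j = y_i]\right).
\]
```

The `1/p` factor cancels against the softmax Jacobian, so the custom op only has to supply the indicator mask and the `-1/N` scale; the softmax layer's backward contributes the rest. Likewise for the BBOX branch: with `L = (1/valid_num) * sum((pred - target)^2)`, the data-dependent factor `(pred - target)` is produced by the `linear_regression` layer's backward, leaving only the `1/valid_num` scale for the custom op.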