JunMa11 / SegLossOdyssey

A collection of loss functions for medical image segmentation
Apache License 2.0

about gradient #36

Closed rookiez7 closed 2 years ago

rookiez7 commented 3 years ago

I'm sorry to bother you. In SegLoss/losses_pytorch/hausdorff.py, I found that the function distance_field uses torch.no_grad, which means this function has no gradient. Will this have an impact on training? I don't know much about the impact. Can you explain it for me? Thanks.
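For reference, the pattern I am asking about looks roughly like this (my simplified sketch, not the exact code from the file):

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt as edt

@torch.no_grad()  # <- the part I am asking about: nothing below is tracked by autograd
def distance_field(img: torch.Tensor) -> torch.Tensor:
    """Euclidean distance transform of the thresholded mask, per batch item."""
    img_np = img.detach().cpu().numpy()
    field = np.zeros_like(img_np)
    for batch in range(len(img_np)):
        fg_mask = img_np[batch] > 0.5  # hard threshold to a binary mask
        if fg_mask.any():
            bg_mask = ~fg_mask
            # distance to the object boundary, measured from both sides
            field[batch] = edt(fg_mask) + edt(bg_mask)
    return torch.from_numpy(field).float()
```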

JunMa11 commented 3 years ago

Hi @rookiez7 ,

The direct answer is no. This is because the distance transform map can be regarded as a constant in the HD loss. Hope the answer helps.
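To make this concrete, here is a simplified sketch (not the exact file contents) of what the forward pass computes, reusing the distance_field sketch above:

```python
import torch

def hausdorff_dt_loss(pred: torch.Tensor, target: torch.Tensor,
                      alpha: float = 2.0) -> torch.Tensor:
    """Sketch of the HD loss: gradients flow only through (pred - target)**2."""
    # Both maps are produced inside torch.no_grad, so autograd treats them
    # as fixed per-pixel weights, i.e. constants of the current iteration.
    pred_dt = distance_field(pred)
    target_dt = distance_field(target)

    pred_error = (pred - target) ** 2                  # differentiable part
    distance = pred_dt ** alpha + target_dt ** alpha   # constant weight map
    return (pred_error * distance).mean()
```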

rookiez7 commented 3 years ago

Thanks for your answer, but I am still a bit confused. Why can the distance transform map be regarded as a constant? I've read the paper, but I still don't have a clue. Is the reason that, although the variable pred_dt comes from pred (the output of the model), the subsequent operations (pred -> pred_dt -> loss) do not depend on the network, only on a pred that has already been determined, so it can be considered a constant? Or is it for some other reason? Another question: hausdorff.py row 31 uses fg_mask = img[batch] > 0.5, but row 56 is # pred = torch.sigmoid(pred), i.e. the sigmoid is commented out. Why isn't sigmoid used?

JunMa11 commented 3 years ago

Hi @rookiez7 ,

Sorry for my late reply. I moved to a new place, so things are a little bit messy at the moment.

Basically, the distance transform map is used to approximate the HD distance, and we need both the gt_dist and the seg_dist. For this reason, the distance computing process should not be included in backpropagation (BP). You can check the original paper for more insights.
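Note also that the distance transform itself is not something autograd could differentiate even in principle: it goes through a hard threshold and a SciPy call, both outside the computation graph. A minimal illustration (hypothetical tensors):

```python
import torch
from scipy.ndimage import distance_transform_edt as edt

pred = torch.rand(1, 8, 8, requires_grad=True)  # hypothetical network output
mask = pred[0] > 0.5      # hard threshold: gradient is zero almost everywhere
dist = edt(mask.numpy())  # NumPy/SciPy call: leaves the autograd graph entirely
# so the distance maps can only enter the loss as constant weights
```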

I do not have experience with this version of the HD loss. This project may give you some insights into an alternative HD loss implementation: https://github.com/JunMa11/SegWithDistMap

Best regards, Jun

rookiez7 commented 3 years ago

Hi Jun, Sorry to bother you again.

I have read Karimi's original paper and your code in SegWithDistMap. Is it because he used (p-q)^2 instead of |p-q| in the loss function, where p denotes the prediction and q denotes the ground truth, so that in BP the gradient can be passed through the prediction? When you are not busy, could you explain in more detail why the distance computing process should not be included in BP?

Best Regards.

JunMa11 commented 3 years ago

Hi @rookiez7

Sorry for my late reply. I retrieved my email correspondence with the HD loss author from 2019, which may help you understand it:

"p" is usually already binary but "q" should be thresholded.    With regard to differentiation, as you may know the distance transform is highly non-differentiable. We treat it as constant between iterations.