Henry1iu / TNT-Trajectory-Prediction

A PyTorch implementation of TNT: Target-driveN Trajectory Prediction

Why use binary cross-entropy loss for target prediction? #16

Closed studybox closed 2 years ago

studybox commented 2 years ago

I noticed you use binary cross-entropy loss for target prediction instead of cross-entropy loss. Are you treating each target independently rather than as part of a single multi-class distribution?

Henry1iu commented 2 years ago

Hi,

I use BCE simply because the ground-truth label is binary (each candidate either is the closest one or it is not). Could you describe why multi-class classification should be used here, and what those classes would be?
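
For concreteness, here is a minimal sketch of the BCE formulation I described (illustrative only; the tensor names and sizes are made up, not the actual code in this repo):

```python
import torch
import torch.nn.functional as F

# scores: raw logits, one per sampled target candidate of one sequence
scores = torch.randn(1000)   # e.g. 1000 candidates (made-up size)

# gt: binary labels, 1.0 only for the candidate closest to the
# ground-truth endpoint, 0.0 for every other candidate
gt = torch.zeros(1000)
gt[417] = 1.0                # hypothetical index of the closest candidate

# each candidate is classified independently as "closest or not"
loss = F.binary_cross_entropy_with_logits(scores, gt)
```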

Best, Jianbang

studybox commented 2 years ago

My understanding is that the classes are the discrete locations. In the TNT paper, it says "[equation image] is a discrete distribution over location choices". But I think you are correct; they should have the same outcome.

Henry1iu commented 2 years ago

Hi,

In my opinion, the equation in that quote just means the output nonlinearity is a softmax function, which is already what my implementation does (see target_prediction.py).

The target prediction outcome will be a discrete probability distribution over the target sampling region, which I think is just a fancy way of saying "predict the probability of each target candidate".
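
Written as a loss, that reading is just an ordinary softmax cross-entropy where the "class" is the index of the closest candidate (again only a sketch with made-up shapes):

```python
import torch
import torch.nn.functional as F

scores = torch.randn(1, 1000)          # (batch, n_candidates) logits
target = torch.tensor([417])           # hypothetical closest-candidate index

# cross_entropy applies log_softmax over the candidate dimension, so the
# model output is a discrete distribution over the sampling region
loss = F.cross_entropy(scores, target)
probs = torch.softmax(scores, dim=-1)  # probability of each candidate
```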

TNT's sampling strategy produces a varying number of candidates for each sequence, up to a few thousand, and the candidate positions also differ from sequence to sequence. If we did multi-class classification, I'm not sure how each class would be defined (see the sketch below). Binary classification just sounds more reasonable to me. If you have a better idea, please feel free to discuss it with me via email.
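
To illustrate the difficulty: with a varying number of candidates, a softmax formulation would need padding and masking so that each sequence's distribution only covers its own candidates. A hypothetical sketch (not code from this repo):

```python
import torch
import torch.nn.functional as F

# two sequences with different candidate counts, padded to the max length
lengths = torch.tensor([800, 1000])
scores = torch.randn(2, 1000)                          # padded logits
mask = torch.arange(1000)[None, :] < lengths[:, None]  # valid-candidate mask

# padded entries get -inf so the softmax ignores them
scores = scores.masked_fill(~mask, float("-inf"))
target = torch.tensor([12, 345])   # hypothetical closest-candidate indices
loss = F.cross_entropy(scores, target)
```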

Best, Jianbang