CoinCheung / pytorch-loss

label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
MIT License

LabelSmoothSoftmaxCEV1, LabelSmoothSoftmaxCEV2, LabelSmoothSoftmaxCEV3 #30

Closed · quanweiliu closed this issue 2 years ago

quanweiliu commented 2 years ago

Sorry, excuse me. Do V1, V2, and V3 form a progression in performance or efficiency? Can you explain the advantages of each of the three in more detail? Which one do you recommend using in your application?

CoinCheung commented 2 years ago

Hi,

In general, V1 is implemented with a combination of pytorch native ops. This gives it better compatibility no matter whether you run it on cuda/rocm/tpu platforms, since pytorch takes care of the hardware differences.
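For reference, a V1-style label-smoothing cross entropy built only from native PyTorch ops might look like the sketch below. This is an illustration of the idea, not the repository's actual code; the function name and the `smooth` parameter are my own.

```python
import torch
import torch.nn.functional as F

def label_smooth_ce(logits, target, smooth=0.1):
    # V1-style sketch: only native PyTorch ops, so autograd and every
    # backend (CUDA/ROCm/TPU) are handled by PyTorch itself.
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    with torch.no_grad():
        # Smoothed one-hot target: 1 - smooth on the true class,
        # smooth / (num_classes - 1) spread over the other classes.
        true_dist = torch.full_like(log_probs, smooth / (num_classes - 1))
        true_dist.scatter_(1, target.unsqueeze(1), 1.0 - smooth)
    return torch.mean(torch.sum(-true_dist * log_probs, dim=1))
```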

On the other hand, V3 is written in cuda and wrapped with a pytorch interface. In this way, you can only use it on the cuda platform (supported up to cuda 10.2 so far). The benefit of writing cuda kernels directly is that some of the logic can be expressed more directly, which makes it more memory and speed efficient. The drawback is that it only works on the cuda platform.
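Roughly, the structure of such a wrapper looks like the sketch below: the CUDA sources are compiled into an extension and exposed to Python through a `torch.autograd.Function`. All file, module, and kernel names here are hypothetical; the repository's actual extension is organised differently.

```python
import torch
from torch.utils.cpp_extension import load

# Hypothetical: compile .cu/.cpp sources into a Python extension module.
lsr_ext = load(name="lsr_cuda", sources=["lsr.cpp", "lsr_kernel.cu"])

class LabelSmoothCECuda(torch.autograd.Function):
    @staticmethod
    def forward(ctx, logits, target, smooth):
        # Fused forward kernel (hypothetical name).
        loss = lsr_ext.forward(logits, target, smooth)
        ctx.save_for_backward(logits, target)
        ctx.smooth = smooth
        return loss

    @staticmethod
    def backward(ctx, grad_output):
        logits, target = ctx.saved_tensors
        # Fused backward kernel (hypothetical name).
        grad_logits = lsr_ext.backward(grad_output, logits, target, ctx.smooth)
        return grad_logits, None, None
```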

As for V2, it is a demo for V3. It uses a backward computation rule that I derived myself rather than pytorch's native autograd. It has good compatibility, just like pytorch itself, and I use it to verify my V3 implementation.
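A minimal illustration of that idea (again a sketch, not the repository's exact V2 code) is an `autograd.Function` whose backward applies the hand-derived gradient, softmax(logits) minus the smoothed target distribution, using only native PyTorch ops:

```python
import torch
import torch.nn.functional as F

class LabelSmoothCEV2Style(torch.autograd.Function):
    """V2-style sketch: forward and backward both written by hand with
    native PyTorch ops, so the hand-derived gradient can be checked."""

    @staticmethod
    def forward(ctx, logits, target, smooth=0.1):
        num_classes = logits.size(1)
        log_probs = F.log_softmax(logits, dim=1)
        true_dist = torch.full_like(log_probs, smooth / (num_classes - 1))
        true_dist.scatter_(1, target.unsqueeze(1), 1.0 - smooth)
        ctx.save_for_backward(log_probs, true_dist)
        return torch.mean(torch.sum(-true_dist * log_probs, dim=1))

    @staticmethod
    def backward(ctx, grad_output):
        log_probs, true_dist = ctx.saved_tensors
        # Hand-derived gradient of the smoothed cross entropy w.r.t. logits:
        # softmax(logits) - smoothed_target, averaged over the batch.
        grad_logits = (log_probs.exp() - true_dist) / log_probs.size(0)
        return grad_output * grad_logits, None, None
```

Its output and gradient can then be compared against a V1-style (pure autograd) version to confirm the derivation before moving the same math into CUDA.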

Generally speaking, if you are on the cuda platform, it is better to use V3; if you want compatibility with more platforms, you should use V1. And if you want to understand how the backward computation works, you can read the code of V2.

quanweiliu commented 2 years ago

I appreciate your detailed response. I understand now.

CoinCheung commented 2 years ago

Closing this because the question is answered.