A basic implementation of `NLL_LOSS` has been pushed.
Based on the performance test results summarized earlier, we believe a `gather`-based implementation would be more efficient. Judging by the latency figures, torch appears to take the same approach. We will go ahead with this optimization.
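To illustrate the idea, here is a minimal pure-Python sketch of a gather-style NLL loss next to a one-hot masked-sum version. The function names, the mean reduction, and the masked-sum baseline are assumptions made for illustration. They do not reflect the pushed implementation or torch's internals, and class weights and ignored indices are left out.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax for one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def nll_loss_gather(log_probs, targets):
    """Mean NLL using a gather-style lookup (hypothetical helper).

    log_probs: list of rows of per-class log-probabilities.
    targets:   one class index per row.
    """
    # Gather: read exactly one value per row, at the target index.
    picked = [row[t] for row, t in zip(log_probs, targets)]
    return -sum(picked) / len(picked)


def nll_loss_one_hot(log_probs, targets):
    """Mean NLL via a one-hot mask (assumed baseline for comparison)."""
    total = 0.0
    for row, t in zip(log_probs, targets):
        # Touches every class entry, although only one contributes.
        for c, lp in enumerate(row):
            total += lp * (1.0 if c == t else 0.0)
    return -total / len(log_probs)


logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 2]
log_probs = [log_softmax(r) for r in logits]

loss_gather = nll_loss_gather(log_probs, targets)
loss_one_hot = nll_loss_one_hot(log_probs, targets)
```

Both functions return the same value, but the gather version reads one element per sample instead of one per class, which is why it should scale better as the number of classes grows.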