Discrepancy in Stirling Approximation Handling in Poisson NLL Loss Compared to TensorFlow and PyTorch

akshatvishu commented 1 year ago

bug描述 Describe the Bug

Hello,

While using the Poisson NLL loss function in PaddlePaddle, I noticed a difference in the way the Stirling approximation is added to the loss, as compared to both TensorFlow and PyTorch.

Current PaddlePaddle Behavior:

For values of label that are less than or equal to 1, the Stirling approximation is being added to the loss. CODE

loss_out = loss_out + paddle.where(
            stirling_approx <= 1,
            paddle.zeros_like(stirling_approx),
            stirling_approx,
        )

Expected Behavior (Based on TensorFlow and PyTorch):

The Stirling approximation should be added to the loss only for label/target values greater than 1. It should not be added for values equal to 1.

Here's how TensorFlow and PyTorch handle it:

TensorFlow: Uses a condition to check targets > 1 before adding the Stirling approximation. CODE
PyTorch: Uses .masked_fill(target <= 1, 0) to ensure the Stirling term is only added for targets greater than 1. CODE

Here is a working example:

import tensorflow as tf
import torch
import paddle

# paddle:
input = paddle.to_tensor([2.], dtype=paddle.float32)
label = paddle.to_tensor([2.], dtype=paddle.float32)
epsilon = 1e-8
loss_out = input - label * paddle.log(input + epsilon)
print(f"loss_out:{loss_out}")
stirling_approx = (
            label * paddle.log(label)
            - label
            + 0.5 * paddle.log(2 * math.pi * label)
        )
print(f"string_approx:{stirling_approx}")
loss_out = loss_out + paddle.where(
            stirling_approx <= 1,
            paddle.zeros_like(stirling_approx),
            stirling_approx,
        )

print(f"loss after stirling:{loss_out}")

print("_______________TORCH_______________")
input = torch.tensor([2.], dtype=torch.float32)
target = torch.tensor([2.], dtype=torch.float32)
eps = epsilon
loss = input - target * torch.log(input + eps)
print(f"loss_out torch:{loss}")
stirling_term = (
            target * torch.log(target) - target + 0.5 * torch.log(2 * torch.pi * target)
        )
print(f" stirling calculation torch:{stirling_term}")

loss = loss + stirling_term.masked_fill(target <= 1, 0)
print(f"loss after striling torch:{loss}")

print("__________________TF___________")

# I have portend TF code to use TensorFlow API to showcase the internal working

def log_poisson_loss(log_input, targets, compute_full_loss=False, name=None):
    with tf.name_scope(name or "log_poisson_loss"):
        log_input = tf.convert_to_tensor(log_input, name="log_input")
        targets = tf.convert_to_tensor(targets, name="targets")

        # Ensure log_input and targets have the same shape
        if log_input.shape != targets.shape:
            raise ValueError(
                "`log_input` and `targets` must have the same shape, received "
                f"({log_input.shape} vs {targets.shape}).")
        # input - target * torch.log(input + eps)
        result = log_input - target * tf.math.log(log_input + epsilon)
        print(f"loss before stirling:{result}")
        if compute_full_loss:
            # Need to create constant tensors here so that their dtypes can be matched to that of the targets
            point_five = tf.constant(0.5, dtype=targets.dtype)
            two_pi = tf.constant(2 * math.pi, dtype=targets.dtype)

            stirling_approx = (targets * tf.math.log(targets)) - targets + (point_five * tf.math.log(two_pi * targets))
            print(f"stirling approx:{stirling_approx}")
            zeros = tf.zeros_like(targets, dtype=targets.dtype)
            ones = tf.ones_like(targets, dtype=targets.dtype)
            cond = tf.math.logical_and(tf.math.greater_equal(targets, zeros), tf.math.less_equal(targets, ones))
            result += tf.where(cond, zeros, stirling_approx)
            print(f"final result:{result}")
        return result

# Usage example
a = tf.constant([2.], dtype=tf.float32)
b = tf.constant([2.], dtype=tf.float32)
output = log_poisson_loss(a, b, compute_full_loss=True)
print(output)

"""
loss_out:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.61370564])
string_approx:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.65180647])
loss after stirling:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.61370564])
_______________TORCH_______________
loss_out torch:tensor([0.6137])
 stirling calculation torch:tensor([0.6518])
loss after striling torch:tensor([1.2655])
__________________TF___________
loss before stirling:[0.61370564]
stirling approx:[0.6518065]
final result:[1.2655121]
tf.Tensor([1.2655121], shape=(1,), dtype=float32)
"""

As you can see that at PaddlePaddle we check if the values in stirling_approx are less than or equal to 1, which is different from checking the values of label or target as in PyTorch and TensorFlow.

I wanted to know that if this was done knowingly or do we want to make it behave similar to TensorFlow and PyTorch ; I can happily raise PR in-case we went with the latter option.

其他补充信息 Additional Supplementary Information

A thing to be take an note of is that the working example I provide is when the input is not log_input .i.e. log_input=False and since Tensorflow does not support the non log_input values i modified the code ; this line to be precise: result = log_input - target * tf.math.log(log_input + epsilon)

Thus, if we want a more simpler example to reproduce this error :

import torch
import paddle.nn.functional as F

x = torch.tensor([2.], dtype=torch.float32)
y = torch.tensor([2.], dtype=torch.float32)
z = torch.nn.functional.poisson_nll_loss(x, y,full=True, log_input=False, reduction="none")
print(z)
print(z.dtype)
print("_______________PADDLE________________")
a = paddle.to_tensor([2.], dtype=paddle.float64)
b = paddle.to_tensor([2.], dtype=paddle.float64)
c = F.poisson_nll_loss(a,b,log_input=False, full=True,reduction="none")
print(c)
print(c.dtype)

"""
tensor([1.2655])
torch.float32
_______________PADDLE________________
Tensor(shape=[1], dtype=float64, place=Place(cpu), stop_gradient=True,
       [0.61370563])
paddle.float64
"""

And this was the PR which added Poisson NLL loss function

ZzSean commented 1 year ago

I'm sure this is a bug of implementation, we will fix this as soon as possible. Thanks for the issue!

akshatvishu commented 1 year ago

Hey @zhwesky2010 @luotao1 Can I raise a PR to correct this bug?

akshatvishu commented 1 year ago

Anyways, I have tried to setup developer environment for paddle( for linux ) but the instructions were not very clear on how to use docker container or run a test etc.. hence, I am posting the correction here itself:

# this is make the loss cond. on `label/taget` like torch and tensorflow
# the same will needed to be modified at test location for poisson nll loss 
loss_out = loss_out + paddle.where(
            label > 1,
            stirling_approx,
            paddle.zeros_like(stirling_approx),
        )

luotao1 commented 1 year ago

@LyndonKong Could you help see this issue? @ZzSean already reproduces the problem.

LyndonKong commented 1 year ago

Thanks for your issue @akshatvishu. It is a bug of implementation and I believe your fix should work. It would be nice if you could open a PR to correct this bug : )

akshatvishu commented 1 year ago

Thanks for your issue @akshatvishu. It is a bug of implementation and I believe your fix should work. It would be nice if you could open a PR to correct this bug : )

Can you please clarify on how to build and test paddle by compiling it locally via using docker as shown here specifically for linux Debian using docker.

If I already cloned my fork of main repo (as shown under contributing guideline ) , do I need to clone the main repo again as mentioned in the first of building from source for linux via docker? Specifically , I skipped these two steps as I already clone the fork at my local machine:

First select the path where you want to store PaddlePaddle, then use the following command to clone PaddlePaddle’s source code from github to a folder named Paddle in the local current directory:

git clone https://github.com/PaddlePaddle/Paddle.git

2. Go to the Paddle directory:

cd Paddle

If not then when I run docker container via executing the command -> docker exec -it paddle-test /bin/bash then it shows me this, I am confused on whether my setup is correctly installed or not. :

(base) vishu@aja:~/Paddle$ docker exec -it paddle-test /bin/bash
λ aja /home python3.9
Python 3.9.7 (default, Sep 10 2021, 00:03:59) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
>>> paddle.__version__
'0.0.0'
>>> exit()
λ aja /home ls
ccache-3.7.9/  ccache-3.7.9.tar.gz  cmake-3.16.0-Linux-x86_64/  patchelf_0.10-2_amd64.deb

LyndonKong commented 1 year ago

The guideline actually requires you to mount your local developing directory to /paddle path in the docker, so I don't think you need to clone the main repo again.You can also find the details of how to build and test this operator from this document. Hope it can be helpful.

luotao1 commented 1 year ago

While using the Poisson NLL loss function in PaddlePaddle, I noticed a difference in the way the Stirling approximation is added to the loss, as compared to both TensorFlow and PyTorch.

@akshatvishu How do you notice the difference, do you have a tool like https://github.com/PaddlePaddle/PaConvert ?

akshatvishu commented 1 year ago

@akshatvishu How do you notice the difference, do you have a tool like https://github.com/PaddlePaddle/PaConvert ?

I was just testing the various frameworks for the Poisson nll Loss and for some reason while paddle and torch shared the same parameters , they were giving different results when I was passing Full=True

PaddlePaddle / Paddle