carla-simulator / reinforcement-learning

Reinforcement learning baseline agent trained with the Actor-critic (A3C) algorithm.

Probability vector with negative values #17

Closed: d34dl0ckk closed this issue 4 years ago

d34dl0ckk commented 4 years ago

Hi,

I got an error while running the code: ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

The error occurs at line 27 in policy_output.py

When I tried to print the values contained in batch_probs, I got this:

[ 2.29848295e-07 4.51255920e-01 1.97014214e-05 3.09768734e-05 5.48567290e-01 -1.18768529e-07 1.16773965e-04 -1.15810205e-07 8.16957236e-06]

There are some negative values in there. Can you help me, please?

I found this interesting article: https://numpy.org/doc/1.17/reference/random/generated/numpy.random.mtrand.RandomState.multinomial.html

Thank you.

d34dl0ckk commented 4 years ago

I think it is because of the tiny value subtracted from the probability vector, but when I removed that subtraction I got this error:

"ValueError: sum(pvals[:-1]) > 1.0" in numpy.multinomial
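For reference, both errors can be triggered in isolation. The following is a minimal sketch using made-up vectors (not the actual `batch_probs` from the repository), and the helper `try_multinomial` is a hypothetical name introduced only for this illustration:

```python
import numpy as np

def try_multinomial(pvals):
    """Return None if sampling succeeds, else the ValueError message from NumPy."""
    try:
        np.random.multinomial(1, pvals)
        return None
    except ValueError as exc:
        return str(exc)

# Hypothetical vector with a tiny negative entry, similar to the printed one.
negative_entry = np.array([0.5, 0.5, -1e-7])
# Hypothetical vector whose leading entries already sum to more than 1.
sum_too_large = np.array([0.6, 0.5, 0.0])
# A well-formed probability vector for comparison.
valid = np.array([0.2, 0.3, 0.5])

for name, p in [("negative entry", negative_entry),
                ("sum too large", sum_too_large),
                ("valid", valid)]:
    print(name, "->", try_multinomial(p))
```

The first two vectors are rejected with a `ValueError`, while the third is accepted. The exact wording of the messages may differ between NumPy versions.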

d34dl0ckk commented 4 years ago

@felipecode I added this line just before calling the numpy multinomial function:

    # "ValueError: pvals < 0, pvals > 1 or pvals contains NaNs" in numpy.multinomial
    batch_probs = np.absolute(batch_probs)

and it worked just fine!

wangyixu14 commented 4 years ago

Hi AminaBa,

I have the same problem here: 'ValueError: pvals < 0, pvals > 1 or pvals contains NaNs'. I tried your method and it worked for a while, but then it raised "ValueError: sum(pvals[:-1]) > 1.0". May I ask how you solved this sum error?

Thanks, Yixuan

d34dl0ckk commented 4 years ago

Hi @wangyixu14

Here is what I did to solve both issues:

    # Repeat until the sum of the vector lies strictly between 0 and 1
    while not 0 < np.sum(batch_probs) < 1:
        # Subtract a tiny value from probabilities in order to avoid
        # "ValueError: sum(pvals[:-1]) > 1.0" in numpy.multinomial
        batch_probs = batch_probs - np.finfo(np.float32).epsneg
        # Apply abs function to keep probability values positive to avoid
        # "ValueError: pvals < 0, pvals > 1 or pvals contains NaNs" in numpy.multinomial
        batch_probs = np.absolute(batch_probs)

Hope it'll help! Best.
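An alternative to the loop above, offered only as a sketch: clip negative entries to zero and renormalize in float64 before sampling. This assumes the negative values are float32 rounding noise, which the thread does not confirm; NaNs coming from a diverged network would need a different fix, so the sketch raises on them instead of hiding them. The function name `sanitize_probs` is hypothetical and not part of the repository.

```python
import numpy as np

def sanitize_probs(probs):
    """Return a non-negative float64 vector that sums to 1 (hypothetical helper)."""
    p = np.asarray(probs, dtype=np.float64)
    if not np.all(np.isfinite(p)):
        # Do not mask NaN/inf: they usually point to a problem upstream.
        raise ValueError("probabilities contain NaN or inf")
    p = np.clip(p, 0.0, None)   # drop tiny negative rounding noise
    total = p.sum()
    if total <= 0.0:
        raise ValueError("probabilities sum to zero after clipping")
    return p / total            # renormalize so the entries sum to 1

# The vector printed earlier in this thread, stored as float32.
batch_probs = np.array([2.29848295e-07, 4.51255920e-01, 1.97014214e-05,
                        3.09768734e-05, 5.48567290e-01, -1.18768529e-07,
                        1.16773965e-04, -1.15810205e-07, 8.16957236e-06],
                       dtype=np.float32)

clean = sanitize_probs(batch_probs)
sample = np.random.multinomial(1, clean)   # one-hot draw over the 9 entries
action = int(np.argmax(sample))            # index of the sampled entry
```

Unlike taking the absolute value, clipping does not turn a small negative error into a small positive probability, and dividing by the sum addresses the "sum(pvals[:-1]) > 1.0" check directly rather than by repeated subtraction.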