JunrQ / NAS

Neural architecture search (NAS)

Got NaN when calculating gumbel_softmax #21

Closed ChenMinQi closed 5 years ago

ChenMinQi commented 5 years ago

Sometimes nn.functional.gumbel_softmax returns NaN when computed on the GPU; this does not happen when computing on the CPU.

test code:

import torch
import torch.nn as nn
import math

if __name__ == "__main__":
    batch_size = 128
    temperature = 5.0
    theta = torch.FloatTensor([1.753356814384460449,1.898535370826721191,0.6992630958557128906,
                                0.2227068245410919189,0.6384450793266296387,1.431323885917663574,
                                -0.05012089386582374573, -0.06672633439302444458])
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    t_gpu = theta.repeat(batch_size, 1).to(device)
    max_num = 1000000
    nan_num = 0
    # Repeatedly sample on the GPU and count how often the result contains NaN.
    for i in range(max_num):
        weight = nn.functional.gumbel_softmax(t_gpu, temperature)
        if math.isnan(torch.sum(weight)):  # any NaN entry makes the sum NaN
            nan_num += 1
    print("GPU: nan {:.3f}% probability happen, tot {}".format(100.0 * nan_num / max_num, nan_num))
    # Repeat the same experiment on the CPU for comparison.
    nan_num = 0
    t_cpu = theta.repeat(batch_size, 1)
    for i in range(max_num):
        weight = nn.functional.gumbel_softmax(t_cpu, temperature)
        if math.isnan(torch.sum(weight)):
            nan_num += 1
    print("CPU: nan {:.3f}% probability happen, tot {}".format(100.0 * nan_num / max_num, nan_num))

Results:

GPU: nan 0.004% probability happen, tot 38
CPU: nan 0.000% probability happen, tot 0

I'm not sure whether this is a bug in PyTorch, a bug in gumbel_softmax itself, or whether there are restrictions on the values of theta.
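One plausible mechanism (an assumption on my part, not confirmed in this thread): Gumbel noise is commonly drawn as -log(E) with E ~ Exponential(1), and if the GPU RNG ever returns E == 0 exactly, the noise becomes +inf; a softmax over a row containing +inf evaluates inf / inf = NaN. The sketch below forces that case by hand to show the NaN appearing:

```python
import torch

# Simulate one Gumbel-noise draw where the middle exponential sample
# came back exactly 0 (the hypothesized RNG edge case).
logits = torch.tensor([1.0, 2.0, 3.0])
noise = -torch.log(torch.tensor([0.7, 0.0, 1.3]))  # -log(0) == +inf
temperature = 5.0

# Softmax over a row containing +inf produces inf / inf = NaN.
y = torch.softmax((logits + noise) / temperature, dim=-1)
print(torch.isnan(y).any().item())  # True
```

This would also explain why the failure rate is tiny and hardware-dependent: it only triggers when the RNG hits the exact boundary value.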

ChenMinQi commented 5 years ago

It seems this is a PyTorch bug.
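Until the upstream bug is fixed, one practical workaround is to re-draw the sample whenever NaN appears; with a failure rate around 0.004% per call, a retry loop terminates almost immediately. A minimal sketch (the wrapper name and retry limit are my own, not from this thread):

```python
import torch
import torch.nn as nn

def safe_gumbel_softmax(logits, tau=1.0, max_retries=10):
    """Call gumbel_softmax, re-sampling if the result contains NaN."""
    for _ in range(max_retries):
        y = nn.functional.gumbel_softmax(logits, tau)
        if not torch.isnan(y).any():
            return y
    raise RuntimeError("gumbel_softmax returned NaN {} times in a row".format(max_retries))

# Usage: same call shape as nn.functional.gumbel_softmax.
weight = safe_gumbel_softmax(torch.randn(4, 8), tau=5.0)
print(weight.shape)  # torch.Size([4, 8])
```

The retry keeps the sampling distribution essentially unchanged, since the discarded draws are the (vanishingly rare) degenerate ones.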