My MXNet implementation of FBNet has some weird problems, so I reimplemented it in PyTorch. This version supports multi-GPU training.