facebookresearch / pytorch_GAN_zoo

A mix of GAN implementations including progressive growing
BSD 3-Clause "New" or "Revised" License

Why is the GDPP loss (almost) always negative? It can't compute the BCE loss #82

Open Johnson-yue opened 5 years ago

Johnson-yue commented 5 years ago

Hi, I was trying your GDPP loss with the PyTorch version, but when I add the GDPP loss to G_loss, training crashes because G_loss (~1.0) + gdpp_loss is negative, so the BCE loss cannot be computed.

My question is: why is the GDPP loss almost always negative? Does that make sense?

Molugan commented 5 years ago

Hi,

Sorry for the delay I was on holidays.

If I'm correct, you're using the DCGAN mode? But the BCE isn't computed on G_loss + gdpp_loss; it is computed on the scores given by D.

To sum up, you have:
G_loss = BCE(D(G), True) + gdpp_loss (+ others depending on your config)
D_loss = BCE(D(G), False) + BCE(D(Image ground truth), True) (+ others depending on your config)

So having negative values shouldn't make your model crash. Can you show me your configuration file?
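For reference, a minimal sketch of the decomposition described above, not the repository's exact code (D, G, gdpp_loss, real_images and noise are assumed to exist):

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the loss decomposition above; D, G, gdpp_loss,
# real_images and noise are assumed to be defined elsewhere.
fake_images = G(noise)
d_fake = D(fake_images)   # score given by D, in (0, 1) after a sigmoid
d_real = D(real_images)

# BCE is always computed on D's scores, never on G_loss + gdpp_loss itself,
# so a negative gdpp_loss only shifts the total G_loss value.
G_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake)) + gdpp_loss
D_loss = F.binary_cross_entropy(d_fake.detach(), torch.zeros_like(d_fake)) \
       + F.binary_cross_entropy(d_real, torch.ones_like(d_real))
```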

Johnson-yue commented 5 years ago

Yes, I'm using DCGAN to check that it works, but it failed. I will check my configuration and reply to you.

Johnson-yue commented 5 years ago

@Molugan I just followed the README step by step.

python datasets.py cifar10 $PATH_TO_CIFAR10 -o $OUTPUT_DATASET
python train.py PGAN -c config_cifar10.json --restart -n cifar10 --GDPP true

and I also set _C.GDPP = True in models/trainer/standard_configurations/dcgan_config.py

But when I enable the --GDPP option, I get a runtime error: no computation graph; it says to use retain_graph=True on the first backward call.

I think the error occurs in models/base_GAN.py: when G_loss is backpropagated the computation graph is released, so when GDPP is enabled and the GDPP loss backpropagates through the tensor phiGFake, the graph has already been freed!
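For reference, a toy snippet (unrelated to the repository's code) that reproduces the error being described: once the first backward() frees the graph, a second backward through the same intermediate tensor fails unless retain_graph=True is passed.

```python
import torch

z = torch.randn(4, 8, requires_grad=True)
phi = torch.tanh(z)              # stands in for the intermediate phiGFake features
loss_main = phi.sum()            # plays the role of the adversarial G loss
loss_gdpp = (phi ** 2).mean()    # plays the role of the GDPP term

loss_main.backward()             # frees the graph's saved tensors by default
loss_gdpp.backward()             # RuntimeError: Trying to backward through the graph a second time
# Keeping the graph alive on the first call avoids the error:
#   loss_main.backward(retain_graph=True)
#   loss_gdpp.backward()
```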

Molugan commented 5 years ago

In https://github.com/facebookresearch/pytorch_GAN_zoo/blob/1a0c76496792d4aa0904eb0afc80ea42c5f73b08/models/base_GAN.py#L257

Can you add the option retain_graph=True?

Johnson-yue commented 5 years ago

No, the source code does not have

retain_graph=True

I only set the GDPP configuration option to True to test whether GDPP works.

Johnson-yue commented 5 years ago

@Molugan How should the GDPP loss be used? If I want to use it in DCGAN:
step 1: compute G_loss
step 2: G_loss.backward(retain_graph=True)
step 3: compute gdpp_loss
step 4: gdpp_loss.backward()

Is that right?

Johnson-yue commented 5 years ago

Why did you update G_loss and gdpp_loss separately?

In the paper and the original TensorFlow code, G_loss = G_loss + gdpp_loss and backward is called only once on G_loss.

Molugan commented 5 years ago

Doing the backward in two steps does not change the loss, though it can affect the execution time. In this case it allows a more modular architecture.
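A small toy check of that claim (not the repository's code): gradients accumulate across backward() calls, so backpropagating the two terms separately yields the same gradients as summing them first.

```python
import torch

def grads(separate):
    torch.manual_seed(0)
    w = torch.randn(5, requires_grad=True)
    phi = torch.tanh(w)
    loss_a = phi.sum()            # plays the role of the adversarial G loss
    loss_b = (phi ** 2).mean()    # plays the role of the GDPP loss
    if separate:
        loss_a.backward(retain_graph=True)
        loss_b.backward()         # gradients accumulate into w.grad
    else:
        (loss_a + loss_b).backward()
    return w.grad.clone()

print(torch.allclose(grads(separate=True), grads(separate=False)))  # True
```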

Please do not post the same message in different issues.

Johnson-yue commented 5 years ago

@Molugan Oh, sorry. But when I try to train with the GDPP loss there are many problems. Did you train with the GDPP loss successfully, and did it improve performance?

Molugan commented 5 years ago

Sorry for the delay, busy weeks with many deadlines.

I had several successful trainings with GDPP: it should improve the SWD score.

Johnson-yue commented 5 years ago

Can you show some logs from training with the GDPP loss? I cannot use it to train any model. Error at epoch 19:

RuntimeError: reduce failed to synchronize: cudaErrorAssert: device-side assert triggered

I only trained DCGAN with MNIST, but it failed.
/pytorch/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [0,0,0], thread: [81,0,0] Assertion `input >= 0. && input <= 1.` failed.
(the same assertion is repeated for threads [82,0,0] through [95,0,0])
Traceback (most recent call last):
  File "main.py", line 166, in <module>
    main()
  File "main.py", line 158, in main
    gan.train()
  File "/media/yue/Backup_Data/home_DeepLearning/Zi2Zi/pytorch-generative-model-collections/GAN_GDPP.py", line 198, in train
    D_fake_loss = self.BCE_loss(D_fake, self.y_fake_)
  File "/home/yue/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yue/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 498, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/home/yue/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/functional.py", line 2065, in binary_cross_entropy
    input, target, weight, reduction_enum)
RuntimeError: reduce failed to synchronize: cudaErrorAssert: device-side assert triggered
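The assertion `input >= 0. && input <= 1.` means binary_cross_entropy received discriminator outputs outside [0, 1], which is independent of GDPP; it usually indicates that D's output is not passed through a sigmoid (or has become NaN). A minimal sketch of the two usual remedies, with toy tensors standing in for the real ones:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the discriminator scores and the fake-label tensor.
D_fake_logits = torch.randn(16, 1)   # raw scores, not guaranteed to lie in [0, 1]
y_fake_ = torch.zeros(16, 1)

# Option 1: keep nn.BCELoss but squash the scores with a sigmoid first,
# so the `input >= 0. && input <= 1.` assertion can never fire.
loss = nn.BCELoss()(torch.sigmoid(D_fake_logits), y_fake_)

# Option 2: feed the raw logits to the numerically safer logits variant.
loss = nn.BCEWithLogitsLoss()(D_fake_logits, y_fake_)
```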

My code is:

Screenshot from 2019-10-28 15-04-19

Screenshot from 2019-10-28 15-04-36

Screenshot from 2019-10-28 15-04-58