Closed · yelusaleng closed this issue 3 years ago
Hi, thank you for pointing out this issue.
Let me check what's going on. I didn't observe that in practice, but maybe that's because I didn't wait long enough to see it.
In the meantime, maybe you could play with the learning rate. It could be one reason why the loss stays constant.
@RonyAbecidan, thanks for your response. I have tried training with various learning rates, but the issue above still occurs. Is it possible that this code works fine for forward propagation but has problems with backward propagation during training?
The authors of the paper succeeded in training their algorithm. Maybe I don't completely understand how, since the training procedure is not explicitly shared by them. I am going to see what I can do ;)
Many thanks. I will also try to fix the issue.
0.693 ≈ ln(2) is the binary cross-entropy of a classifier that always predicts 0.5, i.e. a purely random classifier, so this bug is really bad. We need to understand what is going on.
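A quick numerical check of that value (a minimal, illustrative snippet; the prediction and target shapes are arbitrary):

```python
import math
import torch
import torch.nn as nn

# A classifier that always outputs probability 0.5, regardless of the target,
# incurs a binary cross-entropy of exactly ln(2) ≈ 0.6931
preds = torch.full((1000,), 0.5)
targets = torch.randint(0, 2, (1000,)).float()

loss = nn.BCELoss()(preds, targets)
print(loss.item(), math.log(2))  # both ≈ 0.6931
```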
yep
Just to be sure: you observed that with your own dataset, and not with the random datasets in my notebook, right?
Yes, I'm sure.
Out of curiosity, I tested training the model with a constant forgery mask (a big white square in the left corner) and it works. So it seems there is nothing that forces the model to behave randomly on every dataset. Are your datasets confidential, or can you share one of them so I can see what's going on with them?
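For reference, here is a minimal sketch of such a sanity-check target; the image size, square size, and batch size are illustrative:

```python
import torch

# Synthetic sanity check: every image gets the same ground-truth mask,
# a white square in the corner of an otherwise empty image
H, W = 256, 256
mask = torch.zeros(1, 1, H, W)
mask[..., :64, :64] = 1.0                  # the constant "forged" region
batch_masks = mask.expand(8, -1, -1, -1)   # one identical mask per image
```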
It was my mistake; I solved the issue by changing the criterion from nn.BCEWithLogitsLoss() to nn.BCELoss().
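For context, the distinction matters because the model's final layer already ends with nn.Sigmoid() (see the snippet further down in this thread), while nn.BCEWithLogitsLoss applies its own sigmoid internally, so combining them squashes the outputs twice. A minimal sketch of the two correct pairings (tensor shapes are illustrative):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 1, 256, 256)                    # raw network outputs
target = torch.randint(0, 2, (4, 1, 256, 256)).float()  # binary forgery mask

# Pairing 1: the model ends with nn.Sigmoid(), so it outputs probabilities
# -> use nn.BCELoss on those probabilities
probs = torch.sigmoid(logits)
loss_probs = nn.BCELoss()(probs, target)

# Pairing 2: the model outputs raw logits (no final sigmoid)
# -> use nn.BCEWithLogitsLoss, which fuses the sigmoid into the loss
loss_logits = nn.BCEWithLogitsLoss()(logits, target)
```

Mixing them (sigmoid outputs fed to nn.BCEWithLogitsLoss) double-applies the sigmoid, which keeps the predictions near 0.5 and pins the loss near ln(2) ≈ 0.693.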
By the way, the pytorch_lightning script has some bugs for multi-GPU training. Plain pytorch works better for me than pytorch_lightning.
Thanks again!
OK, I used nn.BCELoss so that the sigmoid stays in the final layer and we can immediately see the forgery mask at the outputs. I am not sure about the bugs you mention for pytorch-lightning, but I am sure they can be fixed. Pytorch-lightning is a wrapper around Pytorch, so saying that one is better than the other is debatable. I like pytorch-lightning for its simplicity and code structure, but everyone has their own preferences ;)
I'll close the issue now. Thank you for your responsiveness =)
I understand that. I think pytorch_lightning is not optimized enough: when I use the same batch size with both pytorch_lightning and plain pytorch, pytorch_lightning shows an OOM error but pytorch does not. But none of that matters anymore, and I still appreciate your response.
In addition, I made one change to your code for multi-GPU training: I moved code lines 528-532 down to lines 544-548, into the forward pass:
```python
# (in __init__) final layer: the sigmoid is kept here so the output
# can be read directly as a forgery mask
self.end = nn.Sequential(
    nn.Conv2d(8, 1, 7, 1, padding=3),
    nn.Sigmoid(),
)

def forward(self, x):
    B, nb_channel, H, W = x.shape

    # Normalization to [-1, 1]
    x = x / 255. * 2 - 1

    ## Image Manipulation Trace Feature Extractor
    # Bayar masks built inside forward (moved from __init__) so they end up
    # on the same device as the current replica during multi-GPU training
    self.bayar_mask = torch.tensor(np.ones(shape=(5, 5))).to(device=self.device)
    self.bayar_mask[2, 2] = 0

    self.bayar_final = torch.tensor(np.zeros((5, 5))).to(device=self.device)
    self.bayar_final[2, 2] = -1

    ## Bayar constraints
    self.BayarConv2D.weight.data *= self.bayar_mask
```
I don't know if it is optimal because you are creating the masks at every forward pass.
The reason I changed the code is that it raises an error saying self.bayar_mask and self.BayarConv2D.weight.data are on different devices ('cuda:0' and 'cuda:1') when I try multi-GPU training.
Yes, I understand. Maybe you could add a condition like 'if self.bayar_mask does not exist yet, build it, otherwise reuse it' to avoid re-building it every time.
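A minimal sketch of that idea, reusing the attribute names from the snippet above (the device check is there because, with nn.DataParallel, plain Python attributes are shared across replicas):

```python
def forward(self, x):
    # Lazily build the Bayar masks once, on the input's device,
    # instead of re-allocating them at every forward pass
    if not hasattr(self, 'bayar_mask') or self.bayar_mask.device != x.device:
        self.bayar_mask = torch.ones(5, 5, device=x.device)
        self.bayar_mask[2, 2] = 0
        self.bayar_final = torch.zeros(5, 5, device=x.device)
        self.bayar_final[2, 2] = -1

    # Bayar constraints on the constrained convolution's weights
    self.BayarConv2D.weight.data *= self.bayar_mask
    ...
```

An alternative would be to register the masks with self.register_buffer(...) in __init__, since buffers are moved to the right device together with the module.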
OK, many thanks.
Hi, I still have some questions about the training. I tried to train the Local Anomaly Detection Network (LADN) while freezing the weights of the Image Manipulation Trace Feature Extractor. However, the LADN does not converge. Although I fail to see a problem in your code, may I ask whether your code can only be used for testing and not for training?
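For reference, a minimal, self-contained sketch of the freezing setup being described; ToyModel and its submodule names are illustrative stand-ins, not the repo's actual classes:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two submodules (names are illustrative)
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.feature_extractor = nn.Conv2d(3, 8, 5, padding=2)
        self.anomaly_detector = nn.Conv2d(8, 1, 7, padding=3)

    def forward(self, x):
        return torch.sigmoid(self.anomaly_detector(self.feature_extractor(x)))

model = ToyModel()

# Freeze the feature extractor so only the anomaly detector is trained
for p in model.feature_extractor.parameters():
    p.requires_grad = False

# Pass only the trainable parameters to the optimizer
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```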
Hello, I have not explicitly coded anything that prevents the model from being trained. It can be used for training, and you can see it with very simple cases (for instance, if you take a constant mask for every image, the training works). However, I admit that knowing how to properly train it on a new dataset is difficult, and the authors of the paper didn't really share this information. I will be happy to find out how to train it correctly with the help of the community ;)
OK, let's try it together.
Hi, thanks for your work, it's great for researchers.
I have some questions about the repo. The test script runs fine. However, the training script has some errors: no matter what dataset I use, the loss value stays the same after it drops to 0.693. So I'm asking: have you run this training script successfully?