Hello @nguyenvulong,
Thank you for your interest in our work!
The actual class mapping when the data loader loads the images is {FAKE: 0, REAL: 1}, because fake is the folder it loads first. However, for our prediction output we use FAKE: 1 and REAL: 0, since in many deepfake challenges the fake class is 1.
Data loading and training: {FAKE: 0, REAL: 1}
Prediction: {FAKE: 1, REAL: 0}
In our current case (tensor([[0.0468, 0.9539]], device='cuda:0')), max_prediction_value returns 1 as the index and abs(1 - 0.9539) = 0.0461 as the value; the class is REAL.
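For concreteness, here is a minimal sketch of that arithmetic (my own illustration; the full max_prediction_value function appears later in this thread):

import torch

y_pred = torch.tensor([[0.0468, 0.9539]])
mean_val = torch.mean(y_pred, dim=0)        # tensor([0.0468, 0.9539])
index = torch.argmax(mean_val).item()       # 1 -> REAL in the data-loader mapping
value = (mean_val[0].item() if mean_val[0] > mean_val[1]
         else abs(1 - mean_val[1]).item())  # abs(1 - 0.9539) = 0.0461
print(index, value)                         # 1 0.0461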
Now, to stay consistent with how the data loader classifies the images, I used {0: "REAL", 1: "FAKE"} and simply flip the predicted index using XOR: if the prediction is 0 it becomes 1, and if it is 1 it becomes 0. We could write an if/else statement, but I found XOR a nicer way to do it.
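As an illustration, a minimal sketch of the flip (my own example, not the repository's exact code):

# Data loader / training convention: {FAKE: 0, REAL: 1}
# Reporting convention:              {FAKE: 1, REAL: 0}
label_names = {0: "REAL", 1: "FAKE"}   # names keyed by the reporting convention

pred = 1                     # argmax index from the model, i.e. REAL under the loader convention
flipped = pred ^ 1           # XOR with 1 flips 0 <-> 1
print(label_names[flipped])  # "REAL"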
I hope this answers your question.
Thank you, that's clear. However, I need to discuss max_prediction_value in the case of custom videos (not from the datasets, but collected in the wild).
python prediction.py \
--p ./test_videos \
--f 1 \
--d yours \
--n ed
def max_prediction_value(y_pred):
    # Finds the index and value of the maximum prediction value.
    mean_val = torch.mean(y_pred, dim=0)
    print(f"mean_val: {mean_val}")
    print(f"y_pred: {y_pred}")
    print(f"y_pred dim: {y_pred.dim()}")
    return (
        torch.argmax(mean_val).item(),
        mean_val[0].item()
        if mean_val[0] > mean_val[1]
        else abs(1 - mean_val[1]).item(),
    )
In this case, the input to max_prediction_value, i.e. y_pred (or torch.sigmoid(model(df).squeeze()) from pred_vid), has only one dimension (please refer to the values I printed). Therefore, indexing mean_val[0] and mean_val[1] causes the following error.
mean_val: 0.49663394689559937
y_pred: tensor([0.3401, 0.6532], device='cuda:0')
y_pred dim: 1
An error occurred: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number
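A minimal standalone reproduction (my own snippet, using the printed values above) shows where the error comes from:

import torch

y_pred = torch.tensor([0.3401, 0.6532])  # 1-D tensor, dim() == 1
mean_val = torch.mean(y_pred, dim=0)     # a 0-dim scalar tensor (~0.4966)
print(mean_val.dim())                    # 0
mean_val[0]                              # IndexError: invalid index of a 0-dim tensor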
My guess is that in your experiments y_pred somehow ends up as a batch with two dimensions after torch.sigmoid(model(df).squeeze()) in pred_vid. Maybe that is the case with the datasets [dfdc, faceforensics, timit, celeb] but not with a custom dataset?
Update: I know why it happened: the number of frames = 1. With a single frame, squeeze() leaves y_pred as a 1-D tensor with two values, so torch.mean(y_pred, dim=0) collapses it into a 0-dim scalar that cannot be indexed.
This is updated code that handles both cases, number of frames = 1 and > 1.
I will create a PR if you don't mind.
def max_prediction_value(y_pred):
    if y_pred.dim() == 1 and y_pred.size(0) == 2:
        # When y_pred is a 1-D tensor with two elements, no need to take the mean
        pred_label = torch.argmax(y_pred).item()
        pred_val = y_pred[pred_label].item()
        return (pred_label, pred_val)
    else:
        # Compute the mean value across the batch dimension (dim=0)
        mean_val = torch.mean(y_pred, dim=0)
        # Still, check that mean_val is not a 0-dimensional tensor, just to be safe
        if mean_val.dim() == 0:
            mean_val_val = mean_val.item()
            return (0, mean_val_val) if mean_val_val > 0.5 else (1, 1 - mean_val_val)
        # Assume mean_val is a 1-D tensor with more than one element
        pred_label = torch.argmax(mean_val).item()
        pred_val = mean_val[pred_label].item()
        return (pred_label, pred_val)
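For reference, a quick sanity check of both shapes with the tensors from this thread (assuming torch is imported and the function above is in scope; outputs worked out by hand):

print(max_prediction_value(torch.tensor([0.3401, 0.6532])))    # (1, 0.6532)  single frame, 1-D
print(max_prediction_value(torch.tensor([[0.0468, 0.9539]])))  # (1, 0.9539)  2-D case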
Hello @nguyenvulong,
Thank you for finding the issue with the number of frames being 1, and for your PR. I've checked the code and it indeed fails when the number of frames is 1.
I've seen your code, and thank you for your effort. However, I have a simpler workaround: check whether mean_val.numel() is 1. If mean_val.numel() is 1, use y_pred directly, since it is a single prediction rather than a batch; if mean_val.numel() is greater than 1, continue as before.
The rest of the code logic then remains the same.
def max_prediction_value(y_pred):
    # Finds the index and value of the maximum prediction value.
    mean_val = torch.mean(y_pred, dim=0)
    # If mean_val has only one element, y_pred is a single prediction, not a batch
    if mean_val.numel() == 1:
        mean_val = y_pred
    return (
        torch.argmax(mean_val).item(),
        mean_val[0].item()
        if mean_val[0] > mean_val[1]
        else abs(1 - mean_val[1]).item(),
    )
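As a quick check of the workaround with the same tensors (my own sketch, assuming torch is imported; values worked out by hand):

# Single frame: mean_val.numel() == 1, so y_pred itself is used
print(max_prediction_value(torch.tensor([0.3401, 0.6532])))    # (1, 0.3468) = (1, abs(1 - 0.6532))

# Batched frames: mean_val keeps two elements, logic unchanged
print(max_prediction_value(torch.tensor([[0.0468, 0.9539]])))  # (1, 0.0461)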
I've tested the updated code with 1 frame using the sample videos. The image shows the prediction using 1 frame.
Thank you again!
Thanks so much. I will close the PR and this question for now!
Hello, thank you for the great end-to-end approach.
The first value of torch.sigmoid(model(df).squeeze()) is the fake probability, right? For example, if after the sigmoid function we have tensor([[0.0468, 0.9539]], device='cuda:0'), then 0.0468 is the chance of the sample being fake, correct?
Also, why XOR? Since the prediction is 0, the label then becomes 1, which is FAKE. Could you please explain?