Leminhbinh0209 / CVPR24-FAS

Official implementation of CVPR24 paper "Gradient Alignment for Cross-Domain Face Anti-Spoofing"

how to infer #2

Closed blacksino closed 7 months ago

blacksino commented 7 months ago

I have tried running this model on my custom dataset and the results do not seem good. Did I miss some preprocessing step?

Leminhbinh0209 commented 7 months ago

Hi,

I suppose you're running inference with the model. Make sure that you load the pretrained weights with strict=True, and apply the following transformation to the cropped faces (see SAFAS for face cropping):

from PIL import Image
from torchvision import transforms

# ImageNet normalization applied to the 256x256 cropped face
test_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
img = Image.open(face_path).convert('RGB')
input = test_transform(img)
prob = model(input.unsqueeze(0))  # add the batch dimension

The probability of the live class is nn.Sigmoid()(prob), since the model is trained with a BCE loss.
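
For reference, a minimal cropping sketch (this uses MTCNN from facenet-pytorch purely as an illustration; the SAFAS repo has its own detector and margin settings, and frame_path is a placeholder):

from PIL import Image
from facenet_pytorch import MTCNN  # illustrative detector, not the SAFAS pipeline

detector = MTCNN(select_largest=True)
img = Image.open(frame_path).convert('RGB')   # frame_path: full frame to crop
boxes, _ = detector.detect(img)               # boxes: [x1, y1, x2, y2] per face, or None
if boxes is not None:
    x1, y1, x2, y2 = [int(v) for v in boxes[0]]
    face = img.crop((x1, y1, x2, y2))         # feed this crop to test_transform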

blacksino commented 7 months ago

Yes, I did exactly what you mentioned above; however, the result is relatively strange:

import torch
from PIL import Image
from torchvision import transforms

model = get_model(config)
model = model.cuda()
ckpt = torch.load("/mnt/a/joe/code/CVPR24-FAS/CVPR24-GACFAS/weights/resnet18_pICM2O_best.pth")
model.load_state_dict(ckpt["state_dict"])  # strict=True by default
model.eval()

test_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

img = Image.open("/mnt/a/joe/code/STAR/images/video/img_1.png").convert('RGB')
with torch.no_grad():
    input = test_transform(img)
    prob = model(input.unsqueeze(0).cuda())
print(torch.sigmoid(prob))

The live probability for the following 2D screen attack is 0.4750.

However, for a real face, the liveness probability is 0.4721.

Leminhbinh0209 commented 7 months ago

Thank you for your feedback. I think you missed one step: face cropping; you should crop the face instead of feeding in the full frame. Moreover, it is hard to validate performance with just a few samples, and neither metric, AUC nor HTER, uses 0.5 as its decision threshold. So, to validate DG-FAS models, including ours, my suggestion is to build collections of live and spoof images, run predictions on a decent number of samples, and plot the probability distributions of both classes, as in the sketch below.
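
A minimal sketch of that kind of check, assuming cropped faces on disk and the model and test_transform from the snippets above (the folder names and image extension are placeholders):

import glob
import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.metrics import roc_auc_score, roc_curve

def score_folder(model, transform, folder):
    """Return sigmoid(live logit) for every cropped face image in a folder."""
    scores = []
    with torch.no_grad():
        for path in sorted(glob.glob(folder + "/*.png")):
            img = Image.open(path).convert('RGB')
            x = transform(img).unsqueeze(0).cuda()
            scores.append(torch.sigmoid(model(x)).item())
    return np.array(scores)

live_scores = score_folder(model, test_transform, "live_faces")    # placeholder folders
spoof_scores = score_folder(model, test_transform, "spoof_faces")

labels = np.concatenate([np.ones_like(live_scores), np.zeros_like(spoof_scores)])
scores = np.concatenate([live_scores, spoof_scores])
auc = roc_auc_score(labels, scores)

# HTER = (FAR + FRR) / 2; the threshold here is taken at the EER point of this same
# set just for illustration -- the papers fix it on a separate development set.
fpr, tpr, _ = roc_curve(labels, scores)
eer_idx = np.argmin(np.abs(fpr - (1 - tpr)))
hter = (fpr[eer_idx] + (1 - tpr[eer_idx])) / 2

plt.hist(live_scores, bins=50, alpha=0.5, label="live")
plt.hist(spoof_scores, bins=50, alpha=0.5, label="spoof")
plt.xlabel("sigmoid(live logit)")
plt.legend()
plt.savefig("score_distributions.png")
print(f"AUC = {auc:.4f}, HTER = {hter:.4f}")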

blacksino commented 7 months ago

Thank you for your patience; I'll give it a try. By the way, I'm a newcomer to liveness detection. In my opinion, face cropping might discard information that is useful for face anti-spoofing (FAS), because elements visible around the face can help determine whether an image is spoofed; for instance, screen bezels can sometimes provide clues.

Leminhbinh0209 commented 7 months ago

You are right. That could be a good approach if you can exploit it, but it can make a model rely heavily on those background artifacts during training and overfit when evaluated on other datasets. In my experience, most recent models choose to crop the face; I'm not saying your approach is impossible, it just needs more exploration.

blacksino commented 7 months ago

Thanks!