Results Can not be achieved!

SunTMan commented 7 months ago

I have the same problem. The provided code cannot reproduce the results reported in the paper. For example, the paper states that SimpleNet is 8 times faster than PatchCore, but the provided code does not come close to that. I wonder how this paper got accepted!

DonaldRR commented 6 months ago

Can you show your results? The model runs almost as fast as its backbone; the computation in the other components is negligible. It is faster than PatchCore because PatchCore relies on additional components to fetch embeddings (on the CPU) for every patch, and that is costly.
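To make the cost argument concrete, here is a minimal sketch (not code from either repository) contrasting the two scoring styles. It assumes PatchCore scores a patch by its distance to the nearest entry of a memory bank of nominal patch embeddings, and SimpleNet scores it with a small per-patch MLP discriminator; the function names, shapes, and the ReLU activation are illustrative assumptions. The nearest-neighbour search costs O(P*M*D) for P test patches, M memory-bank entries, and D-dimensional embeddings, so it grows with the bank, while the MLP costs O(P*D*H) for hidden width H regardless of how many training patches were seen.

```python
import numpy as np


def patchcore_style_scores(patches: np.ndarray, memory_bank: np.ndarray) -> np.ndarray:
    """Score each test patch by its Euclidean distance to the nearest memory-bank entry.

    patches:     (P, D) patch embeddings of one test image
    memory_bank: (M, D) embeddings collected from nominal training images
    returns:     (P,) one anomaly score per patch
    """
    # Squared distances via ||a||^2 - 2 a.b + ||b||^2, shape (P, M): O(P * M * D) work.
    sq = (
        (patches ** 2).sum(axis=1, keepdims=True)
        - 2.0 * patches @ memory_bank.T
        + (memory_bank ** 2).sum(axis=1)
    )
    # Clamp tiny negatives caused by floating-point round-off before taking the sqrt.
    return np.sqrt(np.maximum(sq.min(axis=1), 0.0))


def mlp_style_scores(patches, w1, b1, w2, b2) -> np.ndarray:
    """Score each patch with a hypothetical two-layer MLP: O(P * D * H) work, independent of M."""
    hidden = np.maximum(patches @ w1 + b1, 0.0)  # ReLU as a stand-in activation
    return (hidden @ w2 + b2).ravel()            # one scalar per patch
```

The point of the comparison is only the scaling: the first function touches every memory-bank entry for every patch, the second touches only its own fixed-size weights.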

susuky commented 5 months ago

For the speed test result: I also used an RTX 3080 Ti to test the speed, and I am getting 6.62 FPS instead of 77 FPS.

Here is my code:

import torch

import warnings
warnings.filterwarnings('ignore')  # suppress UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.

import backbones
from simplenet import SimpleNet

# initialize parameters
backbone_name = 'wideresnet50'
layers_to_extract_from = ['layer2', 'layer3']
device = torch.device('cuda')
input_shape = (3, 288, 288)
pretrain_embed_dimension = 1536
target_embed_dimension = 1536
patchsize = 3

# create model
backbone = backbones.load(backbone_name)
model = SimpleNet(device)
model.load(backbone, layers_to_extract_from, device, input_shape, pretrain_embed_dimension, target_embed_dimension, patchsize)
# SimpleNet defines its own train(), so call torch.nn.Module.train(False) (eval mode) via super()
super(SimpleNet, model).train(False);

# create dummy input
x = torch.randn(1, *input_shape, device=device)

# warm up the model
for _ in range(10):
    y = model.predict(x)

%%timeit 
with torch.no_grad():
    scores, masks, _ = model.predict(x)
    torch.cuda.synchronize()
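A caveat when timing GPU code is that CUDA kernels launch asynchronously, so the clock should only be read after the device has finished its queued work. A minimal, framework-agnostic sketch of such a timing loop follows; the `measure_fps` helper and its parameters are hypothetical and not part of SimpleNet. It warms up, synchronizes, and reports images per second.

```python
import time
from statistics import mean
from typing import Callable, Optional


def measure_fps(
    fn: Callable[[], object],
    warmup: int = 10,
    iters: int = 50,
    images_per_call: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Time fn() and return images per second (hypothetical helper)."""
    # Warm-up calls absorb one-off costs such as CUDA context setup and memory allocation.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure no warm-up work is still queued when timing starts

    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous GPU work before reading the clock
        timings.append(time.perf_counter() - start)

    # Average seconds per call, converted to images per second.
    return images_per_call / mean(timings)
```

With the objects from the script above, this would be called as `measure_fps(lambda: model.predict(x), sync=torch.cuda.synchronize)`, ideally inside `torch.no_grad()`. Setting `images_per_call` to the batch size turns it into a throughput measurement for batched input.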

briliantnugraha commented 4 months ago

Hi @susuky @SunTMan

I used the code above and got even slower results (200-350 ms per image, i.e. roughly 3 to 5 FPS). After profiling the _predict code, I found that the purple-marked line (the convert step in the timings below) takes most of the time. Hence, I suspect that the authors (@DonaldRR) did not include it when benchmarking; correct me if I'm wrong, @DonaldRR.

Here is how I measured the speed, plus a snapshot of the code (mostly the original code with timers added). I hope it helps.

PREDICT detail: pre:0.000s; embed:0.050s; proj:0.001s; disc:0.051s; unpatch:0.000s; convert:0.386s
0 time: 0.437s
PREDICT detail: pre:0.000s; embed:0.016s; proj:0.000s; disc:0.017s; unpatch:0.000s; convert:0.351s
1 time: 0.381s
PREDICT detail: pre:0.001s; embed:0.015s; proj:0.000s; disc:0.017s; unpatch:0.000s; convert:0.271s
2 time: 0.302s
PREDICT detail: pre:0.000s; embed:0.013s; proj:0.000s; disc:0.013s; unpatch:0.001s; convert:0.262s
3 time: 0.289s
PREDICT detail: pre:0.000s; embed:0.013s; proj:0.001s; disc:0.017s; unpatch:0.000s; convert:0.260s
4 time: 0.290s
PREDICT detail: pre:0.000s; embed:0.013s; proj:0.001s; disc:0.014s; unpatch:0.001s; convert:0.255s
5 time: 0.286s
PREDICT detail: pre:0.000s; embed:0.019s; proj:0.000s; disc:0.020s; unpatch:0.000s; convert:0.264s
6 time: 0.296s
PREDICT detail: pre:0.001s; embed:0.013s; proj:0.000s; disc:0.015s; unpatch:0.000s; convert:0.333s
7 time: 0.360s
PREDICT detail: pre:0.000s; embed:0.012s; proj:0.001s; disc:0.013s; unpatch:0.000s; convert:0.248s
8 time: 0.274s
PREDICT detail: pre:0.000s; embed:0.014s; proj:0.000s; disc:0.014s; unpatch:0.001s; convert:0.197s
9 time: 0.224s
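Aggregating the ten iterations above makes the point quantitative. The sketch below parses the pasted log (the regular expressions assume exactly the format shown) and compares the convert step against each iteration's total time. It deliberately does not sum the individual stage fields, since those do not obviously add up to the reported totals.

```python
import re

LOG = """\
PREDICT detail: pre:0.000s; embed:0.050s; proj:0.001s; disc:0.051s; unpatch:0.000s; convert:0.386s
0 time: 0.437s
PREDICT detail: pre:0.000s; embed:0.016s; proj:0.000s; disc:0.017s; unpatch:0.000s; convert:0.351s
1 time: 0.381s
PREDICT detail: pre:0.001s; embed:0.015s; proj:0.000s; disc:0.017s; unpatch:0.000s; convert:0.271s
2 time: 0.302s
PREDICT detail: pre:0.000s; embed:0.013s; proj:0.000s; disc:0.013s; unpatch:0.001s; convert:0.262s
3 time: 0.289s
PREDICT detail: pre:0.000s; embed:0.013s; proj:0.001s; disc:0.017s; unpatch:0.000s; convert:0.260s
4 time: 0.290s
PREDICT detail: pre:0.000s; embed:0.013s; proj:0.001s; disc:0.014s; unpatch:0.001s; convert:0.255s
5 time: 0.286s
PREDICT detail: pre:0.000s; embed:0.019s; proj:0.000s; disc:0.020s; unpatch:0.000s; convert:0.264s
6 time: 0.296s
PREDICT detail: pre:0.001s; embed:0.013s; proj:0.000s; disc:0.015s; unpatch:0.000s; convert:0.333s
7 time: 0.360s
PREDICT detail: pre:0.000s; embed:0.012s; proj:0.001s; disc:0.013s; unpatch:0.000s; convert:0.248s
8 time: 0.274s
PREDICT detail: pre:0.000s; embed:0.014s; proj:0.000s; disc:0.014s; unpatch:0.001s; convert:0.197s
9 time: 0.224s
"""

# Seconds spent in the convert step, and total seconds per iteration.
convert = [float(v) for v in re.findall(r"convert:([\d.]+)s", LOG)]
total = [float(v) for v in re.findall(r"^\d+ time: ([\d.]+)s$", LOG, flags=re.M)]

share = sum(convert) / sum(total)
mean_total = sum(total) / len(total)

print(f"convert share of wall time: {share:.0%}")
print(f"mean latency: {mean_total * 1000:.0f} ms (~{1 / mean_total:.1f} FPS)")
```

On these ten iterations, convert accounts for roughly 90% of the wall time, at a mean of about 314 ms per image (about 3.2 FPS).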

EDIT: I also checked the convert_to_segmentation part. Retrieving the mask list (the second output) takes only about 1 ms if I put "return output" right after this line of code. So perhaps the authors skipped the feature interpolation part.
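A back-of-the-envelope check of why the feature branch could dominate, under assumptions not confirmed from the repository: suppose convert_to_segmentation upsamples a per-image feature map with 1536 channels (target_embed_dimension in the script above) to the 288 x 288 input resolution in float32 and then copies it to the CPU. The sketch below only does the size arithmetic; the `tensor_bytes` helper is hypothetical.

```python
def tensor_bytes(channels: int, height: int, width: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense C x H x W tensor for one image (float32 by default)."""
    return channels * height * width * bytes_per_value


# Assumed: 1536 channels upsampled to the 288 x 288 input resolution, stored as float32.
feature_bytes = tensor_bytes(1536, 288, 288)
# The image-level anomaly score map has a single channel at the same resolution.
score_map_bytes = tensor_bytes(1, 288, 288)

print(f"upsampled features: {feature_bytes / 2**20:.0f} MiB per image")
print(f"anomaly score map:  {score_map_bytes / 2**10:.0f} KiB per image")
```

Under those assumptions the features come to 486 MiB per image versus 324 KiB for the score map, a factor of 1536, which would be consistent with the mask list alone returning in about 1 ms.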

Meteor-Star commented 4 months ago

Perhaps this is how the FPS was calculated. Looking at the figure below (running on an RTX 4060), I recorded the time between these two points. For a batch of 8 test images, the image-level anomaly scores are obtained within 0.11 seconds (approximately 72 FPS), excluding the segmentation maps. Is this way of measuring really rigorous?
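The FPS figures in this thread come from different measurement set-ups, so the arithmetic is worth spelling out. A minimal sketch: batched throughput divides the image count by the batch's wall time, amortizing per-call overhead across 8 images, whereas the 6.62 FPS figure earlier in the thread was measured one image at a time with the full predict() call, segmentation included. The `fps` helper is illustrative only.

```python
def fps(images: int, seconds: float) -> float:
    """Images processed per second of wall time."""
    return images / seconds


# RTX 4060, batch of 8, image-level scores only (no segmentation maps).
batched_fps = fps(8, 0.11)
# RTX 3080 Ti, batch of 1, full predict() including segmentation, as reported earlier.
single_fps = 6.62

print(f"batched: {batched_fps:.1f} FPS = {1000 / batched_fps:.1f} ms per image")
print(f"single:  {single_fps:.2f} FPS = {1000 / single_fps:.1f} ms per image")
```

The two numbers therefore differ in batch size, GPU, and whether segmentation is counted, so neither is directly comparable to the other.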

DonaldRR / SimpleNet, issue #58