NilsB98 / Diffusion-Based-AD

Diffusion Based Anomaly Detection.

Locally Aware Feature Comparison #16

Closed NilsB98 closed 11 months ago

NilsB98 commented 1 year ago

To improve the anomaly map, use not only a pixel-wise difference between the reconstruction and the original image, but also include a feature-based comparison with features extracted by a second network. See this paper.
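
For illustration, a minimal sketch of such a feature-based comparison, assuming a pretrained torchvision ResNet-18 as the second network (the paper's actual extractor and layer choice may differ):

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

# Truncate the pretrained network after layer2 to get a mid-level feature map.
backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:6]).eval()

@torch.no_grad()
def feature_anomaly_map(original: torch.Tensor, reconstruction: torch.Tensor) -> torch.Tensor:
    """original, reconstruction: (B, 3, H, W); returns a (B, 1, H, W) anomaly map."""
    f_orig = extractor(original)        # (B, C, h, w)
    f_rec = extractor(reconstruction)   # (B, C, h, w)
    # L2 distance over the channel dimension at each spatial position.
    dist = torch.linalg.vector_norm(f_orig - f_rec, dim=1, keepdim=True)
    # Upsample to the input resolution so it can be fused with the pixel-wise diff.
    return F.interpolate(dist, size=original.shape[-2:], mode="bilinear", align_corners=False)
```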

Henry0528 commented 11 months ago

Hello, are you still working on this project?

NilsB98 commented 11 months ago

Hi Henry, absolutely! I just did some work locally recently. If there is some specific functionality that you'd wish for, feel free to make proposals; I'm always glad to get some interaction and feedback :)

Henry0528 commented 11 months ago

Thank you very much for your reply! I wonder whether you have any final results on the model's performance for each class, such as image-level and pixel-level AUROC?

NilsB98 commented 11 months ago

I'll have to train the model for the different objects; currently I only have recent metrics for the hazelnut data. There I'm able to achieve a pixel-level F1 of 0.711 and an image-level F1 of 0.942.

I'll try to run the experiments on the other MVTec objects over the weekend as well and can give you those scores too. I'm also refactoring my local code so that it's useful and runnable by others with this repo.

Henry0528 commented 11 months ago

Hello, I have been working on your code and trying to make some modifications, but I have run into some problems. I changed the image input to the features extracted by a pre-trained EfficientNet-b4 (272×32×32 in size) and used all normal images, ignoring the classes, to train a unified model that can detect all classes in the MVTec dataset. However, I got extremely low performance when testing with AUROC on both the image level and the pixel level. I have uploaded the code and I would be grateful if you could give any suggestions.
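
For context, a minimal sketch of how the feature extraction step might look, assuming timm's efficientnet_b4 with features_only=True (the exact stage, input size, and preprocessing used here are assumptions, not necessarily what was used above):

```python
import timm
import torch

# Pretrained EfficientNet-b4 returning intermediate feature maps.
extractor = timm.create_model("efficientnet_b4", pretrained=True, features_only=True).eval()

with torch.no_grad():
    x = torch.randn(1, 3, 256, 256)  # dummy stand-in for a normalized MVTec image
    feats = extractor(x)             # list of feature maps from several stages
    for f in feats:
        print(f.shape)               # inspect shapes to pick the stage(s) fed to the model
```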

NilsB98 commented 11 months ago

Hi, sure, I can check it out. Is there a specific file that you already suspect of not working properly? Feel free to link it. Other than that, I'm currently working on the feature extraction task as well and will push the code soon.

Henry0528 commented 11 months ago

Here is the code link. I have made some changes in main.py and used test.py to evaluate the model performance, but I'm getting a really bad result. Here is the paper I'm reproducing; I want to reproduce the baseline model's performance in Tab. 4 (LafitE w/o F.E.).

NilsB98 commented 11 months ago

Okay, thanks, the approach mentioned in the paper does sound interesting indeed. I'll probably check on your code tomorrow evening and see whether I can find out where things might go wrong.

Henry0528 commented 11 months ago

Hello, have you got any ideas? I have tried many different settings but have struggled to get a good baseline model.

NilsB98 commented 11 months ago

Hi Henry, again sorry for the late response. Somehow GitHub doesn't show the differences between the files you uploaded and the original code very well; it just marks the whole files as different. Did you already try to implement parts of the mentioned paper, or is it only the metrics that you added?

If you only changed the metrics, then I can provide you with some code which should work pretty much out of the box and give good results. Otherwise, can you point me to the implementations that you made besides the metrics, if there are any?

Henry0528 commented 11 months ago

I've made two modifications:

1. Following the mentioned paper, I used the features extracted by an EfficientNet as input in main.py (line 156 and lines 213-217).
2. In test.py, I added image- and pixel-level AUROC metrics at line 161.

BTW, could you add the image and pixel AUROC metrics to the original code?
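
For reference, a minimal sketch of how image- and pixel-level AUROC could be computed with sklearn (assuming numpy anomaly maps and binary ground-truth masks; the names are illustrative, not taken from the repo):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def image_and_pixel_auroc(maps: np.ndarray, gt_masks: np.ndarray):
    """maps: (N, H, W) anomaly scores; gt_masks: (N, H, W) binary masks."""
    # Pixel level: treat every pixel as one sample.
    pixel_auroc = roc_auc_score(gt_masks.ravel().astype(int), maps.ravel())
    # Image level: score each image by its maximum pixel score; an image is
    # anomalous if its mask contains any positive pixel.
    image_scores = maps.reshape(len(maps), -1).max(axis=1)
    image_labels = (gt_masks.reshape(len(gt_masks), -1).max(axis=1) > 0).astype(int)
    image_auroc = roc_auc_score(image_labels, image_scores)
    return image_auroc, pixel_auroc
```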

NilsB98 commented 11 months ago

Thanks for the clarification. There are a few ideas I got from looking at the code. After reading the paper, I'm not quite sure about the "Anomaly Estimation" section. In my implementation, which worked with the images directly, I created the diffmap by subtracting the original and the reconstructed image from each other, so that a diff map exists for each color channel; to get to a one-dimensional map I then took the maximum over those channels at each position. I'm not quite sure whether this is what they are doing in the paper, since I think it would make sense to calculate the L2 distance between the features, such that we end up with a tensor of dimensions (W, H, 1) instead of (W, H, num_channels). (The order of the dimensions might be wrong.)
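
To make the two reductions concrete, a minimal sketch on dummy (C, H, W) tensors (shapes and names are illustrative):

```python
import torch

# Dummy stand-ins for the original and reconstructed images/features.
original = torch.rand(3, 256, 256)
reconstruction = torch.rand(3, 256, 256)

diff = original - reconstruction                # (C, H, W): one diff map per channel

# Per-position maximum over the channels (my current approach).
map_max = diff.abs().amax(dim=0)                # (H, W)

# L2 distance over the channels (what the paper might intend).
map_l2 = torch.linalg.vector_norm(diff, dim=0)  # (H, W)
```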

Here is the current computation. Also, the normalization I added there doesn't really make sense, so that part should be removed anyway.

Other than that, the code looks quite okay at first sight, so I'll have to do a little debugging if this still doesn't help :)

And sure I'll add the code for the AUROC!

Henry0528 commented 11 months ago

In my opinion, there are lots of ways to get the final one-dimensional anomaly map. The way you used is OK, but I think the more common operation is to first calculate a diffmap of shape (H, W, C) between the original images or features and the reconstructed ones, and then get the final anomaly map by simply summing or averaging over all channels, e.g. with torch.sum or torch.mean, to get the (H, W, 1) map. I don't think the different ways of calculating the anomaly map are the key factor affecting the final results.
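
For instance (a minimal sketch; the tensor is a dummy diffmap in torch's channels-first layout):

```python
import torch

diff = torch.rand(3, 256, 256)             # dummy (C, H, W) difference map

anomaly_map_mean = diff.abs().mean(dim=0)  # (H, W): average over channels
anomaly_map_sum = diff.abs().sum(dim=0)    # (H, W): sum over channels
```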

NilsB98 commented 11 months ago

Yea, I also think that this shouldn't be the decisive factor. I'll clone your repo and do some testing and debugging to see whether I can find out where things go wrong or where we might have missed something else.

NilsB98 commented 11 months ago

Hi Henry, I opened a new ticket for the LafitE topic here. Since I've now added the code for this ticket, I'll close it.