gathierry / FastFlow

Apache License 2.0

FastFlow performance #14

Open emathian opened 1 year ago

emathian commented 1 year ago

Hello, I am really interested in anomaly detection models, so I am following your work along with https://github.com/AlessioGalluccio/FastFlow and https://github.com/mjack3/FastFlow-1, since FastFlow appears to be one of the most promising frameworks. After training your implementation on my medical images without much success, I tried to reproduce your results. For example, on the carpet category I am still very far from your numbers, with a ROC-AUC of "only" 81%. This score is reached after a few epochs, while the loss is already extremely low. Do you observe similar behaviour? Did you change any hyper-parameters?

Thank you very much in advance :-)

Emilie
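For context, the image-level ROC-AUC usually reported on MVTec is computed from one scalar anomaly score per image, typically the maximum of the per-pixel anomaly map. A minimal sketch with scikit-learn on toy data (the shapes and score definition here are assumptions, not the repo's exact evaluation code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1])  # 0 = good image, 1 = defective image (toy data)

# Toy per-pixel anomaly maps; defective images get uniformly higher scores
anomaly_maps = rng.random((4, 64, 64))
anomaly_maps[labels == 1] += 1.0

# Image-level score: max over each per-pixel anomaly map
scores = anomaly_maps.reshape(len(anomaly_maps), -1).max(axis=1)
auroc = roc_auc_score(labels, scores)
print(f"image-level AUROC: {auroc:.2f}")  # perfect separation on this toy data -> 1.00
```

A plateauing AUROC despite a decreasing loss, as described above, means the ranking of these per-image scores stops improving even as the normalizing-flow likelihood keeps rising on the training set.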

cytotoxicity8 commented 1 year ago

Well, I have also observed that the performance fluctuates in most cases. However, I never saw an AUROC as low as 81% on the carpet category. Can you share some performance graphs?

emathian commented 1 year ago

Hello @cytotoxicity8, thank you for your very quick answer. Below you can find the evolution of the loss and the AUROC for the carpet, wood, and hazelnut categories. As you can see, while the loss curves seem to converge, there is no improvement in the AUROC curves.

[Figures: loss and AUROC curves for carpet, wood, and hazelnut]

Surprisingly enough, I didn't edit `eval_once` (see https://github.com/emathian/LNEN_FastFow_gathierry). My main change concerns how I load the feature extractor, but I don't think the behaviour above depends on it.

Do you have any guesses?

gathierry commented 1 year ago

Hi @emathian, are you using another DeiT pre-trained weight, and a modified DeiT module? I see some modifications here: https://github.com/emathian/LNEN_FastFow_gathierry/blob/main/fastflow.py. I'm not sure what the root cause is right now, but maybe you can run the code without modification first, then apply your changes incrementally. The results listed in the README all come from experiments with the code in this repo. However, I only ran each experiment once, so there could be some variance, but it shouldn't be as large as what you describe.

emathian commented 1 year ago

Dear @gathierry, thanks for your comment :) . I can't use timm on my server because the proxy forbids downloads; I will check this problem with the admins. Nevertheless, I use deit.deit_base_distilled_patch16_384 with the weights published at https://github.com/facebookresearch/deit, so I think it is equivalent.
In parallel, I'm studying the effect of feature extractors on performance. I think ViT-based feature extractors may not be the best choice, because the columnar architecture does not capture both local and global information. For my medical images, I trained another auto-encoder (also ViT-based ^^') to reconstruct the images, and I used the encoded images as FastFlow inputs. This step is important given the significant differences between lung tumors and ImageNet.

Furthermore, I observed that FastFlow seems to be biased by the hospital: it manages to discriminate the center where the patient was treated, which is not relevant information in my case. I wonder if the model fails to generalize with respect to large color variations. I simulated MVTec objects with many colors, keeping only one extra shade for the test set, in order to simulate the introduction of a new hospital. Curiously, the distribution of FastFlow scores is not biased by the type of color augmentation. Perhaps the effect I'm seeing comes from other sources of variation; I don't know how well FastFlow is able to normalize over a highly variable training set.

Thank you very much ^^

gathierry commented 1 year ago

In that case it should be the same. I see you added DDP; are you using more than one GPU for training on MVTec? If you only use one GPU, then I think everything should be the same. It might need more debugging if you really want to reproduce the results.

I'm not sure whether FastFlow is suitable for medical images, since there is quite a big difference between MVTec and medical images. For example, the shapes of transistors are all similar, but even the healthy lungs of different people can look very different (feel free to correct me if this is not a good example). So IMHO it is possible that models working on MVTec may not achieve comparable performance on your data.

emathian commented 1 year ago

I am parallelizing on my dataset, which is much larger than MVTec; that is why I added DDP. Let's debug 😄! I will find the solution eventually.

leibaoer commented 1 year ago

How do I get the above performance graphs, and where do I modify the code? Thanks

cytotoxicity8 commented 1 year ago

> How do I get the above performance graphs, and where do I modify the code? Thanks

Try TensorBoard; it is easy. (Insert a few logging calls in main.py.)