c4dt / predictive-maintenance

An experiment to decentralize learning for predictive maintenance

Linear Regression #5

Open ineiti opened 1 year ago

ineiti commented 1 year ago

Test if it works with linear regression

Ahmed tried to do a linear regression with meager results.

TODO:

martinjaggi commented 1 year ago

did you check a linear classifier as well? (just as a comparison point when we do have the labels, like in the MVTec dataset. this is to make sure we can compare to some of the reported standard numbers from existing papers)

martinjaggi commented 1 year ago

trying to formulate the current plan a bit more clearly, how to move from patch-core (a nearest neighbor approach, not easy to make private) to a more standard linear regression or classifier approach (easy to make private).

unsupervised = only regular patches, no anomalies.

  1. use pre-trained embeddings (i.e. from before the last layer of the CNN, from patch-core or imagenet or MVTec) - easier to not train our own if not strictly needed
  2. make sure it gives a good nearest neighbor result (this is the patch-core baseline). check if accuracy is as reported in patch-core
  3. generate pseudo-labels from that patch-core-like model. this is the key step for going from unsupervised KNN to a regression model trained like a supervised one:
    • split the dataset in half, and teach the regression model to report the distance to the regular ones (as in patch-core). so you create a pseudo-label for each of those 50% holdout patches.
  4. train a linear regression model to predict these distances
  5. at inference time (when you use the model), compute the regression value, and use the same thresholds as in patch-core to decide if that patch should be anomalous or not

in a later step the above can be done the same way in a federated setting, where everyone would use their local pseudo-labels to participate in the federated training of one global linear regression model. very easy to do this one in full privacy preserving mode, which would be nice.

but first we'd need to know if accuracy can be ok (comparable to patchcore ideally).
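
roughly, steps 3-5 above could look like this (just a minimal sketch, assuming the patch embeddings are already extracted as numpy arrays; scikit-learn is used for the nearest-neighbor lookup and the regression, and all variable names are illustrative, not from anomalib):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LinearRegression

# emb_regular: (N, D) patch embeddings from regular (non-anomalous) images only
emb_regular = np.random.rand(1000, 512)          # placeholder for real embeddings

# step 3: split in half -> memory bank vs. holdout that gets the pseudo-labels
bank, holdout = emb_regular[:500], emb_regular[500:]
nn = NearestNeighbors(n_neighbors=1).fit(bank)
dist, _ = nn.kneighbors(holdout)                 # distance to nearest regular patch
pseudo_labels = dist.ravel()

# step 4: linear regression predicting that distance from the embedding
reg = LinearRegression().fit(holdout, pseudo_labels)

# step 5: at inference, predict the score and threshold it
test_emb = np.random.rand(10, 512)               # placeholder test patches
scores = reg.predict(test_emb)
threshold = np.percentile(pseudo_labels, 95)     # example choice, not patch-core's threshold
is_anomalous = scores > threshold
```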

lanterno commented 1 year ago

Update: Thanks to Martin, following his notes above,

I used the patchCore code from anomalib to get both the embeddings and the patch scores.

What I still need to do before I start looking into training the LR model is to find the threshold that patchcore uses to define an anomalous patch. This is tricky to find in the code. So far, I've only found a threshold for the whole image, and a threshold per pixel. Hopefully, after a bit of digging, I can find the patch threshold and get to training the LR model.

update: On the last point, I'm working with Thierry now.

lanterno commented 1 year ago

After training PatchCore, I was looking into the implemented code to find the threshold for each patch.

Yesterday, after looking again into the paper, and the code, I found out that they don't actually calculate a threshold for the patch.

In an earlier reading of the paper, I saw this part:

[screenshot of the relevant passage from the PatchCore paper, 2023-06-08]

but digging deeper, I saw that the paper in fact later gives a more in-depth mechanism for calculating the image score in a more complicated way:

[screenshot of the paper's image-score computation, 2023-06-08]

With these two new pieces of information, I'm changing my direction a bit.

First, I want to see if my linear regression model can actually generate good estimations. I will do this by plugging the LR model into the existing patchCore code and assessing the quality of the predictions.

Second, if the LR proves good, we can move to deriving the threshold ourselves and see if using a patch threshold is still good enough for our usage.

martinjaggi commented 1 year ago

yes, seems perfectly fine to use this score to train your linear regression. the score would be the distance of test patch to the nearest patch in the memory bank of regular patches. no threshold needed indeed. after training, the cool thing is we don't need any memory bank anymore, so we have a system that can be made fully private

were you able to give this a shot? no worries about the threshold, you can set it later by hand - or simpler just sort the test patches by the predicted score, and then see if those top anomalous ones are actually anomalous

lanterno commented 1 year ago

@martinjaggi Hi Martin, I understand your point, but there's a catch.

I can't use this value during training, because I trained the Linear Regression model to predict patch scores (the equation above is for calculating the score for the whole image)

What I've done is:

  1. Trained the patchCore model on half the dataset
  2. Used the resulting model to generate anomaly scores for each patch in the other half of the dataset.
  3. Used the embeddings along with the values generated in step 2 to train a linear regression model.

At this point, I have a linear regression model that should be capable of predicting the anomaly score.

  4. I used the linear regression model to predict anomaly scores for images in the test dataset, and I assumed I can use the patch with the highest anomaly score as an indicator of whether the whole image is anomalous or not. (I got the threshold from the trained patchCore model.)

    I'm worried that even if the LR model is good enough, using the patch with the highest anomaly score isn't a good way to calculate the overall image score (given that patchCore uses the equation mentioned in my previous comment).

That said, I'm working on getting some numbers to validate / disprove those worries.
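
For reference, the image-level decision I'm testing in step 4 is roughly this (a minimal sketch; reg is the trained LR model and image_threshold is the value taken from the trained patchCore model, both assumed to already exist):

```python
import numpy as np

def image_is_anomalous(patch_embeddings, reg, image_threshold):
    """patch_embeddings: (P, D) embeddings of all patches of one test image."""
    patch_scores = reg.predict(patch_embeddings)   # LR-predicted anomaly score per patch
    image_score = np.max(patch_scores)             # simple max, not the paper's re-weighted score
    return image_score > image_threshold
```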

(If you think I misunderstood something in your suggested plan before, I would appreciate if we could have a call sometime to go over it)

martinjaggi commented 1 year ago

sounds all good. (BTW on step 1, if i understood right then patchcore doesn't need to be trained, if you just use image-net pretrained embeddings, or is there sth more?)

lanterno commented 1 year ago

on step 1, if i understood right then patchcore doesn't need to be trained, if you just use image-net pretrained embeddings, or is there sth more?

I used the pre-trained embeddings from image-net, but patchCore itself needed to be trained (I trained it on MVTec AD data "grid") to build the memory bank and then generate the anomaly scores (for the second half of the dataset that will train the linear regression model).

This is also what you mentioned before:

  1. use pre-trained embeddings (i.e. from before the last layer of the CNN, from patch-core or imagenet or MVTec) - easier to not train our own if not strictly needed
  2. make sure it gives a good nearest neighbor result (this is the patch-core baseline). check if accuracy is as reported in patch-core
  3. generate pseudo-labels from that patch-core-like model. this is the key step for going from unsupervised KNN to a regression model trained like a supervised one:
    • split the dataset in half, and teach the regression model to report the distance to the regular ones (as in patch-core). so you create a pseudo-label for each of those 50% holdout patches.

I assumed that the pseudo-labels here mean the "patch-level anomaly scores" from the patchCore-like model

martinjaggi commented 1 year ago

yes, all good

martinjaggi commented 1 year ago

hey @lanterno, in your latest commit the evaluation looks nice but i didn't get what the x-axis meant? if you want to compare all patches (on x axis) maybe best to sort them by patch-core score - or even use patch-score as the x value so we could see the correlation and would better know if it's linear or non-linear.

are there also simpler (aggregate) measures on the patch level like AUC/accuracy or similar?
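
something like this could show both the correlation and an aggregate number (a rough sketch; the placeholder arrays just stand in for the real patch-core scores, LR scores and per-patch labels):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score

# placeholders; in practice these come from the notebook
patchcore_scores = np.random.rand(60000)
lr_scores = patchcore_scores + 0.1 * np.random.randn(60000)
patch_labels = (patchcore_scores > 0.9).astype(int)   # stand-in for true per-patch labels

plt.scatter(patchcore_scores, lr_scores, s=2, alpha=0.3)
plt.xlabel("patch-core score")
plt.ylabel("LR predicted score")
plt.show()

# aggregate measure at the patch level, if true labels are available
print("patch-level AUROC:", roc_auc_score(patch_labels, lr_scores))
```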

lanterno commented 1 year ago

@martinjaggi Thank you! I also felt that might be a good way to see the correlation after I pushed the code. I added another graph after doing the ordering.

i didn't get what the x-axis meant?

The x-axis is just the index (maybe I should hide it)

maybe best to sort them by patch-core score

Done. please see it here at the end of the file.

(Note that I just took a random sample for the graph -> hence 4000 and 4100 are just random boundaries)

are there also simpler (aggregate) measures on the patch level like AUC/accuracy or similar?

From Thierry's input, it didn't seem worth it to explore other metrics since the relationship looks very non-linear. Maybe this will be obvious now that I drew the ordered scores.

Thity commented 1 year ago

@martinjaggi x-axis was the index so we could compare the scores computed by patchcore and LR. @lanterno: it could be more relevant not to draw a line but just points. Would it be possible to show the target at the same time for the image-level output? (with two different colors for example). This way we will be able to see if a separation is possible with the LR output.

In cell #49, it seems there is indeed a correlation but not strong enough to have results comparable to patchcore. We could try replacing the LR by a slightly deeper network (an MLP similar to the one in patchcore).

martinjaggi commented 1 year ago

yes i was thinking the same, MLP instead of linear. (this one we know must work because it's basically identical to patchcore)

also in the last plot i still don't get why the red line is flat, as the scores even from patch-core should vary also?

lanterno commented 1 year ago

also in the last plot i still don't get why the red line is flat, as the scores even from patch-core should vary also?

@martinjaggi I took quite a small portion of the dataset (100 out of 60K points). I'm open to suggestions on other ways to plot the data.

One other approach I tried (you can see it now on this link) is to sample the data, but I got feedback from @ineiti that sampling can work for patchCore since it's increasing linearly, but I need another plot for the LR model that summarizes the distribution of the 300 hidden samples (300 is my sampling rate) rather than picking a random one. Linus suggested a candlestick chart.

it could be more relevant not to draw a line but just points. Would it be possible to show the target at the same time for the image-level output ? (with two different colors for example). This way we will be able to see if a separation is possible with the LR output.

@Thity I think I can do that. will try.

martinjaggi commented 1 year ago

also did you check if the linear regression training error is small at least? (maybe too small compared to test error -> overfitting?)

next i'd suggest to train the MLP to make sure we're getting similar scores as patchcore. one also has to be aware that patch-core is not perfect ground truth, even if we here treat it as such. does MVTec have real ground truth as well? maybe not on patch level?

btw not sure i get the sampling idea.

lanterno commented 1 year ago

Plotting the image-level anomaly scores might be easier to draw conclusions from: [screenshot of the image-level plot, 2023-06-19]

But I want to get back to the sampling point:

The test dataset has around 70 images, so it's easy to plot them as above, but when comparing the patches we have 60K data points, which were difficult to plot because we get images like this: [screenshot of the 60K-point patch plot, 2023-06-19]

To get a clearer picture, I wanted to sample the data, and maybe only plot 100 data points.

Approach A: just take 100 consecutive points from the middle of the data.

The result was the flat line we saw in my previous comment.

Approach B: sample by taking 1 point, then skipping N points. This is the result: [screenshot of the sampled plot, 2023-06-19]

But the problem is that the linear regression samples are too random compared to the patchCore samples (we already sorted the patchCore samples, so sampling still maintains the plot's features, but the same isn't true for the LR results).

Approach C: take the mean over the sampled points, but now I have an even weirder plot: [screenshot of the mean-per-sample plot, 2023-06-19]
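
One more option, along the lines of Linus's candlestick idea, would be to bin the patches after sorting by patchCore score and show the spread of LR scores per bin. A rough sketch with placeholder data (names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

patchcore_scores = np.random.rand(60000)                  # placeholders for the real scores
lr_scores = patchcore_scores + 0.1 * np.random.randn(60000)

order = np.argsort(patchcore_scores)
bins = np.array_split(lr_scores[order], 100)              # 100 bins of ~600 patches each
plt.boxplot(bins, showfliers=False)                       # box plot instead of candlesticks
plt.xlabel("bin (patches sorted by patchCore score)")
plt.ylabel("LR score distribution per bin")
plt.show()
```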

lanterno commented 1 year ago

also did you check if the linear regression training error is small at least? (maybe too small compared to test error -> overfitting?)

No, I will check that.
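
Probably something like this (a minimal sketch with placeholder data, not the actual pipeline):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 512)      # placeholder patch embeddings
y = np.random.rand(1000)           # placeholder patchCore scores
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

reg = LinearRegression().fit(X_tr, y_tr)
print("train MSE:", mean_squared_error(y_tr, reg.predict(X_tr)))
print("test  MSE:", mean_squared_error(y_te, reg.predict(X_te)))  # much larger -> overfitting
```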

does MVTec have real ground truth as well? maybe not on patch level?

It does have the image-level ground truth, and there is an image mask that should tell us the pixel-level ground truth as well. Maybe I can take a look at how patchCore uses this ground truth to aid training, if at all.

next i'd suggest to train the MLP to make sure we're getting similar scores as patchcore. one also has to be aware that patch-core is not perfect ground truth, even if we here treat it as such.

ok

Thity commented 1 year ago

@lanterno: the ground truth we are considering is simply the label (correct/anomalous). The masks that show the anomalous regions are not used for training but for the more advanced task of segmentation (for now we are just doing classification).

@martin: by real ground truth do you mean real numbers? No, MVTec doesn't have that. The score we get from patchcore (and on which we train the LR) can be seen as a "distance from normality" computed by the patchcore model. As our training set only contains normal images, this method can be limited, because the scores obtained from patchcore on normal images can be a bit noisy (let's see what we get with an MLP). In reality there is no objective reason for this score to be >0 on normal images, but as long as it is < threshold, they are considered normal by patchcore. The reason the scores vary on normal images is that patchcore computes a set of representative embeddings (smaller than the training set) and the score is the distance to the closest representative, which will vary for random reasons but will be much higher for anomalous samples.

Also, patchcore doesn't have an MLP, but simply computes the image score at inference and classifies it according to the threshold.
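
A tiny synthetic illustration of that last point (purely made-up numbers, not MVTec data): even "normal" test patches end up at a nonzero distance from the subsampled memory bank, while anomalous ones end up much further away.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(5000, 64))
# "representative" subset, standing in for the coreset-subsampled memory bank
memory_bank = normal_train[rng.choice(5000, size=500, replace=False)]

nn = NearestNeighbors(n_neighbors=1).fit(memory_bank)
normal_scores, _ = nn.kneighbors(rng.normal(0.0, 1.0, size=(100, 64)))     # new normal patches
anomalous_scores, _ = nn.kneighbors(rng.normal(3.0, 1.0, size=(100, 64)))  # shifted "anomalies"
print(normal_scores.mean(), anomalous_scores.mean())                       # nonzero vs. much larger
```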

martinjaggi commented 1 year ago

i just meant the true yes/no label. (we're comparing to patch core but that doesn't get 100% either, so the comparison is a bit unfair if we just compare to that and not to the truth). in the end the classifier accuracy or AUC matters (you can get it by setting any threshold you like). just to evaluate ourselves in the same way as patch core claimed to be accurate

Thity commented 1 year ago

Then yes, we have the ground truth. The training set simply contains normal images (no anomalies) and the test set is composed of good and anomalous images. For now it's not yet an issue as patchCore is doing ~0.98 AUROC and we are still far from that, but yes, we should always measure the performance according to ground truth.

lanterno commented 1 year ago

As you suggested, I switched to working on the MLP. New issue here: https://github.com/c4dt/predictive-maintenance/issues/9

I'm preparing a PR today with the MLP. I also discussed this with Thierry, and based on his advice, I'll be adding the test data (anomalous data) to the training as well, as it seems both the LR and the MLP models are unable to generalize well.

Will share that in a 3rd PR / experiment once I finish testing the MLP.
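
For the MLP, I'm starting from something like this (a first-pass sketch with scikit-learn's MLPRegressor and placeholder data; the layer sizes are my own guesses, not taken from patchCore):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(2000, 512)        # placeholder patch embeddings
y = np.random.rand(2000)             # placeholder patchCore anomaly scores

mlp = MLPRegressor(hidden_layer_sizes=(256, 64), max_iter=500, random_state=0)
mlp.fit(X, y)                        # same pseudo-label targets as for the LR
predicted_scores = mlp.predict(X[:10])   # predicted anomaly scores for some patches
```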