something discuss for the results

BIGBALLON commented 4 years ago

First, thanks for your amazing work.

    # Copy samples
    copy_any(path/f'original/{case}/train/good', case_folder/'train')
    copy_any(path/f'original/{case}/test/{sub}', case_folder/'train')
    copy_any(path/f'original/{case}/test/good', case_folder/'test')

Regarding line 50: https://github.com/daisukelab/metric_learning/blob/master/mvtecad_test.py#L50

    copy_any(path/f'original/{case}/test/{sub}', case_folder/'train')

This means the training data also appears in the test set, and I double-checked the images:

[Screenshot: Selection_183]

So I think the self-supervised results are not correct.
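An overlap like this can also be checked independently of file names by comparing content hashes of two folders. The following is only a minimal sketch, assuming plain image files on disk; `file_digests` and `overlapping_files` are hypothetical helpers and are not part of the repository:

```python
import hashlib
from pathlib import Path


def file_digests(folder):
    """Map SHA-256 digest -> list of files found under `folder` (recursive)."""
    digests = {}
    for f in sorted(Path(folder).rglob('*')):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            digests.setdefault(digest, []).append(f)
    return digests


def overlapping_files(train_folder, test_folder):
    """Return (train_file, test_file) pairs whose contents are identical."""
    train = file_digests(train_folder)
    test = file_digests(test_folder)
    return [(a, b)
            for digest in sorted(train.keys() & test.keys())
            for a in train[digest]
            for b in test[digest]]
```

An empty list from `overlapping_files(case_folder/'train', case_folder/'test')` would indicate that no file in the training folder is byte-identical to a file in the test folder.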

daisukelab commented 4 years ago

Hello @BIGBALLON, thanks for the feedback. It does look like data leakage at first glance; I'm sorry for the confusing design.

First, the code below shows that the self-supervised sub-test uses only good samples for training:

    if subtest_type == 'self_supervised':
        data = artificial_image_list_cls.databunch(path/f'{subcase}/train/good',
                                                   size=img_size, tfms=tfms)

You can find it at mvtecad_test.py#L50.

You can also confirm this by running the following in the notebook:

    mvtecad.set_test(0, 0)
    db = mvtecad.databunch()
    db.train_ds.items

The output will look like this:

array(['/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/173.png',
       '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/173.png',
       '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/139.png',
       '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/139.png', ...,
       '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/041.png',
       '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/041.png',
       '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/039.png',
       '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/039.png'], dtype='<U69')
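Rather than reading the array by eye, the same check can be done programmatically. This is a small sketch, assuming the items are path strings like the ones in the output above; `non_good_items` is a hypothetical helper, not something the repository provides:

```python
from pathlib import Path


def non_good_items(items):
    """Return every item whose parent folder is not named 'good'."""
    return [str(p) for p in items if Path(str(p)).parent.name != 'good']


# Two of the paths from the output above, used here as sample input.
sample_items = [
    '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/173.png',
    '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/139.png',
]

# An empty list means every training item comes from a 'good' folder.
assert non_good_items(sample_items) == []
```

In the notebook, the equivalent call would be `non_good_items(db.train_ds.items)`.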

I hope it is clear now:

Yes, everything is copied to the sub-case folders.
But what is actually used for training is selected when creating the databunch.
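The distinction between what is copied and what is loaded can be illustrated without fastai. The sketch below builds a tiny stand-in for one sub-case folder and then selects only `train/good`, mirroring the path passed to `databunch` in the snippet above. The folder layout and the helper names `build_demo_case` and `self_supervised_training_files` are assumptions made for illustration only:

```python
import tempfile
from pathlib import Path


def self_supervised_training_files(case_folder):
    """Pick only the files under train/good, ignoring other train sub-folders."""
    return sorted((Path(case_folder) / 'train' / 'good').glob('*.png'))


def build_demo_case(root):
    """Create a tiny stand-in for one sub-case folder (layout is assumed)."""
    case = Path(root) / 'case0-0-bottle-broken_large'
    for sub, name in [('train/good', '000.png'),
                      ('train/broken_large', '000.png'),  # copied test samples
                      ('test/good', '001.png')]:
        (case / sub).mkdir(parents=True, exist_ok=True)
        (case / sub / name).write_bytes(b'')
    return case


with tempfile.TemporaryDirectory() as tmp:
    case = build_demo_case(tmp)
    selected = [f.relative_to(case).as_posix()
                for f in self_supervised_training_files(case)]
    print(selected)  # ['train/good/000.png']
```

Even though the anomaly samples sit next to the good ones under `train`, they never reach the selected training list.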

I hope this answers your question. Please feel free to ask about anything that is still unclear. :)

BIGBALLON commented 4 years ago

@daisukelab thanks for your answer, I get it now.
