Closed: BIGBALLON closed 4 years ago
Hello @BIGBALLON, thanks for the feedback. I can see why it looks like data leakage; I'm sorry for the confusing design.
First, the code below shows that the self-supervised sub-test uses only good samples for training:
if subtest_type == 'self_supervised':
    data = artificial_image_list_cls.databunch(path/f'{subcase}/train/good',
                                               size=img_size, tfms=tfms)
You can find it at mvtecad_test.py#L50.
You can also confirm this by running the following in the notebook:
mvtecad.set_test(0, 0)
db = mvtecad.databunch()
db.train_ds.items
The output will look like this:
array(['/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/173.png',
'/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/173.png',
'/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/139.png',
'/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/139.png', ...,
'/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/041.png',
'/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/041.png',
'/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/039.png',
'/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/039.png'], dtype='<U69')
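Rather than reading the listing by eye, the same check can be automated. The snippet below is only a minimal sketch: `all_from_good_train` is a hypothetical helper, not part of the repository, and it assumes the items are POSIX-style paths like the ones shown above.

```python
from pathlib import PurePosixPath


def all_from_good_train(items):
    """Return True if every path sits directly inside a .../train/good folder.

    Hypothetical helper for illustration; `items` is any iterable of
    path-like values, e.g. db.train_ds.items.
    """
    return all(
        PurePosixPath(str(p)).parent.parts[-2:] == ('train', 'good')
        for p in items
    )


# Two of the paths from the output above, used as sample input.
sample = [
    '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/173.png',
    '/mnt/dataset/mvtec_ad/case0-0-bottle-broken_large/train/good/139.png',
]
print(all_from_good_train(sample))
```

In the notebook, passing `db.train_ds.items` instead of `sample` should apply the same check to the full training set.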
I think this should make it clear.
I hope this answers your question; please feel free to ask about anything that is still unclear. :)
@daisukelab thanks for your answer, I get it now.
First, thanks for your amazing work.
Regarding line 50: https://github.com/daisukelab/metric_learning/blob/master/mvtecad_test.py#L50
This seems to mean that the training data also appears in the test set, and I double-checked the images:
So I think the result of the self-supervised sub-test is not correct.
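One way to double-check for this kind of overlap is to compare the training and test images by content instead of by file name. The following is a rough sketch only: `file_digest` and `find_shared_images` are hypothetical helpers, not part of the repository, and the check detects byte-identical files only, so a resized or re-encoded copy of a training image would not be reported.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_shared_images(train_paths, test_paths):
    """Return (train_path, test_path) pairs whose files are byte-identical.

    Hypothetical helper for illustration; both arguments are iterables
    of paths to existing files.
    """
    # Index the training files by digest so each test file is one lookup.
    train_by_digest = {}
    for p in train_paths:
        train_by_digest.setdefault(file_digest(p), []).append(str(p))

    shared = []
    for p in test_paths:
        for t in train_by_digest.get(file_digest(p), []):
            shared.append((t, str(p)))
    return shared
```

An empty result means no test file is a byte-for-byte copy of a training file; it does not rule out near-duplicates.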