YeonwooSung / ai_book

AI book for everyone

Radioactive data: tracing through training #5

Open YeonwooSung opened 4 years ago

YeonwooSung commented 4 years ago

Abstract

This paper introduces a method to mark a dataset with a hidden "radioactive" tag, such that any classifier trained on that data will detectably exhibit the tag.

Details

(Figure 2 of the paper: distribution plot; image not rendered.)

Personal Thoughts

Clearly, data is the modern gold. Neural classifiers improve their performance by training on more data, but given a trained classifier, it is difficult to tell what data it was trained on. This matters especially if you hold proprietary or personal data and want to make sure that other people do not use it to train their models. The radioactive marking proposed here addresses exactly that: the tag planted in the data survives training and can later be detected in the resulting classifier.
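The detection idea can be sketched in a toy simulation. This is not the paper's implementation: the feature dimension, the carrier direction `u`, and the way alignment is injected into the weights are all made up for illustration. The core statistical test, though, is as described: check whether the classifier's weights are unusually aligned with the secret carrier direction, compared with a null distribution of random directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # hypothetical feature dimension

# Secret carrier: a random unit direction the data owner mixes into
# the marked images' features (the "radioactive" tag).
u = rng.normal(size=d)
u /= np.linalg.norm(u)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A classifier trained only on clean data: weights independent of u.
w_clean = rng.normal(size=d)

# A classifier trained on marked data tends to align with u; here we
# simulate that alignment directly by mixing u into random weights.
w_marked = rng.normal(size=d) + 5.0 * u

# Empirical null distribution: cosine of random directions with u.
null = np.array([cosine(rng.normal(size=d), u) for _ in range(10_000)])

def p_value(w):
    """Fraction of random directions at least as aligned with u as w."""
    return float((null >= cosine(w, u)).mean())

print(p_value(w_clean))   # unremarkable: no evidence of marked data
print(p_value(w_marked))  # near zero: weights aligned with the carrier
```

In the paper the null distribution has a closed form (the cosine between random high-dimensional vectors follows a known distribution), which gives an exact p-value instead of this empirical estimate.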

YeonwooSung commented 4 years ago

As I mentioned above, the main aim of this paper is to provide a method for checking whether other people are training their models on your data.

I assume this might be related to data privacy issues?