Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Use training/tuning/testing dataset terminology #74

Open michaelmhoffman opened 5 years ago

michaelmhoffman commented 5 years ago

Have you checked the list of proposed rules to see if the rule has already been proposed?

Did you add yourself as a contributor by making a pull request if this is your first contribution?

Feel free to elaborate, rant, and/or ramble.

I would suggest the use of training/tuning/testing rather than training/validating/testing or (God forbid) training/testing/validating to avoid any confusion in different domains. I see inconsistent use of this terminology in existing issues and draft text. See #19 for an example.

Any citations for the rule? (peer-reviewed literature preferred but not required)

cgreene commented 5 years ago

Big fan of this change 👍 . It'll be nice to get away from two different field-specific vocabularies that intersect painfully in bioinformatics.

michaelmhoffman commented 5 years ago

What label should an issue like this have? "meta" or something else?

agitter commented 5 years ago

> What label should an issue like this have? "meta" or something else?

I agree with "meta". It is proposing terminology to use consistently as opposed to a new rule or discussion of a specific paper.

However, I'm not convinced that we want to adopt training/tuning/testing terminology. It is a more accurate description of how the datasets are used, but proposing uncommon terminology could confuse our target audience when they read other machine learning and deep learning literature (obligatory https://twitter.com/michaelhoffman/status/989977986471514112).

There are certainly different uses of validation and test set terminology across domains, but within machine learning most sources are consistent. The Wikipedia article in your tweet above has many sources, and the random textbooks I grabbed (Kevin Murphy and Christopher Bishop's ML books) also use training/validation/test.

pstew commented 5 years ago

I'm a fan of training/validating/testing myself, but I would be happy with whatever doesn't confuse readers. I think we should come to a consensus for consistency and then just mention in the text, "Here, we use a/b/c, but we note this terminology is interchangeable with x/y/z."

jmschrei commented 5 years ago

I think that train/validate/test is the more commonly associated name, but I agree that train/tune/test would make it more intuitive what each set is for. I haven't seen many people actually use train/tune/test in practice, and I agree with @agitter that proposing as a rule some uncommon terminology may not be the best idea.

michaelmhoffman commented 5 years ago

The problem is that validation is common terminology for what comes after testing for many people in biomedical research. You can argue that using validation for what comes before testing is more correct but that doesn't eliminate the confusion.

Some people say December 7, 2018 should be written 12/7/2018. Others say it should be written 7/12/2018. There is ample historical precedent in different communities for both, and you can argue until you're blue in the face about which one is correct or makes more sense. Or you can switch to the unambiguous 2018-12-07.
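[Editor's note: the date analogy can be made concrete with Python's standard `datetime` module; this sketch is illustrative and not part of the original comment.]

```python
import datetime

d = datetime.date(2018, 12, 7)

# The same date rendered under three conventions:
print(d.strftime("%m/%d/%Y"))  # 12/07/2018 (month-first, common in the US)
print(d.strftime("%d/%m/%Y"))  # 07/12/2018 (day-first, common elsewhere)
print(d.isoformat())           # 2018-12-07 (ISO 8601, unambiguous)
```

The first two outputs are indistinguishable without knowing the writer's convention; only the ISO 8601 form is self-describing.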

jmschrei commented 5 years ago

Perhaps the rule here should be that, when discussing these splits, one should define how each set is used clearly enough that people from another field can follow?

rasbt commented 5 years ago

> The problem is that validation is common terminology for what comes after testing for many people in biomedical research. You can argue that using validation for what comes before testing is more correct but that doesn't eliminate the confusion.

yeah, technically, both a validation and a test set are actually used for "validation"/evaluation, just at different stages of the pipeline. From a language perspective, I would say that test and validation set mean the same thing -- also, in k-fold cross-validation, we usually refer to training and "test" folds.
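[Editor's note: the train/"test" fold convention in k-fold cross-validation can be sketched in plain Python; the function name and arguments here are illustrative.]

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Each of the k folds serves once as the held-out "test" fold,
    with the remaining indices forming the training fold.
    """
    # Distribute n items across k folds as evenly as possible.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

# 10 samples, 5 folds: each test fold holds 2 consecutive indices.
for train_idx, test_idx in k_fold_indices(10, 5):
    print(test_idx)  # [0, 1], then [2, 3], ..., then [8, 9]
```

Note that in this setting the held-out fold is conventionally called the "test" fold even though it plays the role of a validation/tuning set.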

I like the term "tuning dataset" though, because it disambiguates it a bit for practitioners who are not familiar with jargon in ML.

Maybe the best of both worlds is to use the "official jargon" but explain it well. This will then also help people with reading ML literature. I.e., we could say that we split the dataset into three parts: a training, validation, and test set. The validation set is used to evaluate model performance during training and model selection, and thus can be regarded as a "tuning dataset." In contrast, the test set shall only be used once, after tuning is complete, to evaluate the final performance of the model (if we use the test set only once, the resulting error is an unbiased estimate of the generalization performance).
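[Editor's note: the three-way split described above can be sketched in plain Python; the function name and split fractions are illustrative, not from the thread.]

```python
import random

def three_way_split(items, tune_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and split them into training/tuning/testing lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_tune = int(n * tune_frac)
    test = items[:n_test]                  # used only once, at the very end
    tune = items[n_test:n_test + n_tune]   # a.k.a. validation: model selection
    train = items[n_test + n_tune:]        # used to fit model parameters
    return train, tune, test

train, tune, test = three_way_split(range(100))
print(len(train), len(tune), len(test))  # 60 20 20
```

The key discipline is procedural, not nominal: whatever the middle set is called, only the final set is touched exactly once, after all tuning decisions are frozen.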

agitter commented 5 years ago

I'm okay using training/tuning/testing if we also prepare readers for the terms they will encounter in machine learning or biomedical literature. There is precedent for training/tuning/testing in machine learning. The earliest reference I found (so far) is from Readings in Machine Learning:

Benjamin-Lee commented 5 years ago

I've never heard of the terms training/tuning/testing until just now, only training/validating/testing. With that being said, I am 100% a fan of training/tuning/testing because:

  1. Alliteration
  2. It actually makes intuitive sense

I am personally going to use that terminology in my own research going forward, so thank you @michaelmhoffman for introducing it to me.

michaelmhoffman commented 5 years ago

For the record @hugoaerts was already using it here before I arrived 😄.