Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Rephrase "images of cats from the internet" #159

Closed Benjamin-Lee closed 5 years ago

Benjamin-Lee commented 5 years ago

As @cgreene and @agitter have pointed out in #158, this sentence might not be the best:

Applying deep learning to images of cats from the internet does not pose significant ethical, legal, or privacy problems; this is not the case when dealing with classified, confidential, trade secret, or other types of biological data that cannot be shared [@doi:10.1371/journal.pcbi.1005399].

@cgreene has proposed this as an alternative:

In medicine, practitioners may more frequently encounter datasets where privacy is expected and there would be significant ethical or legal issues associated with release.

I personally think that the second half of the line (after the semicolon) is good but that the first part might be misconstrued. We could use MNIST instead of cat pictures as a generic DL task (and possibly add a footnote for those who don't know what MNIST is):

Applying deep learning to images of handwritten digits[^1] does not pose significant ethical, legal, or privacy problems; this is not the case when dealing with classified, confidential, trade secret, or other types of biological data that cannot be shared [@doi:10.1371/journal.pcbi.1005399].

[^1]: The most common introductory machine learning task

evancofer commented 5 years ago

I actually agree with Casey's suggestion. I suspect that some generic DL problems do have significant ethical/legal/etc. ramifications, but have not been associated with these issues yet. Granted, many of these could be much smaller (in terms of potential damage or degree of disruption) than those faced in biomedical applications. I realize that this is a somewhat minor point, but I think it is important to keep it in mind.

rasbt commented 5 years ago

Hm, I think that replacing "cats" with "handwritten digits" is maybe not too bad, but then it kind of loses the "punchline" a bit, and I think this sentence might then sound a bit awkward to a person who is new to the field. I.e., one might wonder, "handwritten digits? Where does that come from now?" whereas "images of cats from the internet" more naturally indicates that it is a placeholder for some hobbyist task.

I.e., instead of saying "Fearing a rise of killer robots is like worrying about overpopulation on Mars" one could say "Fearing a rise of killer robots is like worrying about overpopulation in North Dakota" -- same point but a bit less obvious to someone not from the US.

AlexanderTitus commented 5 years ago

Since this is targeted at people who are not overly familiar with deep learning, I intended the "pictures of cats on the internet" reference to be a callout to the common reference to cats on the internet. From a practitioner's point of view, I agree MNIST is likely the most common first challenge, but I feel it is not as accessible to a general audience.

Thoughts? I can change if we still feel it would be best to update.

rasbt commented 5 years ago

Plus, classifying cats is one of the most popular, classic DL examples (https://www.wired.com/2012/06/google-x-neural-network), so DL folks would probably also appreciate it as an easter egg ;).

AlexanderTitus commented 5 years ago

@agitter @cgreene @Benjamin-Lee @evancofer What are your thoughts here? Should we change the sentence or keep it as is? I'd like to close this out, and you had asked for a change. I like the cat reference, but if you prefer a different reference, I can update.

cgreene commented 5 years ago

A ctrl-f of the current manuscript for "cats" doesn't return anything, so I think this has already been changed.

cgreene commented 5 years ago

Oh - this was in #160 and I re-worked that sentence entirely. So yes, the cats are gone and apparently it's all my fault! 😼

AlexanderTitus commented 5 years ago

Sounds good to me! Glad we can wrap up a couple of PRs and issues!

Benjamin-Lee commented 5 years ago

I realize that the PR is merged already, but we could conceivably add:

While cat pictures from the internet [@url:https://www.wired.com/2012/06/google-x-neural-network/, @url:https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf] don't pose much of a danger if revealed, practitioners may encounter datasets that cannot be shared, such as ones for which there would be significant ethical or legal issues associated with release

agitter commented 5 years ago

I don't see what it adds, and even if this particular example does happen to be benign, I don't think it is helpful for the field if we assert that certain tasks are free of ethical concerns. Better to be vigilant even on tasks that seem whimsical or safe.