Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Outstanding issues not specific to any tips #252

Open SiminaB opened 3 years ago

SiminaB commented 3 years ago

This is to discuss any issues that we think are not currently adequately covered. If they relate to specific tips, use #242, #243, #244, #245, #246, #247, #248, #249, #250, or #251.

SiminaB commented 3 years ago

In rereading this, there are two issues I thought of that we may want to cover. At minimum, I think many people reading this paper will expect them to be covered. I think they can be included in the Intro or Conclusion or as part of existing tips:

  1. How does one go about fitting these models, and is special software always required? We can at least give some good references for how to do this and note the main packages and computational requirements. I know this isn't a "getting started with DL" paper, but we can still spend 2-3 sentences on it.
  2. Can DL be inadvertently used to perpetuate existing stereotypes, e.g. racist and sexist ones? We know this can happen either because of the training set (e.g. the training set consists exclusively of individuals of European descent, then the model is used on a more diverse population) or because the predictions are incorrectly interpreted due to confounding (e.g. the training set has doctors and nurses, most doctors are men and most nurses are women, and so going forward gender explicitly or implicitly plays an outsized role in predicting career choice). The paper focuses on biology, so perhaps one good example would be the performance of face recognition approaches on individuals of European vs. non-European descent.

Benjamin-Lee commented 3 years ago

Some thoughts in response:

  1. We should mention them, and also mention auto-ML tools like TPOT.
  2. DL fairness should probably be mentioned in the interpretation or privacy tips. Which place do you think is better?

SiminaB commented 3 years ago

We could change Tip 10 to be about ethics I guess? That way both fairness and privacy would fit.

Benjamin-Lee commented 3 years ago

@SiminaB I just addressed your first point in the PR for #241. Specifically, I mentioned TF and PyTorch as well as Keras, AutoKeras, Turi Create, and TPOT. If there are any other tools you think are worth mentioning, do let me know.

SiminaB commented 3 years ago

Looks good! One question as someone who doesn't use DL in research: can you actually run meaningful DL models on a laptop? The text implies that it would be hard to do so, e.g. in:

In contrast, traditional ML training can often be done on a laptop (or even a \$5 computer [@arxiv:1809.00238]) in seconds to minutes.

Benjamin-Lee commented 3 years ago

It's doable in some cases but not really ideal. In my experience, I've always ended up having to use a cloud machine for training all but the simplest models. I've never done transfer learning so I can't comment on whether that brings things down to consumer-grade laptop level. @rasbt probably knows more than I do about that.

SiminaB commented 3 years ago

I think it would be helpful to clarify this as it would help inform someone whether they can actually do DL. If it is appropriate to their problem but not really doable on their device, of course they can look into using the cloud or initiating a collaboration.

Benjamin-Lee commented 3 years ago

Definitely a good idea to speak affirmatively to what DL needs.

agitter commented 3 years ago

I'm copying my comment from https://github.com/Benjamin-Lee/deep-rules/pull/313#issuecomment-760316895 here so we don't lose track of it.

These are all minor enough to address after the initial submission.

Benjamin-Lee commented 3 years ago

Thank you for adding it here and glad to see nothing else is blocking. I'll work on #237 once we do the content freeze since that is cosmetic.