Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Condense down to ten rules #48

Closed Benjamin-Lee closed 5 years ago

Benjamin-Lee commented 5 years ago

Now for the fun part! We have a bunch of rules that have been proposed, but can only have ten rules in the paper:

[graph of proposed rules omitted]

Dashnow H, Lonsdale A, Bourne PE (2014) Ten Simple Rules for Writing a PLOS Ten Simple Rules Article. PLoS Comput Biol 10(10): e1003858. https://doi.org/10.1371/journal.pcbi.1003858

Let's start winnowing down the rules. As @rasbt proposed in #47:

A next step would maybe be that we vote on the rules to include. Maybe everyone (the contributors here) should give a 1-10 score for each rule and then we can take the top 10 rules based on the highest average scores and see if the results make sense (and whether we maybe need to look for additional ideas).

Let's do this programmatically. How does a CSV wherein contributors PR/push their votes sound?

jmschrei commented 5 years ago

Many of the proposed rules are similar to each other and voting on them individually may lead to redundant selections (#16, #19, #20, #27, #28 all cover a similar issue). I would suggest that people propose full lists as issues here and refine them after feedback from others. After a few lists are proposed / refined, we could jump on a conference call and merge things down to a single list.

rasbt commented 5 years ago

That's a good idea. We'll probably have to pre-merge them before voting. So I guess the focus would first be on cleaning up the issues list and then using the issue numbers as columns in the CSV for further discussion/voting.

Maybe @Benjamin-Lee or @jmschrei want to take the lead and make a "master" list from the suggestions so far, with similar ones all in the same row, then we discuss the phrasing we want to pick for each of the similar (yet slightly differently formulated) suggestions? Maybe something along the lines of

| Index | Suggested rule title | Related issue numbers | Summary of topics to mention |
|---|---|---|---|
| 1 | Some Title<br>Alt. Title<br>Another alt. Title | #xx #xx #xx | main point to mention<br>another relevant point |
| 2 | | | |
| 3 | | | |
| ... | | | |
tbrittoborges commented 5 years ago

#36 and #25 are related. @betsig what do you think?

signalbash commented 5 years ago

Agreed - these both address very similar issues. Probably needs a more snappy title though.

signalbash commented 5 years ago

Looking through everything I think I can condense some of the major points down to a smaller number of rules.

  1. Rules for ML apply to DL (#37 )

    • Garbage In/Garbage Out (#26 )
    • Normalise your data (#7 )
    • Make sure data is not biased/skewed (#43 )
    • Make it reproducible (#21 )
    • Validate it (#27 )
    • Test/train/validate splitting (#20 , #19 )
  2. Know your data and your question

    • (#31 , #12 , #13, #18 )
  3. Choose a model appropriate to the data (#29 )

    • A bit of discussion on DL architectures and how they apply to different problems here would be helpful
  4. Tune your hyperparameters (#42 , #11 )

    • Again, discussion on DL specific tuning would be helpful. i.e. layers, dropout, activation functions etc.
  5. Limit overfitting (#28 )

    • Some discussion on how not to overfit, particularly in the context of DL methods.
  6. Compare to non-DL methods as baselines (#41 , #11 , #10 )

  7. A DL model can be, but doesn't need to be a black box

    • How to interpret DL models (#36 )
    • Similar to 6. Check if DL is actually a significant improvement in performance over a more 'interpretable' model (#25 )
  8. Interpret predictions in the correct manner

    • How to apply a model (#30 )
    • Correlation != Causation (#9 )
    • Don't overinterpret (#33 )

Completely up for discussion and there's room to add more rules.
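For illustration, the splitting and normalization points under rule 1 could look something like this minimal sketch (plain Python with hypothetical helper names, not tied to any particular framework; the key idea is that normalization statistics are fit on the training split only, to avoid leaking test information):

```python
import random

def split_indices(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve out disjoint train/validation/test index sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

def standardize(train_x, other_x):
    """Fit mean/std on the training split only, then apply to both splits."""
    mean = sum(train_x) / len(train_x)
    var = sum((v - mean) ** 2 for v in train_x) / len(train_x)
    std = var ** 0.5 or 1.0  # guard against zero variance
    scale = lambda xs: [(v - mean) / std for v in xs]
    return scale(train_x), scale(other_x)
```

Refitting `standardize` on the full dataset (train + test together) is one of the classic ML mistakes that carries straight over to DL.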

rasbt commented 5 years ago

That's really awesome, thanks for doing this! I think this is really nice. The one thing we'll have to do when organizing content for the paragraphs is to include biology-related examples, because right now it sounds very general.

@Benjamin-Lee Maybe we could keep this nice summary by @betsig in an .md file so that we can discuss and edit it via PRs.

SiminaB commented 5 years ago

Looks really good! In terms of @rasbt's comment, I wonder if we should write something about how DL has been used very successfully in imaging, which is partially due to the large training sets but also to the fact that there is no correlation-vs.-causation issue there; it's somewhat similar for transcription factor binding (if I'm interpreting things correctly; I'm not in those specific subfields). However, this is different from disease or outcome prediction where things happen with a time lag (e.g., can we predict cancer risk decades into the future, or which drug will lead to a better response in terms of overall survival?). Does this make any sense?

Benjamin-Lee commented 5 years ago

Thanks again @betsig! Her comment is live as an md file in rules.md.

pstew commented 5 years ago

@betsig did a wonderful job. Glad to see this was adopted. @rasbt I think if we do a good job that the rules could/should be applicable outside of bioinformatics/computational biology, but of course we should have a biological slant and use biological examples wherever possible. @SiminaB of course it makes sense. I think your comment has the makings of a paragraph that should go in the introduction or in one of the early rules.

rasbt commented 5 years ago

@pstew

@rasbt I think if we do a good job that the rules could/should be applicable outside of bioinformatics/computational biology, but of course we should have a biological slant and use biological examples wherever possible.

Sure, we just have to be careful that "Ten Simple Rules for Deep Learning in Biology" doesn't become "Ten Simple Rules for Deep Learning," because that would be just like opening Pandora's Box :P

Benjamin-Lee commented 5 years ago

With that being said, we should probably specifically mention that our rules apply beyond biology in the conclusion as well as be sure to point out if any of our rules don't apply outside of biology.

From looking at the current list of proposed rules (and thinking in general), I highly doubt that this will be an issue.

tbrittoborges commented 5 years ago

Now that we have the scope and the rules, how will the writing-up work? Do we 'divide and conquer' in a rule-wise manner?

cgreene commented 5 years ago

From our experience with the deep learning review, I would imagine that divide and conquer, review and refine, divide, refine is one path to a document.

smsaladi commented 5 years ago

Hello! I think I am a bit late to the game, but I would like to propose an additional rule (perhaps 9.) that asks the deep learning developer (deep learner?) to consider planning for the usage/future of the model after it is published. I've added an issue with thoughts to consider here: #63

agitter commented 5 years ago

I agree with @cgreene once the rules are finalized. However, per the last update (#57) I believe the team is still merging and finalizing the rules.

SiminaB commented 5 years ago

@pstew I think Frank Harrell is saying the same thing I was trying to say but doing a better job with it: "ML and AI have had their greatest successes in high signal:noise situations, e.g., visual and sound recognition, language translation, and playing games with concrete rules. What distinguishes these is quick feedback while training, and availability of the answer. Things are different in the low signal:noise world of medical diagnosis and human outcomes. A great use of ML is in pattern recognition to mimic radiologists’ expert image interpretations. For estimating the probability of a positive biopsy given symptoms, signs, risk factors, and demographics, not so much." (bold is his own, quote from http://www.fharrell.com/post/stat-ml/ - don't necessarily agree with everything he writes in that post, but he makes some good points that we should keep in mind for our paper) He also differentiates between "prediction" and "classification" (also at http://www.fharrell.com/post/classification/) which was interesting to me.

evancofer commented 5 years ago

Should we consider merging rules 7 & 8, since they are covering very closely related (potentially intersecting) concepts?

SiminaB commented 5 years ago

Not sure - isn't the black box aspect a bit different from the correlation/causation aspect? Maybe we should just relabel them? They are also related to question 2.

tbrittoborges commented 5 years ago

I don't think we should merge rule 7 (A DL model can be, but doesn't need to be, a black box) and rule 8 (Interpret predictions in the correct manner), although I agree the titles are somewhat overlapping. Rule 7 concerns the interpretability of the model (whether we can extract novel, domain-specific information from it), while rule 8 concerns how one should interpret the predictions/classifications.

rasbt commented 5 years ago

Yeah, I would say 7 is more about model training and selection (how do I make sure I don't overfit the training dataset), and rule 8 is more about evaluation in terms of "what do the results mean" (e.g., if I have a bioactivity dataset, what are the discriminants of biological activity, and how do actives and non-actives differ?).

pstew commented 5 years ago

Concur with @tbrittoborges . There might be some overlap, but I think there is enough material to keep these distinct.

ttriche commented 5 years ago

1) "Rules for ML apply to DL" -- well, yeah, DL is a subset of ML techniques. But this can go further -- most rules from statistics and basic experimental design apply to DL. Train on unbalanced classes with a metric that weights them all, get unbalanced predictor. More data beats better models (all else equal) and better data beats worse data (ibid).

3) "Choose a model appropriate to the data" -- I would modify this to "appropriate to the task". Is the goal classification of a binary (or mutually exclusive type of) event? Does it matter whether the model is interpretable? (Is the goal an endpoint, e.g. a product to predict a medical condition? Or is the goal to progress towards an understanding of some phenomenon?)

8) "correct manner" -- what is the correct manner for interpretation of e.g. a DNN or a CNN? What is overinterpretation?

A general note: the old saws about GIGO and small sample sizes / autopsy-rather-than-advice tend to apply, but are there instances where they appear not to? If so, how do these instances come about? Are there examples where nonlinearity is so profound that only by training across a large(ish) number of small(ish) cohorts or corpora does a regularized representation of these nonlinear relationships emerge? I've been struggling with this since the beginning -- is it in fact true that a universal function approximator needs obscene amounts of data to do anything useful, or are there counterexamples that do not consist solely of collinear artifacts?

--t


rasbt commented 5 years ago

8) "correct manner" -- what is the correct manner for interpretation of e.g. a DNN or a CNN? What is overinterpretation?

That's a very good question. I do think there is some overlap between this and rule 7. Rule 7 seems to be more geared towards technical aspects (generalization error among others), whereas rule 8 goes more into "does the network use salient information." A real-world example from a past research project: I found that based on my dataset, according to my model, active molecules against a certain target all contained a sulfate group. However, this didn't imply that a sulfate group makes a molecule active. I guess rule 8 aims to go more into this kind of scenario.

I wouldn't necessarily merge rule 7 and 8, although doing that wouldn't be bad either.

As I mentioned in another thread, I think,

AlexanderTitus commented 5 years ago

This has been agreed upon and is in process.