Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Perform sanity checks and follow good coding practices #52

Open evancofer opened 5 years ago

evancofer commented 5 years ago

Have you checked the list of proposed rules to see if the rule has already been proposed?

Did you add yourself as a contributor by making a pull request if this is your first contribution?

Feel free to elaborate, rant, and/or ramble.

When coding DL models, it is important to maintain good software engineering practices. All code should be documented and include rigorous tests. Sanity checks are also useful. For instance, something is probably wrong (e.g. a bug in the code, an ill-posed problem, or bad hyperparameters) if the training loss does not decrease toward zero (i.e. the model fails to overfit) when training on a very small subset of the training data.
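As a rough illustration of the small-subset check described above, here is a minimal sketch in plain Python. It is not taken from any existing codebase: the tiny logistic-regression trainer, the function names `train_logistic` and `overfits_tiny_subset`, the four-example dataset, and the 0.05 loss tolerance are all assumptions chosen for the example. A real DL project would run the same check with its own model and framework.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit logistic regression with full-batch gradient descent.

    Returns the mean cross-entropy loss recorded at every epoch
    (measured before that epoch's parameter update).
    """
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    losses = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * d, 0.0, 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for j, xj in enumerate(x):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        losses.append(loss / n)
    return losses


def overfits_tiny_subset(losses, tolerance=0.05):
    """Sanity check: the final training loss should be close to zero."""
    return losses[-1] < tolerance


# Hypothetical tiny subset: four examples whose label equals the first feature,
# so a correct training loop should be able to fit them almost perfectly.
tiny_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
tiny_y = [0, 0, 1, 1]

healthy_losses = train_logistic(tiny_x, tiny_y)
broken_losses = train_logistic(tiny_x, tiny_y, lr=0.0)  # simulates a bug: no updates

print("healthy run passes check:", overfits_tiny_subset(healthy_losses))
print("broken run passes check:", overfits_tiny_subset(broken_losses))
```

The second run sets the learning rate to zero to stand in for a bug that stops the parameters from updating, so its loss stays at its starting value and the check fails. Because the check only needs a handful of examples, it is cheap enough to run as an automated test before any expensive training.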

Any citations for the rule? (peer-reviewed literature preferred but not required)

evancofer commented 5 years ago

This is slightly similar to #49, #35, and #21.

rasbt commented 5 years ago

I think it would be good to have your suggestion as a separate rule, but it is also somewhat connected to #42: we usually need a larger, more extensive model selection step when using deep learning than with "traditional" machine learning. In addition, we need to "babysit" the model fitting procedures more, evaluating the internal behavior ("does it converge?") and not just the external metrics ("what is the prediction accuracy?").
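To make the "does it converge?" question concrete, here is a small sketch of one way to inspect a recorded loss history. The function name `diagnose_training`, the window size, the tolerance, and the category labels are all assumptions made for illustration, not an established convention.

```python
import math


def diagnose_training(losses, window=5, rel_tol=1e-3):
    """Classify a per-epoch loss history by comparing two adjacent windows.

    Returns one of: "diverged", "too_short", "worsening",
    "plateaued", or "still_improving".
    """
    if any(math.isnan(v) or math.isinf(v) for v in losses):
        return "diverged"
    if len(losses) < 2 * window:
        return "too_short"
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    improvement = previous - recent
    scale = max(abs(previous), 1e-12)  # avoid a zero tolerance band
    if improvement < -rel_tol * scale:
        return "worsening"
    if improvement <= rel_tol * scale:
        return "plateaued"
    return "still_improving"


# Hypothetical loss histories, one per situation.
print(diagnose_training([1.0 / (t + 1) for t in range(10)]))
print(diagnose_training([0.5] * 10))
print(diagnose_training([0.1 * t for t in range(1, 11)]))
print(diagnose_training([1.0, float("nan")]))
```

Note that "plateaued" only says the loss has stopped changing. A plateau at a high loss value is still a warning sign, so this kind of internal check complements the external metrics and does not replace them.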

evancofer commented 5 years ago

Indeed. In many cases, however, sanity checks need to happen before hyperparameter optimization.

Benjamin-Lee commented 5 years ago

"All code should be documented and include rigorous tests."

With respect to this, I just published a paper on the exact topic that might be worth citing. I'm happy to expand further on documentation best practices for DL.

agitter commented 5 years ago

I also recommend "Top considerations for creating bioinformatics software documentation" as a reference on software documentation.

pstew commented 5 years ago

@Benjamin-Lee Congrats! I think this and the paper linked by @agitter would be great to cite here.

evancofer commented 5 years ago

Those all look like pretty relevant citations, and we should definitely keep them in mind as we draft. IIRC this discussion of testing & other software engineering best practices was going to go into Tip #1 (deep learning is still machine learning), but maybe it should go somewhere else if we intend to discuss certain aspects (e.g. testing) in more detail?

fmaguire commented 5 years ago

This is sort of mentioned in Tip 3; we need to add this reference to it.