Understand the data - Githubissues

Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology

https://benjamin-lee.github.io/deep-rules/

Understand the data #12

Open khyu opened 6 years ago

khyu commented 6 years ago

Before hitting the data with fancy DL algorithms, always spend some time understanding how the data were generated, how the samples were selected, and what assumptions were made during data generation and data cleaning.
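
As an illustration of what such a first look could involve, here is a minimal sketch in plain Python. The record layout, the field names (`sample`, `site`, `expr`, `label`), and the helper `summarize` are all hypothetical and invented for this example; the checks shown (class balance, missing values, duplicates, label by collection site) are only one possible starting point, not a complete audit.

```python
from collections import Counter


def summarize(records, label_key, group_key):
    """Collect a few basic facts about a list of dict records before modeling."""
    # Class balance: a skewed label distribution changes which metrics are meaningful.
    label_counts = Counter(r.get(label_key) for r in records)

    # Missing values per field: often a trace of how data were collected or cleaned.
    fields = sorted({k for r in records for k in r})
    missing = {f: sum(1 for r in records if r.get(f) is None) for f in fields}

    # Exact duplicate records: may mean the same sample was entered twice.
    seen = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(count - 1 for count in seen.values())

    # Label counts per collection group: shows whether source and label are confounded.
    by_group = Counter((r.get(group_key), r.get(label_key)) for r in records)

    return {
        "n": len(records),
        "label_counts": dict(label_counts),
        "missing": missing,
        "duplicates": duplicates,
        "by_group": dict(by_group),
    }


# Hypothetical toy records, made up purely for illustration.
records = [
    {"sample": "s1", "site": "A", "expr": 2.1, "label": "case"},
    {"sample": "s2", "site": "A", "expr": 1.9, "label": "case"},
    {"sample": "s3", "site": "B", "expr": None, "label": "control"},
    {"sample": "s4", "site": "B", "expr": 0.4, "label": "control"},
    {"sample": "s4", "site": "B", "expr": 0.4, "label": "control"},
]

report = summarize(records, label_key="label", group_key="site")
for name, value in report.items():
    print(name, value)
```

In this toy data every case comes from site A and every control from site B, so a model could separate the classes by learning the site rather than the biology. That kind of issue is far easier to spot before training than after.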

khyu commented 6 years ago

This applies to data science in general, not just DL.

brettbj commented 6 years ago

I was going to add an additional rule, but it may fit here: have a hypothesis for why deep learning would work. For example, an image contains structural information, so recognizing local features and pooling them works well in a CNN. In structured data without explicit ordering, why would we expect a CNN to work?
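
To make the ordering point concrete, here is a small sketch in plain Python. The toy signal, the kernel, and the helper `conv1d` are assumptions made up for this example; it shows the locality assumption behind convolution, not any particular CNN.

```python
def conv1d(x, kernel):
    """Valid 1D cross-correlation: each output sees only len(kernel) adjacent inputs."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k))
        for i in range(len(x) - k + 1)
    ]


# An ordered signal with one local feature: a step up between positions 3 and 4.
signal = [0, 0, 0, 0, 1, 1, 1, 1]
edge_kernel = [-1, 1]  # responds when a value is larger than its left neighbour

# The same values with their positions rearranged, as if column order were arbitrary.
perm = [0, 4, 1, 5, 2, 6, 3, 7]
permuted = [signal[i] for i in perm]

ordered_response = conv1d(signal, edge_kernel)
permuted_response = conv1d(permuted, edge_kernel)

# Number of positions where the kernel detects a "step up".
ordered_edges = sum(1 for v in ordered_response if v > 0)
permuted_edges = sum(1 for v in permuted_response if v > 0)

print(ordered_response)   # a single response at the true step
print(permuted_response)  # responses created purely by the reordering
print(sum(signal), sum(permuted))  # an order-free summary is unchanged
```

The local filter finds one edge in the ordered signal and four in the permuted one, even though both contain exactly the same values. If the order of the features carries no meaning, as with arbitrarily ordered table columns, whatever a convolution picks up from neighbouring columns is an artifact of that order.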

brettbj commented 6 years ago

I see my comment applies more accurately to #22.

fmaguire commented 5 years ago

Implicitly mentioned in https://github.com/Benjamin-Lee/deep-rules/blob/master/content/06.know-your-problem.md but might want to make more explicit, especially @brettbj's comment about having a hypothesis for why DL might work for your data.

rasbt commented 5 years ago

Yeah. Unfortunately, there's no real guideline (similar to classic machine learning, where there is no hard recommendation for which feature engineering approach to use), but we could maybe give some rule-of-thumb advice to make this more tractable. I.e., I imagine the main audience will be researchers who are not familiar with DL (yet) and wonder: "Would DL help with my problem?" We could say something along these lines: if we have a large, unstructured dataset in raw form, usually text or image data, DL could potentially be useful, as it may be able to automatically extract features that are not obvious to a human. Or something like that.