awesomedata / awesome-public-datasets

A topic-centric list of HQ open datasets.
https://awesomedataworld.slack.com
MIT License

Insight on choosing appropriate data set characteristics. #338

Closed slothkong closed 6 years ago

slothkong commented 6 years ago

The standard thing to do when training DNNs is to get as much data as possible. However, that is not always possible. Sometimes the data set has few samples of one particular class, or few examples per class overall. Also, if we want to categorize things that look alike, say variations of grass haha, then I suppose this will have a big impact on the final accuracy, even if we use state-of-the-art models.
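To make the "few samples of one particular class" case concrete, here is a minimal sketch of how one might measure it, assuming the labels are available as a plain Python list. The helper name `class_balance` and the grass labels are made up for illustration and do not come from any real data set.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the largest-to-smallest class size ratio."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    return dict(counts), largest / smallest


# Hypothetical labels, for illustration only.
labels = ["ryegrass"] * 500 + ["fescue"] * 480 + ["bluegrass"] * 20
counts, ratio = class_balance(labels)
print(counts)
print(f"imbalance ratio: {ratio:.1f}")
```

A ratio close to 1 means the classes are roughly balanced; a large ratio flags the kind of under-represented class described above. It says nothing about how visually similar the classes are, which would need a separate check.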

In your experience, in which cases is it particularly important to look beyond data set size? When that situation comes up, what other data set characteristics would you recommend looking at?

I couldn't find any formal documentation on the matter yet... I hope the community can help. Thanks!!