Insight on choosing appropriate data set characteristics.

The standard approach when training DNNs is to gather as much data as possible. However, that is not always feasible. Sometimes the data set has only a few samples of one particular class, or few examples per class overall. Also, if we want to categorize things that look alike, say, variations of grass (haha), then I suppose this will have a big impact on the final accuracy, even with state-of-the-art models.
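One of the characteristics mentioned above, the number of samples per class, is easy to check before training anything. Below is a minimal sketch, assuming the labels are available as a plain Python list. The function name `class_balance_report` and the grass-variety labels are hypothetical, chosen only for illustration. It reports the per-class counts and a simple imbalance ratio (largest class size divided by smallest class size).

```python
from collections import Counter


def class_balance_report(labels):
    """Summarize samples per class and how skewed the class split is.

    Hypothetical helper for illustration; assumes `labels` is a list of
    hashable class labels, one per sample.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    largest = max(counts.values())
    smallest = min(counts.values())
    return {
        "num_classes": len(counts),
        "per_class": dict(counts),
        "min_per_class": smallest,
        "max_per_class": largest,
        # Largest class size / smallest class size; 1.0 means perfectly balanced.
        "imbalance_ratio": largest / smallest,
    }


if __name__ == "__main__":
    # Hypothetical toy labels: 'tall_fescue' is heavily under-represented.
    labels = ["bermuda"] * 120 + ["kentucky_blue"] * 100 + ["tall_fescue"] * 8
    report = class_balance_report(labels)
    print(report["per_class"])
    print(report["imbalance_ratio"])
```

With the toy labels, the smallest class has only 8 samples against 120 for the largest, giving a ratio of 15.0. A high ratio or a very small minimum count is a hint that data set size alone says little about how well that class can be learned.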

In your experience, in which cases is it particularly important to consider more than just the data set size? When this happens, what other data set characteristics would you recommend looking at?

I couldn't find any formal documentation on the matter yet, so I hope the community can help. Thanks!!

awesomedata / awesome-public-datasets

Insight on choosing appropriate data set characteristics. #338