ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

Info -interesting stats #292

Open dobkeratops opened 3 years ago

dobkeratops commented 3 years ago

For comparison: CIFAR10 = 10 categories x 6000 examples = 60000 images, but they are just 32x32 images. CIFAR100 = 60000 images divided between more classes (600 each).

I'm wondering how many of our categories we could train something decent on.

Urban environments:
Road = 4994
Pavement = 3547
“Road vehicle” (car, truck, bus, van) = 3932
Building = 2305

Person = 2381
Man = 1020, woman = 1774
(I wasn’t sure if the existing logic combined these into a “person” search, but as man + woman > person, maybe not?)

Combined counts:
Man, woman, child, boy, girl = 2885
Person, man, woman, child, boy, girl = 6898

People with common states (e.g. “man/walking”):
{man, woman, person} x {walking, sitting, running, standing} = 2837
All common “person” annotations (gender, states) = 8097
(A few more states: reclining, sittingCrossLegged, excercising, playingGuitar, reading, sleeping, leaning, etc.)

Head/man = 2036
Head/woman = 3685

Animals:
Dog = 1367, head = 708
Cat = 693, head = 427
Lemur = 1099
red_panda = 878
All “quadrupedal_mammal” (dog, cat, horse, cow, etc.) = 4605
Individually, most of those only have ~100 examples.

Ungrouped:
Head = 817 (these will be a mix of human and animal)
Hand = 311
Foot = 54

total annotations =?
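As a rough way to frame the question above, the raw counts can be checked against the CIFAR per-class sizes (6000 for CIFAR10, 600 for CIFAR100). This is only a sketch: the dictionary holds a subset of the counts quoted in this comment, the names `label_counts` and `labels_at_least` are made up for illustration, and a per-class count is at best a crude proxy for whether a decent model can be trained.

```python
# Per-class example counts of the two CIFAR datasets, used as reference points.
CIFAR10_PER_CLASS = 6000
CIFAR100_PER_CLASS = 600

# A subset of the raw annotation counts quoted in this comment.
label_counts = {
    "Road": 4994,
    "Pavement": 3547,
    "Road vehicle": 3932,
    "Building": 2305,
    "Person": 2381,
    "Man": 1020,
    "Woman": 1774,
    "Head/man": 2036,
    "Head/woman": 3685,
    "Dog": 1367,
    "Cat": 693,
    "Lemur": 1099,
    "red_panda": 878,
    "Head (ungrouped)": 817,
    "Hand": 311,
    "Foot": 54,
}


def labels_at_least(counts, threshold):
    """Return the sorted label names whose count reaches the threshold."""
    return sorted(name for name, n in counts.items() if n >= threshold)


at_cifar10 = labels_at_least(label_counts, CIFAR10_PER_CLASS)
at_cifar100 = labels_at_least(label_counts, CIFAR100_PER_CLASS)
below_cifar100 = sorted(set(label_counts) - set(at_cifar100))

print("At least CIFAR10-sized (6000):", at_cifar10)
print("At least CIFAR100-sized (600):", at_cifar100)
print("Below CIFAR100-sized:", below_cifar100)
```

With these numbers no single label reaches the CIFAR10 per-class size, while everything except Hand and Foot is above the CIFAR100 per-class size.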

bbernhard commented 3 years ago

Very cool, thanks for sharing!

Would be really interesting to see whether we could get a decently trained model out of that data :thinking:

> I wasn’t sure if the existing logic combined these into a “person” search, but as man + woman > person, maybe not?

At the moment, no label substitution is performed. It's on my to-do list, but I haven't had time to look into that yet :) So all the numbers are "raw" numbers :)
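For illustration, label substitution could look roughly like the sketch below. It is not ImageMonkey's implementation (none exists yet, per the comment above): the `PARENT` mapping, the function names, and the three toy images are all assumptions made up for this example. Counting per image rather than summing raw label counts avoids counting an image twice when it carries both "man" and "woman".

```python
from collections import Counter

# Hypothetical child -> parent mapping; not ImageMonkey's actual label hierarchy.
PARENT = {
    "man": "person",
    "woman": "person",
    "child": "person",
    "boy": "child",
    "girl": "child",
}


def expand(label):
    """Return the label followed by all of its ancestors in PARENT."""
    chain = [label]
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain


def counts_with_substitution(images):
    """Count, per label, the images that carry it directly or via a child label."""
    counts = Counter()
    for labels in images:
        expanded = set()
        for label in labels:
            expanded.update(expand(label))
        # Each image contributes at most once to each label.
        counts.update(expanded)
    return counts


# Toy data, invented for this example only: one set of labels per image.
toy_images = [
    {"man", "woman"},
    {"person"},
    {"girl", "dog"},
]

counts = counts_with_substitution(toy_images)
print(counts["person"], counts["child"], counts["dog"])
```

In the toy data only one image is labeled "person" directly, but all three count towards "person" once the substitution is applied.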

btw: Here are also some global stats, in case you are interested:

https://imagemonkey.io/statistics/contributions

I am always blown away by looking at those graphs. It's even more impressive when you consider that probably > 95% of that data was contributed by you. WOW!