brainstorming - 'YOLO' - Githubissues

dobkeratops commented 6 years ago

interesting research,

I'd heard this term thrown around, but wasn't sure what it referred to. It's definitely related to the idea of scenes: supposedly it does the same job as R-CNN (finding objects) but 'uses a single neural net taking in the whole image'.

I'd be very surprised if a single net could be better than the adaptive approaches described (I'm convinced regions and branching nets are the way forward), but I can absolutely believe having some full-scene info helps object identification.
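For what 'taking in the whole image' means in practice: the YOLO design splits the image into a grid, and the cell containing an object's centre is the one responsible for predicting it. A rough sketch of that assignment follows; the function name `responsible_cell` and the example box are made up for illustration, and the 7x7 grid on a 448x448 input is the original paper's setting as far as I recall it.

```python
def responsible_cell(box, image_w, image_h, grid=7):
    """Return (row, col) of the grid cell that contains the box centre.

    box is (x_min, y_min, x_max, y_max) in pixels.
    The name and signature are hypothetical, for illustration only.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Clamp so a centre lying exactly on the right/bottom edge stays in range.
    col = min(int(cx / image_w * grid), grid - 1)
    row = min(int(cy / image_h * grid), grid - 1)
    return row, col


# Made-up box on a 448x448 image.
cell = responsible_cell((100, 200, 300, 400), image_w=448, image_h=448)
```

Every cell is predicted in one forward pass, which is where the full-scene context comes from.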

so that's the sort of thing my interest in 'full scene images' (rather than single objects) relates to, but I also then worry about the effect of the fisheye images (would the detection of whole-scene context be vague enough to handle the distortions, just like we can deal with looking at fisheye or normal images?). Either way I hope I can submit some more 'regular' images for my next batches..

they use the 'COCO' training set.

of course the types of image will matter for scene training.. but the common urban environments are better than nothing. (you could always say 'this dataset is suitable for urban scene training..') getting lots of domestic interiors might be a lot harder because people are more sensitive about their personal space. (I'm looking around my messy room and I'd be embarrassed to share it online haha)

I suppose you could use videos with the free label of motion prediction (ask the net 'is time reversed?' or predict the major changes in optical flow), or show it some sample films and make it predict 'what genre of film' to try to get a general sense of scenes. (Network training with some shared early part, then 2 branches: one doing 'something general' and another using the specific labels, the idea being that the bigger dataset for broad training helps the specific task.)
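The 'is time reversed?' label really is free: it can be generated from unlabeled clips with no annotation at all. A minimal sketch, assuming clips are just ordered lists of frames; `make_time_reversal_examples` is a hypothetical helper and the frames are stand-in strings rather than real images.

```python
import random


def make_time_reversal_examples(clips, seed=0):
    """Turn unlabeled clips into (frames, label) pairs.

    label 1 means the frame order was reversed, 0 means it was kept.
    Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    examples = []
    for clip in clips:
        reverse = rng.random() < 0.5
        frames = list(reversed(clip)) if reverse else list(clip)
        examples.append((frames, 1 if reverse else 0))
    return examples


# Stand-in frames; real ones would be image arrays.
clips = [["f0", "f1", "f2"], ["g0", "g1", "g2", "g3"]]
examples = make_time_reversal_examples(clips)
```

These pairs would feed the 'something general' branch, while the hand-labeled images feed the specific one.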

bbernhard commented 6 years ago

that sounds interesting, many thanks for sharing!

"getting lots of domestic interiors might be a lot harder because people are more sensitive about their personal space. (I'm looking around my messy room and I'd be embarrassed to share it online haha)" :D

I guess we could also try to query Flickr, Wikimedia, etc. for those types of images and use all the images that have been put into the public domain.

As you mentioned YOLO: Something that could probably be really beneficial to the dataset growth is the integration of a machine learning framework. Training your own model on some data should be as easy as writing a handful of lines of code, with everything else abstracted away from the user. I guess if we get that right and add support for at least one machine learning framework, we could probably attract a different group of users (who, in return, hopefully also contribute a little bit to the dataset).
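To make 'a handful of lines' concrete, here is a toy sketch of what the user-facing side could feel like. Everything in it is an assumption: `CentroidClassifier` is a made-up nearest-centroid stand-in, not an ImageMonkey API or any real framework, and the feature vectors are invented.

```python
import math
from collections import defaultdict


class CentroidClassifier:
    """Toy nearest-centroid model standing in for a real framework."""

    def fit(self, features, labels):
        sums = {}
        counts = defaultdict(int)
        for vec, label in zip(features, labels):
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
        # One mean vector per label.
        self.centroids = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict(self, vec):
        # Pick the label whose centroid is closest to vec.
        return min(
            self.centroids,
            key=lambda label: math.dist(vec, self.centroids[label]),
        )


# Invented 2-d features; a real setup would pull labeled images from the dataset.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = ["car", "car", "tree", "tree"]
model = CentroidClassifier().fit(features, labels)
prediction = model.predict([0.85, 0.15])
```

The part the user would actually type is the last two lines; the class above is what the integration would hide.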

dobkeratops commented 6 years ago

"integration of a machine learning framework. Training your own model on some data should be as easy as writing a handful of lines of code"

definitely! .. (I think I remember seeing this site testing a classifier already?)

what interested me a lot personally is the branching CNN idea, hence the idea of the label tree;

you could make the tree branches match the label tree, but perhaps users could customise which nodes to use as roots ("render me a branched CNN for vehicle classification": the root is 'what type of vehicle', the next level has refinements of car types, truck types, motorbike types..). The site could use the whole dataset to train the most generic root ('animal vs plant vs vehicle vs building vs person' would be my guess..)
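A rough sketch of how a user-chosen root could turn into a set of branches: walk the label tree from that root and emit one classifier head per node that has children. The tree contents and the name `branch_plan` are invented for illustration; the real label tree would come from the site.

```python
# Hypothetical label tree: parent label -> child labels.
LABEL_TREE = {
    "thing": ["animal", "plant", "vehicle", "building", "person"],
    "vehicle": ["car", "truck", "motorbike"],
    "car": ["hatchback", "estate"],
    "truck": ["pickup", "lorry"],
}


def branch_plan(tree, root):
    """Return one classifier head per internal node under `root`.

    Maps each node to the classes its head must tell apart,
    breadth-first, so the root head comes first.
    """
    plan = {}
    queue = [root]
    while queue:
        node = queue.pop(0)
        children = tree.get(node, [])
        if children:
            plan[node] = list(children)
            queue.extend(children)
    return plan


# "render me a branched CNN for vehicle classification"
vehicle_plan = branch_plan(LABEL_TREE, "vehicle")
```

Picking "thing" as the root instead gives the generic top-level head plus everything below it, which is the part the whole dataset could train.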

ImageMonkey / imagemonkey-core

brainstorming - 'YOLO' #109