ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

brainstorming empty road images.. 'road' + layout labels? #79

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

I wonder if there would be any value (r.e. self-driving car experiments) in road-layout markup, i.e. the seemingly 'boring' empty road images could actually serve a purpose. I'm sure there are plenty of clever ways of figuring this out automatically (e.g. companies are gathering video to do point-cloud mapping, some are using pricey LIDAR, and of course GPS, correlated with optical flow), but would enough simple labelled data let you get from an image to a judgement on a potential path - or at least give something for testing other systems? I wonder if it would be feasible to categorise layouts with a finite set of options, or would all the permutations of more complex junction types just get out of hand?

(I think as judgements of the whole image, these could be done at quite a low resolution? .. and whilst a completely general net might be difficult, could you in principle train nets that work in local regions, and a system just downloads ones appropriate to where it is, from some big database..)

(perhaps these would suit diagrammatic/iconic labels - easier to figure out the difference between what's meant by 'bend'/'turn' etc.)
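A minimal sketch of what a finite category set could look like as a whole-image label (the category names here are purely illustrative, not an agreed taxonomy):

```javascript
// Hypothetical finite set of road-layout categories, used as a
// whole-image label. The names are illustrative only; a real taxonomy
// would need experimentation to see if it stays manageable.
const RoadLayout = Object.freeze({
    STRAIGHT_AHEAD: "straight_ahead",
    T_JUNCTION: "t_junction",
    RIGHT_TURN: "right_turn",
    LEFT_TURN: "left_turn",
    BEND_RIGHT: "bend_right",
    BEND_LEFT: "bend_left",
    CROSSROADS: "crossroads",
});

// Compound layouts (e.g. "road bends to right + left turn") could be
// represented as a set of category flags per image:
function labelImage(imageId, layouts) {
    return { imageId, layouts: [...new Set(layouts)] };
}
```

Whether the permutations of complex junctions fit such a flat set, or blow up combinatorially, is exactly the open question.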

t-junction

screen shot 2018-04-03 at 23 44 56

right turn

screen shot 2018-04-03 at 23 44 34

road bends to right+left_turn

screen shot 2018-04-03 at 23 44 15

straight ahead

screen shot 2018-04-03 at 23 43 50

Maybe one could draw in the potential 'targets' to drive through (the distance for straight ahead, boxes around the parts you could drive through, ..). I start to think about a video markup tool for this (i.e. if you could draw potential paths, centrelines/stop lines etc. and have it interpolate as you scroll through..).

With videos of on-road footage there might be a way of mapping out the drivable area through accumulation of actual motion, e.g. look N seconds ahead to find the patch of road that would end up ahead of you, and trace its path back.

I guess this would be much clearer with some graphical markup, like gamedev AI splines/nav-mesh superimposed on the actual image, rather than a text label. The 'labelme' idea of polygonal markup of the road is ok, but a 'nav-mesh' would have internal edges that represent logical topology, i.e. "this is a stop line", "this is a road centreline", as well as informing the drivable path a bit better.

Perhaps I should resurrect my javascript tool to do that (it had feature-crept into a mesh editor rather than a simple boundary markup.. perhaps this would be a use; similarly, if it was extended to handle video, i.e. vertices placed in time, maybe you could do some interesting things with interpolation..)
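One possible shape for such a nav-mesh annotation - vertices placed in time, and edges carrying a semantic label - could be sketched like this (all names here are hypothetical, not an existing tool's format):

```javascript
// Sketch of a nav-mesh-style annotation: vertices placed at (x, y, t)
// (t = video time in frames), and edges carrying a semantic label such
// as "road_centreline" or "stop_line". Names are hypothetical.
function makeNavMesh() {
    return { vertices: [], edges: [] };
}

function addVertex(mesh, x, y, t) {
    mesh.vertices.push({ x, y, t });
    return mesh.vertices.length - 1;   // index doubles as vertex id
}

function addEdge(mesh, a, b, label) {
    mesh.edges.push({ a, b, label });  // e.g. label = "stop_line"
}

// Linear interpolation of a vertex between two keyframes, so scrolling
// through video fills in the in-between frames automatically.
function lerpVertex(v0, v1, t) {
    const s = (t - v0.t) / (v1.t - v0.t);
    return { x: v0.x + (v1.x - v0.x) * s,
             y: v0.y + (v1.y - v0.y) * s,
             t };
}
```

The point of the edge labels is that the same geometry (a polyline) means different things depending on its role in the topology, which a plain boundary polygon can't express.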
dobkeratops commented 6 years ago

more stills for which a 'road' label would be nice (actually, these also remind me to suggest a 'shadow' label, in case a detector starts finding people/bicycles etc. from silhouettes.)

screen shot 2018-04-05 at 11 44 47 screen shot 2018-04-05 at 11 45 26 screen shot 2018-04-05 at 11 36 50
bbernhard commented 6 years ago

I wonder if there would be any value (r.e. self-driving car experiments) in road layout markup, i.e. the seemingly 'boring' empty road images could actually serve a purpose.

Definitely! I am pretty interested in the self-driving car experiments that big companies are running at the moment. A thing that's bothering me, however (besides the fact that some companies are rushing to get self-driving cars out there as soon as possible, regardless of whether the technology is performing well yet), is the fact that the training data is closed source. So there is no way to verify how they trained their systems. An open source attempt would definitely be something desirable.

But I think it might be worth thinking a bit more about possible annotation methods and the storage of the data. I guess that the use case "self driving car" might require us to come up with different annotation methods (you already mentioned a video-based annotation tool), as well as a different way of representing the data (in the database)? E.g. when annotating objects we usually just mark some pixels in the image, but in order to create meaningful data for a self-driving dataset we probably need a way to add semantic information as well (e.g. "that bent line represents the path of the road and is not meant to be interpreted as the road itself").
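One way the stored representation could separate geometry from meaning is to keep an explicit semantic tag alongside each shape. A minimal sketch (this schema is purely illustrative, not ImageMonkey's actual database format):

```javascript
// Hypothetical annotation record: the same polyline geometry can mean
// different things, so the meaning is stored explicitly rather than
// implied by the shape.
function makeAnnotation(imageId, points, semantics) {
    return {
        imageId,                                  // which image/frame
        geometry: { type: "polyline", points },   // raw pixel coordinates
        semantics,       // e.g. "road_path" vs "road_boundary"
    };
}

const pathAnnotation = makeAnnotation(
    "img42",
    [[10, 200], [150, 180], [300, 120]],
    "road_path"  // the line marks the path of the road, not the road itself
);
```

With a tag like this, a consumer of the dataset can tell a drivable-path polyline apart from a road-boundary polygon without guessing from the geometry.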

Some more (random) thoughts:

I guess we probably could also do that with the current capabilities, but I think it might be worth investing some time in order to get the full potential out of it.

bbernhard commented 6 years ago

Perhaps I should resurrect my javascript tool to do that (it had feature-crept into a mesh editor rather than a simple boundary markup.. perhaps this would be a use; similarly if it was extended to handle video i.e. vertices placed in time maybe you could do some interesting things with interpolation ..)

That would be great! As you already mentioned, a browser-based solution would in any case be superior to a desktop app.

btw: I haven't tried it yet, but this also sounds interesting: http://carlvondrick.com/vatic/ However, it looks like it's not maintained anymore.

dobkeratops commented 6 years ago

but in order to create meaningful data for a self driving dataset we probably need a way to add semantic information as well

yes, I'm beginning to think that too. What I started out trying to do in that tool was labels on points, edges, or polygons; it was kind of confused between 'objects' and 'components' in the UI & data structure. Figuring out what to store, and how to present it, will take some experimentation.

(e.g. a labelled point is just an object with one vertex.. ok so the label is still 'an object')

I suppose placing points in the centre of the road would begin to allow orienting the cameras between frames (i.e. before you've meticulously clicked out the whole road boundary). Points which connect to multiple lines would be de-facto junctions, I guess, but you would still want to mark more, e.g. 'where to give way' etc.
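The "points connected to multiple lines are de-facto junctions" observation could be computed directly from the centreline graph by counting vertex degree (a sketch, assuming segments are stored as pairs of vertex ids):

```javascript
// Given centreline segments as [vertexIdA, vertexIdB] pairs, a vertex
// with degree > 2 touches more than one road through it - a de-facto
// junction. Extra semantics ('give way' etc.) would still need manual marks.
function findJunctions(segments) {
    const degree = new Map();
    for (const [a, b] of segments) {
        degree.set(a, (degree.get(a) || 0) + 1);
        degree.set(b, (degree.get(b) || 0) + 1);
    }
    return [...degree.entries()]
        .filter(([, d]) => d > 2)
        .map(([v]) => v);
}
```

E.g. three segments meeting at one vertex (a T-junction) would flag that shared vertex automatically.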

a browser based solution would be in any case superior to a desktop app.

right.. and whilst we might not be able to scroll through video, there's still WebGL. It would be nice if we could make something that gradually filled out a 3D map that you could fly around - that would make the activity spatially engaging.