ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

question.. would these bulk cropped examples be ok to upload? #77

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

another workflow: taking crops (Mac screenshot-region hotkey) from video while scrolling through the player makes it easy to produce 'annotations' (as single-object crops) quickly; you've already got whole stills from the same videos.

However, before I bulk-upload them with your script I'd better check - you've raised concerns about privacy, e.g. images focusing on specific people and license plates.

Maybe these would be ok if limited in resolution such that you can't make out the text or faces? (A 32x32-pixel area limit might do it, I guess - I'd consider aspect ratio instead of just squashing to a square, e.g. allow 16x64 or 64x16 etc.)

Perhaps some additional hints could be given (single-object examples vs. scenes) - it might be more productive to ask people to annotate pieces like 'head' or 'wheel' rather than draw bounding boxes around 'a car in an image which is 75% car already'. (EDIT: ok, actually you could still ask for polygon boundaries rather than bounding boxes.. maybe you could rotate the image randomly so that attempts at rectangular annotation produce additional edges, if people don't tend to use the poly-tool.)

(given how easy it is to do these, I also wonder if uploading the complete street scenes as just 'road'/'building' to complement these might also be best)

'car'

[screenshot]

'person'

[screenshot]

bicycle (new label suggestion)

[screenshot]

tree

[screenshot]
bbernhard commented 6 years ago

Wow, that's really great!

regarding privacy: I think that could indeed be a problem. Not really sure, but I guess other image hosting platforms (like flickr) might have a similar problem. I am not a lawyer, but I would assume that I cannot be held liable if somebody uploads a photo that violates someone's privacy. But there is always the possibility that someone reports a privacy violation, and in that case I have to take the image down within 24hrs. So, in order to avoid validating/annotating something that might be taken down at some point due to a privacy violation, we should probably avoid such uploads in the first place, if possible.

regarding scaling down: that's an interesting idea and would probably solve the privacy concerns. The problem I see with that approach is that we probably can't annotate those images properly anymore, which would "break" the current workflow. At the moment, each image label automatically becomes available for annotation once enough users have accepted it.

Ideally I would like to see faces and license plates blurred/blacked out. I am not sure if it's worth the effort, but I could imagine a dedicated API endpoint for that, where you upload your picture and it returns the blurred/blacked-out version. For faces and license plates I could imagine that it works pretty well, as the characteristics of those are pretty prominent. btw: here is something interesting: https://github.com/openalpr/openalpr
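
The blacking-out step itself is simple once a detector (openalpr for plates, or some face detector) hands back a bounding box - the detection is the hard part. A minimal sketch, on a grayscale image held as a list of rows:

```python
def black_out(image, box):
    """Zero out a rectangular region of a grayscale image
    (list of rows of pixel values). `box` is (x, y, w, h), the
    shape a detector would typically return."""
    x, y, w, h = box
    for row in image[y:y + h]:
        for i in range(x, min(x + w, len(row))):
            row[i] = 0
    return image
```

The same loop with a local average instead of 0 would give a blur rather than a black box.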

I think with the recent Facebook privacy leaks, we have to be really careful and responsible with the data we collect here. One thing I also thought about lately is the geoposition. As you are driving around with your bike, it would be a great opportunity to get geo-tagged pictures. But if we collect the data without any obfuscation, it will become a privacy nightmare...

btw: thanks also for the other tickets/discussion/ideas. I am short on time today, but I'll get back to you in the next few days. :)

dobkeratops commented 6 years ago

Yeah, I did flood a load of thoughts quickly. (I wish GitHub would let me prioritise them.)

r.e. scaling/privacy - what is the path of minimum resistance and risk? I suppose I could scale to ~64x64 pixels; that's somewhere in the middle. That's still 8x8 blocks of (8x8 pixels), so you could still identify components, separate background etc. At the expense of slightly more work on the platform (I do know how many small suggestions add up), perhaps it could skip asking for annotations if the image size is below some threshold - but you could still use it for further label refinement ("what kind of car.."). I suppose a refinement mode could also present a grid of thumbnails and ask 'select the SUVs' etc.

As you are driving around with your bike, it would be a great opportunity to get geo-tagged pictures.

Yeah, I don't have a GPS right now but that's definitely of interest. As a halfway house, a 'continent/country/city' image-wide label might at least give yet more potential training signals (the types of plants, cars and buildings would give hints as to overall geographic location).
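
A coarse, privacy-preserving geo label along those lines could be as simple as snapping the GPS fix to a large grid cell before storing it (a hypothetical helper; the ~50 km cell size is an assumption, not anything the platform does):

```python
def geo_cell(lat, lon, cell_deg=0.5):
    """Snap a coordinate to a coarse grid cell (0.5 degrees is
    roughly 50 km), discarding the exact position. The cell id
    could be stored as an image-wide label instead of the raw
    GPS fix."""
    return (int(lat // cell_deg), int(lon // cell_deg))
```

That keeps a 'roughly which city' signal while making street-level tracking impossible.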

dobkeratops commented 6 years ago

more examples - parts: roughly 3000 annotations done this way so far

headlight_of_car

[screenshot]

taillight_of_car

[screenshot]

wheel_of_car

[screenshot]

wingmirror_of_car

[screenshot]

foot_of_person

[screenshot]

hand_of_person

[screenshot]

head_of_person << possibly problematic for privacy .. use lowest res?

[screenshot]
dobkeratops commented 6 years ago

I've uploaded these files into a git repo (should just about fit..) - roughly 3000 'annotations' (not really qualifying as 'images')

'car' category: https://github.com/dobkeratops/data/tree/master/anything_with_wheels/motor_vehicle
'person': https://github.com/dobkeratops/data/tree/master/person
whole thing: https://github.com/dobkeratops/data

I divided the directories a bit further with orientations and parts ('car/back_of_car/..' etc); if you have time, see what you think r.e. what label structure would suit you. If you flattened it you'd get back to the simple label structure.

EDIT: I've made a few more categories and tried to set up the least bad 'label tree' that I could through directories, but it's person/*/*.png and anything_with_wheels/car/*/*.png that have the most by far. (I moved 'car' into a subdir to try to keep the top level at just 10.)
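
The flattening could be sketched like this (a hypothetical helper, assuming the directory layout described above, where every path component above the filename acts as a label):

```python
import os

def labels_from_path(path):
    """Derive labels from a layout like
    'anything_with_wheels/car/back_of_car/img.png': each
    directory component is a label, so flattening the tree just
    means keeping the last one (or the whole chain)."""
    parts = os.path.normpath(path).split(os.sep)
    return parts[:-1]  # drop the filename, keep the label chain
```

So the label tree and the flat label set are both recoverable from the same paths.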

if you're worried about repetition: the parts (wheel, etc.) are all separate samples from the original video (and indeed the crops themselves are from different frames to the stills I uploaded), not crops from these crops; they also tend to be bigger - to avoid privacy hazards I was looking for more distant examples of complete people.

general observation

Annotating by scrolling through video is more engaging than annotating stills - because you move through the scene 'finding' the next object of interest (and if you actually got frame-to-frame coherence, you'd get more samples?). I wonder if annotating 360 video could be even more so? (I think you talked about AR stuff, so that's basically along the same lines.) Perhaps the interface could become a bit like FPS controls (imagine, instead of drawing bounding boxes, zooming in and out like a sniper and aligning a fixed-size reticle).

another idea on privacy: showing a distant person is ok, focusing on a hi-res face is not - but what about components of faces (which could be photo-fitted to make generic blends)? I tried grabbing 'ears'; it seemed doable..

[screenshot]
bbernhard commented 6 years ago

Wow, that's really nice - very much appreciated! Out of interest: How long did it take you to create all those crops?

I wonder if it would be possible to use those cropped images to find the appropriate bounding rectangle within the video stream? I think those image crops are good for some specific use cases, but I guess we would be more flexible if we stored the image in full size together with the bounding-rect coordinates. I think that would cover more use cases.

But I also understand that your annotation method is more fun (and probably also faster?) than the usual annotation process. So maybe we can "combine" both methods and create the bounding rectangle coordinates from the image crop by doing a frame-to-frame comparison with the video?
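
That crop-to-frame search could be sketched as a brute-force template match (a toy, pure-Python version over lists of rows; a real pipeline would use something like OpenCV's matchTemplate on each video frame):

```python
def find_crop(frame, crop):
    """Brute-force search for a crop inside a full frame by
    sum-of-squared-differences; returns the (x, y) of the
    best-matching top-left corner."""
    fh, fw = len(frame), len(frame[0])
    ch, cw = len(crop), len(crop[0])
    best, best_xy = None, None
    for y in range(fh - ch + 1):
        for x in range(fw - cw + 1):
            ssd = sum((frame[y + j][x + i] - crop[j][i]) ** 2
                      for j in range(ch) for i in range(cw))
            if best is None or ssd < best:
                best, best_xy = ssd, (x, y)
    return best_xy
```

With the crop's size known, the returned corner gives the full bounding rectangle in the original frame.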

When using the image in full size, we would however run into privacy issues (person's face, license plate), but maybe there is an automated way to blur those out with a decent false-positive rate? At the moment I am a bit worried that people could flag images even if it's not their own privacy that is violated - just because they can. In that case I have to take down the image, in order to avoid any legal troubles. Cropping the images like you did would definitely solve the problem, but we would lose quite a lot of information. (I guess the object's position within a scene could also be interesting information when training a neural net?)

I do not have any experience in regards to an automated license plate/face detection mechanism, but if there is a way to realize that in a (semi-)automated way with a decent false-positive rate, I think that's something we should strive for, even if it takes us a while to implement. I think that could pay off in the long run.

dobkeratops commented 6 years ago

How long did it take you to create all those crops?

Hard to measure exactly, but over 'a few afternoons' I've ended up with 5000 total 'samples' divided between car, person, other objects, and parts. I'd say 'a few hundred per sitting'.

I'm pausing now and wanting a change. Ultimately I know the rectangles associated with full images are better - for example, with occlusion/overlap I did some examples like 'car_behind_car', but you get those automatically from labelled scenes.

I'm not sure the component crops would suit uploading to your site, but maybe the car examples are useful.

(my latest experiment was to mount one camera sideways; I've uploaded some car/person stills to your site from that. Those images were of slightly different character: you get more angles on the same objects that you ride past, so now I see panoramic footage would be useful.. I should try and make a mount for all those R-Pi cameras I have sat here.)

I guess we would be more flexible if we would store the image in full size together with the bounding rect coordinates

exactly - the annotated images are superior data: more information in each, and more efficient storage. So this method isn't a magic bullet.. it's just something that's easy to do.

But I also understand that your annotation method is more fun (and probably also faster?)

yes. I think it's due to having a moving image, and the ability to search interactively for something worth labelling.

"I wonder if it would be possible to use those cropped images to find the appropriate bounding rectangle within the video stream?"

Yes, I've got that in the back of my mind - I'm keeping the original videos and half hoping I can go back and do exactly that, which would then yield yet more frames of interpolation around the originals. However (see ideas on 'road layout' #79), I'm thinking a real video annotation tool with time-aware annotations would be superior in every way.. if you could draw two (x0..x1, y0..y1, t) bounding boxes, say 2 seconds apart, you'd have the animated shape between them yielding many separate training samples.. and of course motion-vector information.
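
The two-keyframe idea is just linear interpolation of box coordinates over time (a toy helper; a real tool would presumably interpolate polygon vertices the same way):

```python
def lerp_box(box_a, box_b, t):
    """Linearly interpolate between two keyframe boxes
    (x0, y0, x1, y1); t in [0, 1]. Two hand-drawn boxes a couple
    of seconds apart then yield a box for every in-between
    frame."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))
```

At 25 fps, two keyframes 2 seconds apart would give ~50 interpolated training boxes for the cost of drawing two.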

The other thing: it's easy to get loads of 'car'/'person' because the images have many examples, but once you have to search for a label it gets a bit trickier.. what I tended to do was focus on one label at a time then scroll through, which lets me just move the files together into a directory. It's still possible to do a 'binary split' between images in a directory reasonably efficiently (e.g. given a directory of 'car', multi-select examples of 'front' and move them into another subdir).. but all that could be easier in a dedicated tool (back to the idea of hotkeys for navigating, and of course your quiz idea presented in the web interface).

I'm not sure a video-based tool would work so well in a web page (they can certainly play video, but locally you can scroll through in an analog way via the trackpad..). Perhaps a web page could still deliver the 'search' aspect by showing an image plus thumbnails of future/past frames on a timeline below to step through (e.g. +/- 0.1, 0.2, 0.5, 1, 5 seconds..). I'm sold on the idea of web-based tools being far better for public participation (no faffing around with installs).

'side view' stills

[screenshots: frame250, frame496, frame465, frame326]

dobkeratops commented 6 years ago

latest experiment: messing with GIMP to give the crops (rough..) alpha channels (like making sprites) - imagine taking the alpha channel of submissions as the given label..

These images still have the surroundings they were extracted from underneath the alpha=0 area. It was much slower to do these, of course.. a geometric tool could be much faster; it's just that the incremental approach of painting it seems more relaxing (GIMP probably has better ways of doing it that I haven't discovered yet as well).

[screenshot]
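
The alpha-as-label idea amounts to thresholding the painted alpha channel into a binary mask (a minimal sketch, assuming the image arrives as rows of RGBA tuples):

```python
def mask_from_alpha(rgba_pixels):
    """Turn a painted alpha channel into a binary label mask:
    1 where the object was kept opaque, 0 where it was erased.
    `rgba_pixels` is a list of rows of (r, g, b, a) tuples."""
    return [[1 if a > 0 else 0 for (_, _, _, a) in row]
            for row in rgba_pixels]
```

The mask is exactly the per-pixel segmentation label a net would train against, so the painting effort maps directly onto training data.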

I still wonder if a neural net could figure out the alpha channel just from enough outlines ('what is common between them..'), or, if doing manual outlines, how many it would take before it was capable of doing them automatically..