Pure component and material labels - suggestion

ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.

https://imagemonkey.io

47 stars 10 forks source link

Pure component and material labels - suggestion #264

Open dobkeratops opened 4 years ago

dobkeratops commented 4 years ago

Imagine if it allowed part and material names (wheel head metal wood etc) as standalone labels - without requiring an object :

if you can browse free labelling you will find some examples I’ve done like this already , to see what it looks like (search for wheel , head , hand ..)
it decreases the amount of typing to make a label, or list navigation to select a label ( eg currently I think the full list must show every combination of component and object). in the tablet+pen use case (drawing outlines is much faster but typing is a bit slower) this has been extra comfortable. Even when using a keyboard, writing out “wheel of car, headlight of car, car, windscreen of car...” means a lot of repetition of typing car
it would allow annotating the parts of objects we don’t yet have labels for, eg obscure types of vehicle or device (I’ve seen a cleaning bucket on wheels, “Tricycles” share components with “bicycles”, or there are many types of animal yet to include).
I would still suggest allowing the specific combinations aswell
regarding training , there are would be parallels between the component of many distinct objects - eg similarity between “wing of aircraft” and “wing of bird”, “fin of fish”, “fin of rocket” ; whilst I would still say “specific object components” are more important , I think you could still train a detector for standalone part labels
sometimes those comments are visible in isolation, either detached , or with the rest of the object invisible
regarding materials, some objects are comprised of multiple materials , eg a tool with a metal blade and wooden handle , a box with a metal frame and wooden panels, a plastic bucket with a metal wire handle ; buildings with stone or wooden walls, and tiles or thatched roofs , etc
again with materials, sometimes you can see components , bits of debris etc without knowing what they are

Layering could be used to retroactively add the “object name” information , or possibly (harder) a hierarchy feature (like label me) - eg imagine if you could select several polygons and combine them to make the patent shape . I’ll say when annotating with a pen , doing an extra polygon for the whole object is usually fast enough already . I would suggest either (i)trying this at the pixel level (ie “man,woman,car, truck” are distinct image, channels from “wheel, hand, head, headlight” etc - so they all just get painted in allowing overal - and the existing specific part labels just activate both object and part channels) or (ii) assigning an object to the part based on greatest polygon overlap (harder, and you’d probably have to code the pixel overlay first anyway to figure this out). Objects might overlap already so I suspect we’d have to use a many channel image anyway .. there would be various ways to handle this reasonably efficiently) 6ADB8B0A-CBBD-4D56-B797-1A245A59B1E3

Some suggestions for part labels: wheel,tire,hub, axel,lid,handle (many hand held objects , also doors),hole, saddle, seat, roof head,tail,neck,arm,leg, foreleg , hindleg (some animals .. “foreleg” is transition between quadruped and biped), wing, fin, flipper (divers and some animals), tailfin ( best name for the rear fins of an aircraft vs calling it tail - tailfin of fish , tailfin of aeroplane ..)

Some labels could be a part or genuinely standalone object: wall

dobkeratops commented 4 years ago

there is of course problem with some words being ambiguous between part or object or material: fork, glass are two examples that spring to mind Maybe combinations or aliases could disambiguate? Making specific aliases would be safest. (There might be ways to guess based on overlap and context , but that would be more complex to code and would need testing )

fork: “table fork” - cutlery “pitch fork” - tool “fork of road” might be useful for road layouts , imagine a conversation between Intelligent taxi and passenger .. “turn left at the next fork in the road..” “fork of bicycle” Or “bicycle front fork” “rigid bicycle fork” “suspension fork”.. A bicycle component holding the front wheel

glass: .. possible hints?

glass material ? glass cup glassware (Broad label covering Jars jugs etc just made of glass) glass of water (must be a cup) empty glass (must be a cup) glass panel glass tabletop glass window glass building glass shards

bbernhard commented 4 years ago

interesting ideas - thanks for bringing those up!

I think we could indeed promote component labels to "normal" labels.I have to check the backend code, but I think there are no restrictions that would prevent that. I'll have a look at that later this week.

The only downside I can think of is, that we might end up with a few duplicates. e.g someone adds a wheel/car label and the next one a wheel label, both annotating the same object. But I guess that's something we can only solve with moderation, no? (I think it's a bit similar to content moderation on Wikipedia. While there are tools in place that try to detect spam, abusive behavior and duplicates there's still a significant amount of manual moderation work needed for house cleaning.)

I also like your layering/grouping idea. Together with the browse based mode, I think we could later easily add the possibility to rearrange component labels bulkwise. e.g: imagine a browse based view where you can search for component labels (e.g: wheel) and it returns you all the existing polygons for that label and the images they belong to. By scrolling through the images you can e.g select all the wheel polygons that belong to a car and click: "add to parent label ". This transitions the wheel annotations to wheel/car annotations.

dobkeratops commented 4 years ago

Right that might take some fine tuning regarding label sorting and visibility while annotating .. maybe there could be some assist regarding how the current label overlaps ( eg if you selected “wheel” perhaps it could show all other wheel related labels (wheel of car etc) 50% transparency as a guide. The same sort of thing might help with person,man, woman

dobkeratops commented 4 years ago

Just curious : i saw some more uploads up on the activity chart (90k now.. looks like 100k images is within reach..) Is that you , or another contributor? Are they more scrapes or original photos? ( I realise the usual sources would have kept getting updates)

I think I’m seeing some of them in unlabelled image searches (ie images i don’t think I can remember uploading or seeing in my searches), which is great .

It might be nice to have a search option for “recent uploads” (eg here I’m curious to browse these new images) , .. but the best default order is open to question . Pure random stops it lingering on one class of image , but sometimes if you personally upload something you’d want to annotate it whilst it’s fresh in your mind

As it stands the default seems to work ok (ie I do see a mix of old and new)

bbernhard commented 4 years ago

Just curious : i saw some more uploads up on the activity chart (90k now.. looks like 100k images is within reach..)

yeah, that was mostly me. The last few days I was scraping flickr for CC0 licensed images. It's a big mixture of stock photos, panorama photos, still photos, etc. Due to the inbuilt duplicate image detection, only new images should have made it through the detector (when I bunch upload scraped images from flickr a significant portion of the images are rejected as a duplicate, so I think the duplicate detection should be working fine :))

It might be nice to have a search option for “recent uploads” (eg here I’m curious to browse these new images)

good idea, I'll put that on my todo list :). Thanks!

My new year's resolution regarding ImageMonkey is to be more active in relevant communities in order to attract more contributors. We've made a lot of progress over the last two years and I think we've now reached a point featurewise where it makes sense to reach out to the community again for some help.

I recently also added the possibility to support the project financially (either via Paypal or via Patreon). One big (far fetched) goal is to experiment a bit with micro payments. e.g users who like the service, but lack the time to contribute to the dataset, can support the project financially. The collected money is then used to reward (power-)users via micro payments. But yeah, I guess the whole ML/data collecting sector is probably only interesting for a small niche...so not sure if we will ever get more than maybe a handful of backers.

dobkeratops commented 4 years ago

Ah yes I noticed that (the option to donate). It could be kind of like open source bounties. We just need to get the message out there and find ways to connect with other projects and services. There are so many potential uses for a truly open source image database. I contribute because I can imagine getting some of these done myself in the future, and I see the big picture ( if we want “democratised” robots, we need democratised training data) but people who aren’t developers probably want closer tangible benefits. “Interesting to a small niche” - a small niche would see the use and contribute, but ML and hence training data is relevant to almost every activity. Food, transport,entertainment,medicine..

bbernhard commented 4 years ago

I finally motivated myself to write another blog post: https://imagemonkey.io/blog/general/2020/02/09/ImageMonkey-100k-0.html (I am not a good writer, so I hope it's not too bad).

As already mentioned in the article, thanks a lot for your help @dobkeratops.

dobkeratops commented 4 years ago

nice summary. I just started getting back into a graphics project. I'm keeping in mind the long term goal: AI assists for 3d graphics pipelines.. but there's a big gap between any current work. Many steps remain. I'm hoping that imagemonkey will be a suitable resource .

I'm sure you've seen those impressive nvidia demos (painting with named textures, even rendering environments).. there is clearly scope for more

I do have a reasonale spare GPU lying around that I could leave training (gtx1080) - I've never actually got much done with AI down to the training times but I note you talk about the shortcut of using a pre-trained nets.

I also note reading your summary that you have training integrated with the site, that seems rather useful..

I think there are some tricks like over-estimating whats needed then cutting net subsets out afterward.

I've got a few other ideas midway between graphics programming and datalabelling.. like labelling with a library of aproximate peices of geometry. But I'll pause before dumping all that, and get some examples together

bbernhard commented 4 years ago

thanks!

That sounds really interesting - can't wait to see some example :)

Yeah, I've been playing a bit more with AI + neural nets recently. Personally I find neural nets really interesting, but for me it's also one of those topics that burns me out pretty quickly (as it's quite challenging and sometimes pretty frustrating). So, I've a bit of an love/hate relationship with machine learning :D

The gtx1080 is pretty cool - I am using the same model on my rented bare metal server.

For me the most frustrating part with machine learning has always been the instability of the machine learning frameworks. More than once my code broke because Google changed something in Tensorflow. That was mainly the reason why I've created the ML ImageMonkey docker container for training. With the container it's possible to spin up the exact same training environment without worrying that some updated package breaks everything. That makes the whole machine learning stuff much more fun.

dobkeratops commented 4 years ago

its just an engine for my own amusement. not going to compete with unreal/unity. Got it flying around a landscape at the moment. I've bounced in recent years between Rust and C++ - this year i've gone back to c++ where I can get things done quicker.. I might end up trying to sync parts of it - coming back from rust my C++ style changes. Sadly whilst I like all the ideas in rust, the better IDEs and familiarity still make me 4x as productive in C++ ( if its a personal project where I'm using my own prefered subset.. everyone has their own)

At the same time i'm trying to improvise some "programmer graphics" in blender,gimp, as I dont currently have a professional artist to work with. Thats where the interest in neural nets comes in. The sky is the limit but it's also hard to get something useful done (with the long training times).

I've got my spare PC back up - I could put my bigger GPU in that and leave it training in the backgropund

"big goals"..

the simplest application of nets would be guessing material assignments (things like "rusty metal", "brushed metal", "plastic", "stone","brick", "gravel",..) .. slapping microtextures + appropriate specular/reflection etc onto things. I've seen some people generate normal maps,displacement maps.

..but the more difficult challenge is the assist of building textured art, which could come from 2 angles:-

(i) assists for texturing simple lowpoly models .. could be enhanced by recognising what the things approximately look like (brings us back to the question of recognising things like "(toy) car", "(cartoon drawing of) fish",..) - (ii) or the exact opposite way round: assists for figuring out the geometry behind a photo.

I wonder if we could mix low-poly (and cartoon/toy representations) art with labelling?.. "this lowpoly/cartoon/toy is supposed to represent this photograph". almost like a form of visual label. "the closest lowpoly model to this photographed object is ..."

A bit of a moonshot however ... with a few 100k's-millions of examples, plus some as-yet unknown variation of GANs (and wavefunction collapse synthesis?) -> an AI art assist that can improve a novices attempts..

one side note , i just had a bash at implementing delauney triangulation (there's probably JS libs at hand for that) and remembered the idea of 'point labels' (imagine an intermediate between just labelling and full annotation where you just give a few example points, then a 1st guess is to just triangulate and interpolate label probability between the keypoints). That could be really intuitive for casual users?

dobkeratops commented 4 years ago

84B4E7E9-D6D4-42AB-851A-95F05D5F1DFA

bbernhard commented 4 years ago

Thanks for the heads up - this project of yours sounds really interesting! Can't wait to see the first prototype in action :)

Thats where the interest in neural nets comes in. The sky is the limit but it's also hard to get something useful done (with the long training times).

I am wondering if you can iterate faster, by focusing on a really small dataset first and then ramp up the dataset size, once you decent results with the small dataset.

on a related note: I am currently in the middle of fully automating the training of neural nets. It basically works like this:

the necessary data is downloaded from ImageMonkey
the neural net will be trained on the data
after the training is finished, the model will be automatically uploaded to a public Github repo

The whole thing runs headless without any user interaction on a regular basis. So every x days a new trained model will be uploaded to github. As this runs automatically without any user interaction it's quite hard to tell whether a neural net has improved between training runs or has become worse.

In order to track the health of the model, I've implemented two additional mechanisms (which I find quite useful):

after every training run, tensorboard gets started and a screenshot of the models parameters will be created automatically. (e.g: https://github.com/ImageMonkey/imagemonkey-models-test/blob/master/image-classification/2020-02-22%2016:20/graphs.png). I am using this project for creating the tensorboard screenshots.
after the model is trained, I am running some test data through and automatically create screenshots from the models output. (e.g: I am feeding the neural net a picture of dog and expect it to also identify it as dog. If the model identifies it with 99,9% probability as dog two weeks ago and now the probability is only 80%, something has gotten worse). Going through the visual output really helps to get a feeling on what type of images the neural net is struggling with.

I am a novice when it comes to machine learning, and I am pretty sure there are better techniques out there, but those two actions really helped me to get a better understanding of the neural net and helped to make the training results somehow comparable.

one side note , i just had a bash at implementing delauney triangulation (there's probably JS libs at hand for that) and remembered the idea of 'point labels' (imagine an intermediate between just labelling and full annotation where you just give a few example points, then a 1st guess is to just triangulate and interpolate label probability between the keypoints). That could be really intuitive for casual users?

Haven't heard about delauney triangulation before, but will definitely look that up. Thanks for sharing! :)

Out of interest: Which ML framework are you using? Haven't done anything ML related in C++, so I am really curious about it :)

dobkeratops commented 4 years ago

The automated training sounds great .. I suppose users could eventually configure a label set to train on through a UI (maybe overkill) . That does sound like an extra dimension to the site.. I’ve heard other talk of “democratising machine learning” in a similar way, UIs for casual users much like we have artist and designer tools that can do some impressive things without writing code. Perhaps the observed errors on the trained nets could be used to suggest tasks (funding difficult examples)

Last time I actually tried anything I implemented convnets myself in OpenCL but I didn’t do any thing useful with them .. I think I would just pickup tensorflow. There is one more itch for something custom - I suspect there is overlap between capsule nets (“routing by agreement”) and the texture synthesis technique called “wavefunction collapse” (selecting potential patterns based on overlap ie agreement.. Ie there might be a technique between GANs and these other ideas waiting to be discovered

dobkeratops commented 4 years ago

soft_focus this is my current renderer. heightfield + some random objects. As I mentioned most people say its pointless to write one because you can just get unreal or unity.. they have a huge feature set refined by a team of experts for 10+ years, cheap because there are so many customers. unfortunately its so much fun to write, so i'll go full NIH.

So I'm looking for ways to connect "writing a renderer" with AI. could generate procedural meshes, spit out the intermediate channels (normals,depth correlated with their generated image) to train a net to get a sense of 3d, whatever. (i gather this is called "domain randomization")

I'm a looong way from being able to do any of the things I mentioned (AI texturing assist etc). At the same time i'm trying to learn a bit more 3d modelling (usually I just did code and professional artists built everything)

you probably know with the tensor cores in the latest nvidia machine there's ways to use AI directly in rendering (AI denoisers for raytracing, AI materials.. you could do an expensive calculation offline, then train a net to recover the end result from the input channels)

bbernhard commented 4 years ago

WOW, that looks really cool!

As I mentioned most people say its pointless to write one because you can just get unreal or unity.. they have a huge feature set refined by a team of experts for 10+ years, cheap because there are so many customers. unfortunately its so much fun to write, so i'll go full NIH.

I can definitely relate to that. ;-) I think it's a great idea to connect AI with writing a renderer. That way you to get the best of both worlds. I am already looking forward to the point where AI meets renderer and you put that all together - really excited to see how that all works out. :D Great work!

dobkeratops commented 4 years ago

another screenshot, added water.

the random objects have random textures stuffed on them. i was thinking of using some general 'common' shapes (bits of furniture, architectural elements, car outlines, barrels..) with random textures (brick, stone,gravel..) and light -> -> then train a net to infer (shape_index, orientation, texture_index) - but would it have any synergy with real images? I would at least need the objects to have 2-3 peices of surface sepration (eg car: body, winscreen,wheels); a vision system needs to know when things with different textured parts are connected as one whole

does training to recognise a cartoon fish help recognise a real fish? how much do you get from the outline alone.

in building the cartoon or lowpoly representation we've somehow done some work distilling the essence of the salient features, but there's still a huge amount of complexity in extracting that from the real world. (and a game renderer does handle some of the light and shadowing , but my own renderer doesn't have all the global illumination of the latest engines)

I was wondering if filters applied to real photos (npr, toon shaders etc) could put them at the 'same level' as artificial objects from the POV of a NN.

s latest omething else to experiment with would be 'CSG' boolean operations (cut/union etc) - perhaps there'd be value in getting a net to figure out how objects can be assembled from components

I know some people are using driving game engines to train SDCs - but these usually have very expensive artwork (large teams of 3D artists painstakingly building textured meshes)

I haven't got an RTX card yet. I figured if I get something non-trivia done I might treat myself to an upgrade. the latest raytracing tech opens doors for more realistic lighting which would help trying to get synthesised images more like the real world

bbernhard commented 4 years ago

Looks really cool - thanks for the update!

does training to recognise a cartoon fish help recognise a real fish? how much do you get from the outline alone. I was wondering if filters applied to real photos (npr, toon shaders etc) could put them at the 'same level' as artificial objects from the POV of a NN.

that's indeed a good question. I guess that would be something worth trying out. Especially your idea of using a toon shader sounds very interesting. Would really love to see how/if that works.

Another idea would probably be to collect some images from games, annotate the objects in there and close the gap this way. But I guess it would be a pretty time consuming task to collect a decent amount of game images, so not sure if this approach really scales?

I haven't got an RTX card yet. I figured if I get something non-trivia done I might treat myself to an upgrade. the latest raytracing tech opens doors for more realistic lighting which would help trying to get synthesised images more like the real world

Out of interest: Which one are you interested in buying? (Haven't followed the graphics card market in years, so no clue which model is currently the best one out there for that task.)

dobkeratops commented 4 years ago

But I guess it would be a pretty time consuming task to collect a decent amount of game images, so not sure if this approach really scales?

right.. if manually annotating, I guess we still get more from real photos (despite the potential for closing the gap with a continuum of real to synthetic data). but if we can get at the source for existing games (maybe modifying emulators or something..) perhaps there'd be a way to auto-label (although 3d games are quite complex internally, its probably quite hard to pin down where the distinct meshes are and how they're submitted).

There is of course 3d content available online , some free models and some purchasable.. all the 3d printing websiets. Some meshes are ripped from old games. By the PS2 era (early 2000s.. shocking to me to think thats retrogame territory now!) the 3d models were quite sophisticated.

There's a need for matching photos to 'handbuilt' meshes. Raw photogrametry scans work , but they're very inefficient compared to meshes that have been fine tuned by a human artist who knows how to distribute detail around the salient features. (in turn such a database could be used to make a more efficient scanner)

Out of interest: Which one are you interested in buying? (Haven't followed the graphics card market in years, so no clue which model is currently the best one out there for that task.)

the midrange 2070. it's a little beyond the 1080. of course the best for the task is the 2080ti .. but the midrange cards usually have the best price:performance. Having a daily card and a spare for training is more useful overall..

dobkeratops commented 4 years ago

minor news - the new GPU is on the way (EDIT: turned up). So. with the 1080 as a spare and hopefully withni 24 hours the 2070 as main, and a spare old PC beside me capable of at least driving one .. my excuses for NOT doing AI experiments are dwindling :)

Trying to juggle my focus I started looking for starting points.. I can offset my NIH on graphics with code re-use for training....it occured to me you've got this service for automatically training on your data from imagemonkey already..

So lets look at options?

If I make my engine spit out pairs of images/arrays - 'inputs' being random rendered objects (+random textures, random lights) , 'outputs' being {{object ID, texture ID} /corresponding to labels & material properties in your database*/, object orientation, depth maps} - perhaps I can make use of your code to do the training?

What would I need regarding setting up an instance. Whats involved in 'deploying' your code.
What would be the best way to format that data, i.e. matching what you already expect from the human labelled data. Do you turn the drawn outlines into pixel assignment? is there some existing pipeline for taking boundingboxes/polys, can it by bypassed with a raw image? (EDIT: whilst i've been keen on boundaries on contextual photos, I would still generate centred images from a renderer- a rectangle that really should predominantly be 'car','house' etc. context for the random images is less meaningful.. its artificial. So it could be as simple as {image->label} pairs)
is your code in a state where it would be useable like this (I would be nervous about handing someone else my renderer right now, although I am trying to package it up for someone else already. It was actually someone else asking for engine help with an AGI project - another case of 'training AI on games' that spurred me to write this, but his focus is very different)
is there a way a bunch of procedural training pairs and/or a net trained on it can contribute to (i) the 'activity chart' - re-assuring other potential users that this project is active and worth joining and (ii) the overall 'level of visual intelligence' stored in the imagemonkey database.
would it be useful to validate the synthesiser by getting a human opinion on what it's examples actually look like,within the same label vocabulary as the photos? (throw some synthetic examples into the database, perhaps grouped into a Collection so when you judge one a fraction of that assignment goes to all ?)
would this all just be overkill, or is there food for thought in future modifications. What would be the best way to package up a renderer for AI experimenters? you could look at additional features in your pipeline for plugging in procedural data?
- would there be mileage in training using on the fly synthesis or should we just submit a directory - 100k images could be a 10gb dataset, 1million images starts to look prohibabive for throwing data around, whereas slotting in some code to generate them using openGL is very compact. (But what about security) Generting 1million images is 4hours for a 60fps renderer - probably a less GPU overhead than the actual AI training, but still a factor to consider.

failing that I'm sure I can find something else lying around - but you know my big picture interest here. getting some overlap going between different opensource AI efforts would be awesome. I've always thought this whole idea has big potential with many future uses.

(* my hope is the texture can still drive material recognition, even if its on the wrong object. So: if it can spot "a brick texture, on a truck", "a gravel/soil texture, on a table", .. those random samples are still contributing knowledge about table shapes, truck shapes, and those surfaces. A human can make those distinctions - and it might save it from making too many assumptions from texture alone.

this is also thinking about how to leverage current my minimal art database.. I dont have an army of modellers to work with - but I can at least seperate the surfaces out into major peices- verhicles split into body, windows, wheels. humanoid mannequins split into head,hands,upper/lower body. Those seperate parts could be re-textured independently.

I should also keep referencing out past discussions on the picture/toy issue. Lowpoly, synthesised images are in the same bracket. I remember your initial take was that the raw labels suffice, you're actually happy to see the picture of a fish labelled as a fish - and indeed my hope for this use case is that something useful falls out of that similarity. I think i've seen a few photoshopped and rendered images in your database already, so you might already have an idea for a 'property' to assign them )

bbernhard commented 4 years ago

minor news - the new GPU is on the way. So. with the 1080 as a spare and hopefully withni 24 hours the 2070 as main, and a spare old PC beside me capable of at least driving one .. my excuses for NOT doing AI experiments are dwindling :)

awesome! Can't wait to hear how the 2070 performs, especially compared to the 1080. (my rented GPU instance still has the 1080).

Trying to juggle my focus I started looking for starting points.. I can offset my NIH on graphics with code re-use for training....it occured to me you've got this service for automatically training on your data from imagemonkey already..

What would I need regarding setting up an instance. Whats involved in 'deploying' your code.

I've recently written a short blog post about it on Medium: https://medium.com/@imagemonkey/in-this-short-blog-post-i-would-like-to-show-you-how-to-export-data-from-the-imagemonkey-dataset-68ea5cc171a (I am experimenting a bit with new social platforms like Medium and Twitter in order to spread the news about ImageMonkey). The above blog post briefly describes how an image classifier can be trained via transfer learning using ImageMonkey's dataset.

Besides image classification the monkey script is also capable of doing object detection (via tensorflow) and object segementation using MaskRCNN. The different training modes can be selected via the --type parameter. (--type="image-classification", --type="object-detection", --type="object-segmentation").

In case you give it a try and it fails, please let me know (that's most probably a bug). The whole monkey script is written in Python, so it should be (fairly) easy extendable. In case you want to extend it, here's the code As google doesn't care much about backwards compatibility of their software, I had to pin tensorflow and tensorflow-gpu to a specific version in the Dockerfile, otherwise the script was breaking with every new tensorflow release. I haven't given it a try, but maybe the newest tensorflow releases are already mature enough to upgrade (would be great to move from tensorflow 1.x to 2.x).

is there a way a bunch of procedural training pairs and/or a net trained on it can contribute to (i) the 'activity chart' - re-assuring other potential users that this project is active and worth joining and (ii) the overall 'level of visual intelligence' stored in the imagemonkey database.

Something that I've been playing lately a bit is the possibility to periodically train a neural net and then upload the trained model to github - fully automated, without any user input. At the moment I am using this repo here for beta testing. On my GPU instance there's a script running, which starts an image classification training once a day (using the above docker container) and then uploads the model with some statistics afterwards to the github repo. Since 2020-02-27 it seems to be running quite smoothly :)

I am still working on the code, so it's not yet in a state where I comfortable sharing it with others. But once I am done, it should run on any system (the only requirements are an internet connection and a docker daemon).

Maybe we can go a similar direction with the stuff you are working on? So, that you are also uploading the generated artifacts (whether it is a ML model, images, etc) somewhere (github, s3 bucket, some other storage..etc) and then we are aggregating the information in the activity chart/on the landing page. (e.g displaying a link to the latest model of yours on the landing page; or a models activity chart, etc.)

I think at least in a first iteration it's easier that way, as it gives you the flexibility to break stuff (which happens naturally during development), without worrying that ImageMonkey services are affected too. At a later point, we can decide whether we want a more tightly integration (e.g feeding data directly back into the service/database) or if the lose coupling is maybe even an advantage (we avoid single point of failures). What do you think about that?

would this all just be overkill, or is there food for thought in future modifications. What would be the best way to package up a renderer for AI experimenters? you could look at additional features in your pipeline for plugging in procedural data?

Not sure if you want to go that route, but I am personally a fan of docker images. Docker obviously has some flaws and I am not sure if I would use it for something security critical, but I really like that it's self contained and it can run on almost any OS + architecture (arm, x86-64, i386, etc). I also think that it solves the "it works on my machine" problems quite nicely. And it's really user friendly (usually a docker pull ... followed by a docker run ... is everything you need to run any docker container)

(I have to run some errands. I'll get back to the remaining discussion points later today/tomorrow :))

dobkeratops commented 4 years ago

(Some points from the above ideas dump) BAC2797B-9939-4C1D-BF1A-896CD90E0925

bbernhard commented 4 years ago

(* my hope is the texture can still drive material recognition, even if its on the wrong object. So: if it can spot "a brick texture, on a truck", "a gravel/soil texture, on a table", .. those random samples are still contributing knowledge about table shapes, truck shapes, and those surfaces

totally agreed!

I really like your visual brainstorming approach - your drawings always remind me a bit of Mythbusters (the TV series). They always had this blueprint drawings which sketched out the experiment before they started working on it. Man, I really miss the series..

Wow, your renderer offers quite a lot of possibilities - is there something that you find particular interesting/promising?

Feeding back data to the ImageMonkey database shouldn't be a problem - there already exist REST API endpoints for almost every functionality. A developer friendly library is still missing (I've started working on Javascript and Python libraries, but they are far from complete), but it shouldn't be hard to extend the existing ones (or write a small REST API wrapper in a different language; e.g C++) - I can take care of that :)

Depending on which point you want to implement, there are maybe some database changes needed (e.g to mark the data as "autogenerated" in the database) - but I think we can look at that in detail at a later point.

What would be really great is, if your renderer could output the result artifacts locally (e.g command line) or store it somewhere (e.g in a file, github repository, etc..). That way you could iterate very quickly without worrying that you break something in the ImageMonkey backend. And it gives other users the possibility to fork your project and play with it on their local machine. That would be really awesome!

dobkeratops commented 4 years ago

Right I don’t want to flood your database immediately with 1million rendered examples, hah. I was thinking of throwing up batches of about 1000 examples to see how it goes. We need to establish the label , your suggestion there is “autogenerated”. That could work . I would also add the more specific “low poly” for hand built models in the 100-2000 polygon range ( the kind I have a chance of personally building in blender) . We need to make sure people can easily exclude generated models from searches (And that probably should be the default?) Talking to an artist friend they have recommended splashing buying some meshes, specifically:“evermotion” “archmodels” collections but I’d be looking at €500+ for a decent selection, which has me thinking “buy another GPU for training..”. I’m also not sure about the licensing. I’ll think about that trade off. They do look really good. But it has me wondering how many of these mesh selling websites are already writing this exact system, given their database head start ..

bbernhard commented 4 years ago

Right I don’t want to flood your database immediately with 1million rendered examples, hah. I was thinking of throwing up batches of about 1000 examples to see how it goes. We need to establish the label , your suggestion there is “autogenerated”.

yeah, I would like to tag those generated artifacts somehow. Not sure yet what's the best way to do this...I guess that also depends which one of the above points you want to tackle first.

But I am sure we will find a way to feed the data back. :) As soon as you have a working prototype, please let me know. I'll then have a look at how we can feed the data back and prepare the necessary backend changes.

specifically:“evermotion” “archmodels” collections but I’d be looking at €500+ for a decent selection, which has me thinking “buy another GPU for training..”. I’m also not sure about the licensing.

you are right, they look really good. But as you said, the license could indeed be a problem. My gut feeling is, that we would need something less restrictive (at least Creative Commons 3.0 licensed and even better CC0 (public domain) licensed) in order to get wide adoption. I am also a bit worried about the legal consequences when using commercial models. And I guess this would then also mean, that users would have to buy the same collections when they want to play with your renderer, right?

Unfortunately, I do not know the community much (I played around with Cinema 4D and 3Ds max for about a year when I was ~18 years old), but maybe there are some volunteers out there that would contribute some CC0 licensed models for that purpose? My gut feeling is that once there's something to showcase (even if it's based on some really low poly models), that people will come.

In case you need APIs to fetch data from ImageMonkey, please let me know.

dobkeratops commented 4 years ago

( FYI I got a spare PC running with a spare GPU , initially I'm using it for folding@home as a little response to "current events" .

I was just thinking how with these lockdowns many people should have time on their hands for labelling.. and how suddenly the need for automation, delivery bots etc is more pressing.

If I can figure out a case/cable mismatch issue I'm actually going to have 2 GPUs useable this way . I've actually got a gtx970, gtx1080 , and the rtx2070, and an even older core2 quad machine lying around, which does still work, should surely be enough to just throw work at a GPU. The 970 remains enough for daily work. I'll have to think about the power draw before I go too crazy setting up a farm.

core i7 4790 + gtx970, daily driver             4TFlops
core i7 860 + rtx2070                          8-9 TFlops
core2quad Q6600 <2 potential GPU slots>
                  gtx1080   <awaiting PSU adapter> 8TFlops

I could just timeshare a spare machine between folding and training.

I gather its possible to downclock to reduce strain; I dont think consumer parts are rated to be run continuously like servers, hence the premium that you pay for Quadros etc..)

bbernhard commented 4 years ago

Awesome!

I am still experimenting a bit with neural nets..mostly trying to figure out if we can already do something useful with the collected data we have. Unfortunately, this is pretty time consuming (I am usually kicking off a training job every 2-3 days on my GPU instance) and involves a lot of trial & error.

Up to now, I've only played a bit with image classification (which is rather "simple"), but for me object detection/segmentation is way more interesting. My goal here is to get something done that shows that we are "on the right track"...this can be as simple as a cat/dog object detector. A small little project where we can "show off" a little bit (and which hopefully gets us more traction).

If you want to help with this...any help here is really appreciated! :)

dobkeratops commented 4 years ago

definitely useful .. and I agree its hard becuase the experi.mentation takes so long. can't iterate ideas so fast if it takes 2-3 days to try something out .

as it happens the folding seems to be intermittent - i think they've been swamped with volunteers so the servers aren't always able to give "work units" out. I had it running all night but it's been sat idle all afternoon. Timesharing should go ok. I just need to get this one adapter ordered (6-8pin..) and then i'll have 2 spare GPUs aswell (it would at least let me alternate kicking off 2 experiments in parallel)

So many experiments I would like to run..

Pavement vs Road (for SDCs)
some set of materials, eg. wood vs metal vs plastic .. gravel grass soil sand
recognising human state (lets see how many examples of sitting vs standing vs running vs walking etc we have)
even just training to predict the whole label list, without annotations (eg telling the difference between parks /forests , room types etc)

and sure "cat vs dog" is the classic one but it might be nice to show we're collecting potentially new data. some of the ideas above will require writing some code to parse the labels (and probably writing a bunch of aliases - that shouldn't take too long if focussed on one goal , although doing them all probably would..)

bbernhard commented 4 years ago

as it happens the folding seems to be intermittent - i think they've been swamped with volunteers so the servers aren't always able to give "work units" out. I had it running all night but it's been sat idle all afternoon. Timesharing should go ok. I just need to get this one adapter ordered (6-8pin..) and then i'll have 2 spare GPUs aswell (it would at least let me alternate kicking off 2 experiments in parallel)

Yeah, I think there are currently many people out there contributing their computing power to folding@home. While I find the project awesome, I am not sure if this project can really help us with the current crisis in a timely manner. But nevertheless, it's an awesome project and it's definitely worth contributing to (and maybe we are lucky and it really helps in fighting corona) :)

and sure "cat vs dog" is the classic one but it might be nice to show we're collecting potentially new data.

totally agreed. :)

your list of experiments look really interesting - can't wait to see some progress on those. As we have quite a bit of data collected now, I hope that we already can get something useful out of it :)

dobkeratops commented 4 years ago

Just grabbing the current label suggestions endpoint .. it’s going to take some effort to organise this . Doing it all in one go is probably too daunting . Maybe I can write something to parse some of my more recent conventions, and perhaps manually flag some of the typos in there while I’m at it. Could probably approach that from both ends.. start with everything in a grey list, and gradually write a “white list” of confirmed suggestions and a “black list” of typing errors with their best replacement.. I’ll see how far I get..

dobkeratops commented 4 years ago

these are just a few greps through the label list https://github.com/dobkeratops/imagemonkeylabelextraction .. trying to glance through to check the theory "anything containing man/woman/car/person is reducable to that", i'm looking for counterexamples.. there's also the example uses of "of", "or"

grep for car.. https://github.com/dobkeratops/imagemonkeylabelextraction/blob/master/car.txt Some labels containing "car" that are not a car:-

cable car
car jack - type of tool?
car showroom
car carrying truck - a type of truck with a trailer for carrying cars
multistorey car park
open air car park
- perhaps "open_air/" and "multistorey/" could be properties? (multistorey/building.. most are, a few aren't) Having said that, "multistorey car park" is a type of building, whilst "open air car park" is really a type of surface, or area. not sure what you should do there..
passenger car/train
- this is unfortunate :( "passenger car" could mean either a road car or part of a train. i'd want to swap this label for a more explicit alias i.e. "train passenger car" or "railway passenger car", and add an alias "road passenger car" , and flag "passenger car" as too ambiguous (swap it for the explicit aliases)
car key
- really a "key", not a car .. Given the ambiguity of 'key' , maybe we could have explicit aliases "door key","car keys","lock key","keyboard key","computer keyboard key","piano key","synthesiser keyboard key" , and flag "key" as an ambiguous word (use an explicit alias.)
car park ticket machine - probably wants to be a big alias, not a "part of car park". but we certainly want a "ticket machine" aswell ("machine"->"ticket machine"->"car park ticket machine") shadow of car - there's a few of these.. shadow_of_man, shadow_of_truck,shadow_of_fighter_jet. general pupose label for "shadow", or prefix "shadow_of/" perhaps?. it's definitely a shadow*.

Variations that are definitely a car:-

vintage car - older than 'classic'
classic car - not quite as old as 'vintage'
sports car,sportscar
luxury car
prestige car - might be alias for 'luxury car'
derelict car
parked car
supercar (extreme type of sports car)
convertible sports car
racing car
open wheel racing car
f1 car,formula 1 car
lexus car
porsche sports car
hatchback car
fastback car - any with a sloping rear, constrasts with hatchbackk,saloon
coupe car - 2-door, often also 'fastback'
saloon car,sedan car - (british and american variations of the same thing)
rally car
police car
estate car (word 'estate' also relates to property so explicit aliases will help)
crashed car - variation of derelict.. not all derelict cars were crashed, but "derelict or crashed" could usually be grouped.
crumpled car - might be from a scrapyard? "scrap metal" would be a good label
luxury sports car (not all sports cars are luxury cars)
parked hatchback car (parked could be a property-prefix for all car variants?)

labels containing car that are interior car parts

car radio
car dashboard
car seat car engine car chasis

typos..

credit car - should be credit card :)
eel/car
ferris wheel passenger car - must have missed a comma
sporys car
windscreen car - missed "/" or "of" in the middle. "car windscreen" makes more sense
packed car - parked car *passnger car interior
novelty car
custom car
vintage racing car *property vintage, also car->racing_car
vintage landspeed record car unusual variation at motorshow, vintage/car->racing_car->landspeed_record_car *windceren/car

*noteworthy/unusual combinations

butterfly door/car - specific type of exotic car door. not a "butterfly" :) https://en.wikipedia.org/wiki/Butterfly_doors
mini (car) - given that "mini" is a common prefix (mini excavator), and a name of a certain famous car.. an explicit alias is probably useful
car park entrance *"entrance" could be a common part? entrance/car_park, entrance/building although sometimes entrance is an alias of doorway.. for a car park it isn't.
muscle car - not a "muscle", rather an americanism
toy car - definitely a "car shape" , property - toy? .
remote control car - type of toy
driverless racing car - maybe a property for "driverless" or "autonomous" ?

combinations with words that are always properties (as far as i know)

luxury car - (luxury/car? luxury/watch? luxury/hotel?)
derelict car - (derelict/car, derelict/building, derelict/aeroplane, etc)
empty car park - (property empty, but occurs as disambiguator in alias suggestion "empty glass")
burning car (also burning building, property "burning")

dobkeratops commented 4 years ago

The adapter arrived - so now I have both my 2070 and 1080 running in a spare machine. can fold and leave an 1.5 NN experiments running :)

started a utility to try and make sense of the label list. splitting the labels by "of","or","/", sorting by frquency - full list here https://github.com/dobkeratops/imagemonkeylabelextraction/blob/master/unique_fragments.txt 9856 fragments..still in the "chaos" realm regarding label list length,unfortunatley, the worst case scenario here is having to manually "label the labels" for all these fragments, but at least we can work through domains like "car", "person" let me see how many are extractable (eg there are camel cased fragments within these fragments, people properties like "wearingSportswear","holdingGuitar" ..)

there's 1240 variations containing "man|woman|person|car" 5412 unique words including typos https://github.com/dobkeratops/imagemonkeylabelextraction/blob/master/unique_words.txt

i'll need to make a big list of phrases to extract first (eg "red cable car" -> "red cable_car" ->"red/cable_car", that sort of thing)

head
woman
man
wheel
person
picture
reflection
soil
sitting
eye
sand
gravel
statue
fighter jet
box
grass
river
arm
trees
mouth
aeroplane
car
motorbike
building
foliage
road
handle
roof
pavement
hand
hair
female child
leg
figurine
horse
nose
wing
doll
path
wall
skin
glass
child
bird
tail
dog
buildings
excavator
beach
engine
house
wooden
mural
animal
truck
lake
fallen leaves
bicycle
driveway
sea
girl
bushes
standing
elephant
cabin
boat
chair
pile
rocks
face

dobkeratops commented 4 years ago

.. idea - take the groupings observed (any pair of words) and insert "_" or "/" for the spaces - to emphasise connetion or break

consider any space that we find in labels as ambiguous, (could be a break or connecting a compound name)
so rely on a prebuilt list to inform it given any grouping , where the connections and breaks are

EDIT having gone through some more examples, I start to get uncertain in some cases if they should be properties or not ; e.g. "chinese" . dragon, archway, takeaway. is chinese a property you can tag onto them ? "christmass".. decoration, tree, stocking. "classic". "classic/car", other things could be classic? or "car"->"classic car" alongside "car"->"sports car" etc.

Other examples are very obvious "city bike" = "city_bike","mountain_bike" it's definitely NOT a blend of "city", or of "mountain". there might be a property "forCityUse","forOffroadUse" and we'd alias "forOffroadUse"->"mountain_bike", "forOffroadUse"->"jeep", etc

Seems like we have to decide how we want to train to organise some of the more complex freelabels.

example data

jet_airliner
mountain_bike
pickup_truck
fallen_leaves
metal_railing
blue_sky
asian/woman
luxury/car
fighter_jet
dry/soil
brick/wall
parking_space
dry/mud
river_bank
dry/leaves
car_park
dry/grass
stone/wall
exhaust_pipe
racing_bike
paved_area
wing_mirror
female/child
asian/man
dining_room
jet_aircraft
tree_trunk
palm_tree
steering_wheel
vintage/car
pedestrian_crossing
railway_station
pink/flower
jet_engine
wheelie_bin
castor_wheel
gun_barrel

There's 5330 groupings in the "unqiue list" (after existing / or of are split) now I think some words can be assumed to be always seperate, but I think we need to get through enough examples manually to confirm that.

dobkeratops commented 4 years ago

just trying some experiments with the label list and the "GloVe" embeddings - "could it be used to group the words?"

.. looks very hit and miss "trying to find 3 existing words to aproximate a given point in space" (testing it by just picking a word and trying to find 1 word close to it, and another 2 that would move the vector closer to it when averaged)

gives some interesting results, but some weird ones too..

nearest to woman=girl/man/female # really wanted 'human+female'+'adult'
nearest to plant=plants/factory/plants   # hah one things of organic plant but it can mean factory aswell...
nearest to van=der/truck/gogh  # haha van-truck, van-gogh
nearest to derelict=run-down/abandoned/dilapidated # Thats the best i've seen it do! pretty much perfect

dobkeratops commented 4 years ago

(been experimenting a bit more with this but still looking very random. the latest idea was to try to make a distance measure between 2 word-bags (eg average eucldian dist of each, or smallest) -> then treat the compound labels as word bags, and try to fit them to a user-supplied domain-specific "desired label list" .

i can't get it to work well though. I think you said you tried using some NLP libraries? I've just dived into using numpy to mess with lists of vectors here)

bbernhard commented 4 years ago

The adapter arrived - so now I have both my 2070 and 1080 running in a spare machine. can fold and leave an 1.5 NN experiments running :)

awesome :)

started a utility to try and make sense of the label list. splitting the labels by "of","or","/", sorting by frquency - full list here https://github.com/dobkeratops/imagemonkeylabelextraction/blob/master/unique_fragments.txt

very cool!

9856 fragments..still in the "chaos" realm regarding label list length,unfortunatley, the worst case scenario here is having to manually "label the labels" for all these fragments

my gut feeling is that there's no way around "label the labels" (at least in the long run). For quick experiments, it's probably not needed (as we can live with a few false positives due to "too aggressive grepping"), but I think in the end we can't fully automate that. (at least that's my impression while experimenting a bit with NLP libraries). I played around a bit with spaCy, but couldn't get it to work reliable for our use cases (I haven't spent too much time on it, so there's a good chance that I've missed something crucial).

At the moment I am leaning towards maintaining our own translation/parsing tables. Maybe we can start with a simple regex like expression that simple splits the labels by our common keywords ("of", "or", "/"). (that's something that can be done automatically by some script). After the system has split the label up in it's individual components, the result is presented to the user. The user then either accepts the result (in case the system correctly parsed the label) or has the possibility to override the result manually (in case the label was wrongly parsed). Then the mapping is written to the translation table. I know, this is some tedious work, but as we have to do it only once for every label combination, I think it wouldn't be that bad (I guess most of the time the system can parse the label correctly, so just a simple confirmation would be needed. Hopefully there won't be many cases were human corrections are needed).

Of course, if we could find a way to fully automate that, that would be even better. But so far I haven't found a way to parse labels reliably...so the above would be the "worst case scenario".

dobkeratops commented 4 years ago

https://www.microsoft.com/en-us/research/video/microsoft-rocketbox-avatar-library/ This looks like a great resource which could be used for rendering training data. Having said that coming from Microsoft they probably already did all the possible AI training with this. It seems to be a bunch of human models with animation - that’s really great for a free dataset. I would imagine rendering these from multiple poses, lighting conditions etc .. and training a neural net to recover the character type , pose , lighting etc. (Ie Microsoft probably had this purpose in mind .. coming up with a set that can aproximate general human scenes).

dobkeratops commented 4 years ago

Just been trying to add a few labels in graph node format (albeit minus quotes) ,given that “a->b” implies b is a type of a , it would be consistent to use that in annotations , but that’s not the intention . I was thinking I could build a graph from that data in the label suggestions list .. just a way to store the suggested graph nodes alongside everything else

bbernhard commented 4 years ago

This looks like a great resource which could be used for rendering training data.

very cool, thanks for sharing!

Just been trying to add a few labels in graph node format (albeit minus quotes) ,given that “a->b” implies b is a type of a , it would be consistent to use that in annotations , but that’s not the intention . I was thinking I could build a graph from that data in the label suggestions list .. just a way to store the suggested graph nodes alongside everything else

that's nice. This reminds me that we also have the https://imagemonkey.io/graph which hasn't been updated in a while. Maybe it's possible to integrate the graph better into the existing "toolchain", so that we can somehow keep the graph better in sync with the existing label list. (at the moment they both drift apart pretty fast).

bbernhard commented 4 years ago

fyi: I've experimented a bit with keras-maskrcnn which I want to use as a replacement for MaskRCNN as the latter seems to be not maintained anymore.

I've already got it integrated into the existing monkey script - i.e all one needs to do to train a neural net for object segmentation is to run monkey --type="object-segmentation" --labels="cat|dog". After a few hours of training (using a geforce gtx1080) the model is generated.

I tested the model with a random dog image from the internet...but unfortunately the first attempt in detecting the dog's shape wasn't that good. (see attached screenshot) Selection_054

Not sure yet if the number of training epochs was too low or if it's related to bad input data. :/

dobkeratops commented 4 years ago

more Data and more epochs.. :) still , I wonder if that is the start of locking into the distinctive head and paw features .

Great to see this data training something . Of all the labels we have, I wonder what would get the best results

bbernhard commented 4 years ago

Yeah, you are most probably right on that. :grinning:

I tried it with:

epochs: 30
steps per epoch: 1000
number of annotations per class: ~400

Unfortunately I do not have much experience with MaskRCNN yet. Not sure if those 400 annotations per class are enough to get some decent results or if we need way more? I guess I could increase the number of epochs to a few hundreds in order to see if that changes something. (don't want to increase that number too much, otherwise it takes aaaages to complete).

bbernhard commented 4 years ago

Of all the labels we have, I wonder what would get the best results

Yeah, also thought about that. As I do not know how "stable" MaskRCNN is, I tried to pick some labels (and label sizes) that I could (if needed) manually verify before training. The bigger the training size is, the more likely it probably is that we have some false positives in there (either wrongly labeled and/or wrong polygons)...and I am not sure how MaskRCNN "reacts" to those false positives.

dobkeratops commented 4 years ago

Right 400 per class doesn't sound like enough .. however looking at it another way,

we have 100,000 images (thats deifnitely enough to train something )
the classes overlap in various ways , but we'll need the fleshed out label graph to describe all of that - hence the effort to "label the labels" in progress.
- we could try to distinguish indoor vs outdoor images, but this information could be encoded as ("indoor"->"living room", "outdoor"->"landscape" etc) ..
- or aircraft (=aeroplane|biplane|glider|jet_airliner|fighter_jet|...) vs land vehicles (land_vehicle = car|truck|bus...)
- etc
- offroad vs onroad vheicles (eg visual cues common to mountain bikes, jeeps - bigger springs, knobbly tires), "luxury/clean/new" vs "derelict/dirty/broke/n.."
- if we dont have enough to detect dog outlines precisely, maybe we could go for animal or mammal (combining dog, cat, horse etc)

The next thing is textures - this has been one big goal of mine. we might get further distinguishing bricks , grass, tree/foilage , gravel because here each annotation is really a continuum of examples. (even the pavement vs road example has a texture aspect to it, eg paving tiles vs asphalt)

Ultimately we might have to find ways of combining this data with other datasets (maybe there's different ways to leverage using a pre-trained net, i think you started out trying this .. i had always wondered if you could peel back one layer and add a couple of new layers, and train those , rather than trying to re-shape the existing weights)...

yet another idea is to train a "Variational Auto Encoder" (unsupervised net), which just finds patters throughout the dataset by learning to replicate them and represnet them in a hidden ('latent') vector. I bet you could then train 'a few layers' between this hidden representation and the labels.

... or just get more volunteers :) we can't be the only people in the world who have an interest in a truly open source / creative commons , extensible label set, visual dictionary etc

more data is always better, just takes man-hours to collect. Everyone stuck at home in quarantine at the moment ... there's plenty of man-hours available in this world, if we can get people's attention

bbernhard commented 4 years ago

Ultimately we might have to find ways of combining this data with other datasets (maybe there's different ways to leverage using a pre-trained net, i think you started out trying this .. i had always wondered if you could peel back one layer and add a couple of new layers, and train those , rather than trying to re-shape the existing weights)...

keras-maskrcnn already supports that out of the box. For the above experiment I didn't train everything from scratch, but instead used ResNet50 as a base model (ResNet50 was trained on the Open Images Dataset for 300 classes). My hope was that maybe 400 annotations per class are enough in that case (given that I am not training from scratch).

The next thing is textures - this has been one big goal of mine. we might get further distinguishing bricks , grass, tree/foilage , gravel because here each annotation is really a continuum of examples. (even the pavement vs road example has a texture aspect to it, eg paving tiles vs asphalt)

totally agreed!

yet another idea is to train a "Variational Auto Encoder" (unsupervised net), which just finds patters throughout the dataset by learning to replicate them and represnet them in a hidden ('latent') vector. I bet you could then train 'a few layers' between this hidden representation and the labels.

The whole unsupervised learning algorithms sound really interesting. But the things I've read about those algorithms so far is, that they are really hard to "debug". So if you feed data in and get garbage out, it's often really hard to find the root cause for that.

dobkeratops commented 4 years ago

it would be interesting to know how many of each we currently have, and how many of all summed.

Also the optimum grouping (elephants start to break the general rule for the outlines, or you might stick to furry_quadrupedal_mammal to exclude them)

(my thinking has always been that with the label graph we'll be able to describe things precisely for future expansion , but at the same time group them together for simpler training with borader groups.

eg.

year 1 .. 50,000 annotations - just train "animal" detectors , oring the animals together .. whilst still collecting precise animal names.
year 2 .. 100,000 annotations - start to subdivide "mammal vs reptile"
year 3 300,000 annotations start to subdivide individidual species

bbernhard commented 4 years ago

Minor thought regarding the MaskRCNN segmentational experiment using dog - to use the maximum amount of data, maybe you could try going for quadrupedal_mammal (=dog|wolf|fox|cat|tiger|lion|leopard|cheetah|cow|horse|deer|sheep|goat|gazelle|donkey|pig|lemur|.) -

good idea, I'll try that next. At the moment there's running another MaskRCNN training. I started it after fixing some bugs in the implementation - I hope we get better results now. I did a bit of research and I think we should already get a decent working object detector, even with only ~400 images per class (when using transfer learning). e.g: I found this article here where they trained a neural net to detect balloons with only 75 images.

my thinking has always been that with the label graph we'll be able to describe things precisely for future expansion , but at the same time group them together for simpler training with borader groups.

I think the "problem" with the label graph is, that the graph is quite time intensive to maintain. So, it happens pretty fast, that the graph gets out of sync. I wonder if we can somehow integrate the graph better into the existing workflow, so that updating the graph becomes a habit? At the moment we have a huge amount of labels that aren't yet tracked in the graph. In wonder if we can find a workflow so that adding a label to graph becomes more natural? (I think integrating new labels into the graph would be far easier, if our "label graph backlog" would only be at max. a few handful of labels large, instead of multiple hundreds of labels).

dobkeratops commented 4 years ago

if retraining can get results on <100 examples, that certainly gives us a lot of scope already.

I wonder if we can somehow integrate the graph better into the existing workflow, so that updating the graph becomes a habit?

I quite like the idea of adding graph nodes as free labels eg mammal->gazelle and even annotating that way (I’ve tried to go back retroactively adding more like this) , it’s easy to split ( And straight away it gives a potential simplification .. anyone looking for mammal or gazelle will find that in a grep) ..

Perhaps the system could ask for these kind of hints eg if it doesn’t recognise the label, ask for a known broad category to put it in (animal, vehicle, etc), and maybe give a hint that it recognises the syntax if you entered it directly

There will of course be multiple paths but so long as we’re logically consistent I think we can just prune extraneous links .. the only real hazard is if we accidentally make cycles .

I’ve also done a few like this a->b->c Eg container->waste_container->bin->wheelie_bin .. trade off between complexity and extra parsing rules. I’ll see how it goes in the little python experiment I started (ie I can try to build a graph from the embedded nodes) . I note that it serves the > character as blah blah;gt, easy enough to extract. If I can make it report the labels not already in graph nodes, that will guide the process of retroactively “labelling the labels” more efficiently

One request to make it easier: could you enable the >,- characters in unified (currently I can only add those in the dedicated label entry view)

EDIT i've just seen that those foo->bar labels really do get converted into -;gt .. i.e. when going back to images where i entered that, thats how they show up in the UI. Not sure at what stage it happens.. is the interface to the database itself in XML format, or the data exchange with JS UI elements doing that (again, JS manipulating pages .. I can imagine lots of places where the > character becomes problematic). Would it be too messy to convert it back to a clean -> arrow for display, or is it doable? should we come up with another convention for entering labels with embedded graph nodes? I can still easily extract the existing ones no problem. the arrows just become "-\u0026gt;" in the label list.

bbernhard commented 4 years ago

EDIT i've just seen that those foo->bar labels really do get converted into -;gt .. i.e. when going back to images where i entered that, thats how they show up in the UI. Not sure at what stage it happens.. is the interface to the database itself in XML format, or the data exchange with JS UI elements doing that (again, JS manipulating pages .. I can imagine lots of places where the > character becomes problematic). Would it be too messy to convert it back to a clean -> arrow for display, or is it doable? should we come up with another convention for entering labels with embedded graph nodes? I can still easily extract the existing ones no problem. the arrows just become "-\u0026gt;" in the label list.

you are right, I've messed something up there - thanks for the info! I am doing some html escaping in the frontend to avoid that one can inject code into the DOM. Looks like forgot to unescape those labels again before persisting those in the database :sweat_smile:. I'll push a new version to production in the next days which unescapes the labels before sending them to the backend - then the -> should really be stored like that in the database. I'll also write a script that fixes all the existing labels in the database. So it's no problem if you keep using the ->, I'll "bulk fix" those entries with the script later :)

One request to make it easier: could you enable the >,- characters in unified (currently I can only add those in the dedicated label entry view)

I'll also add that one with the next update. :+1:

Perhaps the system could ask for these kind of hints eg if it doesn’t recognise the label, ask for a known broad category to put it in (animal, vehicle, etc), and maybe give a hint that it recognises the syntax if you entered it directly

that's a cool idea!

dobkeratops commented 4 years ago

ok i'm experimenting with extracting the graph nodes in the label list, ("labels fragments with graph nodes = ~500, the rest = 9000+ ..")

if I stick to a habit of adding new labels in graph format at least I can see it counting down now. EDIT - after going through it a bit, I think we have the important once well covered. Sounds bad having "thousands of orphan labels" but most of the important ones have graph nodes. grepping for car, box, door, person (types like chef,police officer,solider,pedestrian),types of animal etc - there's 1000+ with nodes. And besides that , grepping for man/woman/person seems to work (when using a word break, it seems to be safe to assume those are the 'main label')

example graph nodes in the label list `"heavy_machinery"->"bulldozer" "machine"->"heavy_machinery" "person"->"diver" "diver"->"deep_sea_diver" "helmet"->"deep_sea_diving_helmet" "cylindrucal_container"->"compressed_air_cylinder" "glass/cup"->"wineglass" "knife"->"kitchen_knife" "exercise_eqipment"->"rowing_machine" "locomotive"->"steam_locomotive" "locomotive"->"diesel_locomotive" "building"->"agricultural_building" "building"->"agricultural_building" "agricultural_building"->"barn" "building"->"residential_building" "residential_building"->"apartment_buildings" "vehicle"->"land_vehicle" "land_vehicle"->"wheeled_vehicle" "wheeled_vehicle"->"moon_buggy" "land_vehicle"->"offroad_vehicle" "offroad_vehicle"->"moon_buggy" "spacecraft"->"lunar_lander" "person"->"astronaut" "propulsion_system"->"rocket_engine" "container"->"barrel" "barrel"->"metal_barrel" "metal_object"->"metal_barrel" "container"->"barrel"

bbernhard commented 4 years ago

very cool! If you understood you correctly, then the label graph was generated fully automated by parsing only the label fragments, right? Or did you also add some information by hand?

btw: I am currently working on the migration script to fix the broken labels in the database. At the moment I am searching for the following characters: > (>), <(<), &(&), "(") and '(') in the database in order to find the broken ones. I think that should be sufficient? What do you think?