Open dobkeratops opened 4 years ago
Very cool, thanks for sharing!
Haven't heard about this before, but I have to admit it sounds pretty interesting.
I am wondering if we could get something useful out of it if we feed it the data that we already have collected...
btw: Just stumbled across this paper here where they tried multi-task learning on the MNIST image dataset. I just skimmed the paper, but it looks like multi-task learning significantly improved the image recognition accuracy.
update: I've started experimenting with PyTorch and have had that churning away for the past few days, just going through building CIFAR-10 classifiers (they give you a load of helper code to download and train on that very easily). I've been trying out different net configs (basic AlexNet-style CNNs, then shortcut layers, dimension reduction like they use in Inception, 'residuals').
I know for real work it will be better to use a pre-trained model - initially I just want to find my way around the library and really convince myself of how and why things like 'residuals' work.
Having the spare machine running (as I'd set up for running folding@home) helps (it's nice to run a couple of experiments asynchronously).
Using CIFAR the training times seem doable, but it's only a 32x32 image. I was thinking I could write something to extract scaled 32x32 snippets from imagemonkey, using the pixel annotation to guide that, i.e. pick a reasonably sized rectangular region, scale it down to 32x32, and use the pixel label at the centre as the label. (I also want to try out pixel-level segmentation as you've posted about before... trying to figure out if you can just infer deconvolutional layers from the pyramid that produces a label at the centre, or if it will need real pixel-level training.) CIFAR-100 seems a nice intermediate; its 100-label list has good overlap with ours (ImageNet-1000 lacks basic car & people labels, strangely).
One worry is the amount of "uninteresting" area in our images (skies, flat empty roads which are almost grey). This is why I want to try out extracting regions first. (Taking a leap from a net which just handles CIFAR-10 at 32x32 to full segmentation, I have no idea how long that will take to train... I'm after a stepping stone.)
I figure we can look for the annotation boundaries and use those as regions of interest (training on the area where the road meets the pavement, etc.), but it won't always be a problem.
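The snippet-extraction idea could look something like this - a minimal sketch assuming the image and its pixel annotation are already loaded as numpy arrays. The function name and the nearest-neighbour downscale are just illustrative, not ImageMonkey API:

```python
import numpy as np

def extract_snippet(image, label_mask, cx, cy, crop_size=64, out_size=32):
    """Cut a crop_size square roughly centred on (cx, cy), downscale it to
    out_size x out_size, and label it with the annotation at the centre pixel.

    image      -- HxWx3 array of pixels
    label_mask -- HxW array of integer label ids (the pixel annotation)
    (Hypothetical helper; a real version would use a proper resampler.)
    """
    h, w = label_mask.shape
    half = crop_size // 2
    # clamp the window so it stays fully inside the image
    x0 = min(max(cx - half, 0), w - crop_size)
    y0 = min(max(cy - half, 0), h - crop_size)
    crop = image[y0:y0 + crop_size, x0:x0 + crop_size]
    # nearest-neighbour downscale via index striding
    idx = (np.arange(out_size) * crop_size) // out_size
    snippet = crop[idx][:, idx]
    label = int(label_mask[cy, cx])
    return snippet, label
```

Centres could then be sampled along annotation boundaries (road meets pavement, etc.) rather than uniformly, to skip the uninteresting flat areas.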
I seem to remember you've got a fair amount of utility code already written to process the images and kick off training autonomously... I'll have a look around to see where's the best place to start with custom experiments. I need to figure out how to implement this (partial labelling...) within their autograd system etc.
Awesome, that sounds great! Really looking forward to that :)
If you are interested in the currently existing API functionalities, it's probably best to start here
Here is an example how to download images with specific labels.
And here is another example that shows how to parse the JSON formatted (rectangle, ellipse, polygon) annotations.
If you have any questions, please let me know :).
(I am btw. still working on the joint connections mode. Unfortunately, Covid-19 is now also affecting me quite a bit. At the moment I am way more busy with other - work related - projects than I would like to be. I still try to find at least a few hours per week to work on imagemonkey...but at the current pace it will definitely take a bit longer until the first prototype is ready :/ But I am confident that it gets back to normal in the next months :))
Better to be busy than not... imagemonkey is at a stage where it's good enough to use; it's useful as it is. Nothing is really urgent.
Something i'd wondered about for a while...
Asking around, supposedly it is possible with the standard frameworks to train on images with incomplete annotations, by writing a custom loss (error) function - by weighting the terms by 1 or 0 based on the presence or absence of the annotation.
https://towardsdatascience.com/custom-loss-function-in-tensorflow-2-0-d8fa35405e4e
this article seems to discuss doing something similar https://www.dlology.com/blog/how-to-multi-task-learning-with-missing-labels-in-keras/
That could be adjusted for a single label if you knew "all" was indeed annotated for a specific label (e.g. extend "is_annotated" to a 3D array, per pixel, per label, and safely set it to '1' for all pixels of that label). There might be more variations on how this could be applied.
Not sure if you're already aware of this kind of technique? I'm trying to find other references; it's not something I've tried myself. It seems like such a common use case that I imagined the frameworks might have some dedicated support out of the box.
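In PyTorch the 0/1 weighting could be sketched roughly like this (a hypothetical masked per-pixel loss, not anything from those articles or from ImageMonkey - just the weighting idea written out):

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(logits, targets, mask):
    """Per-pixel, per-label binary cross-entropy where `mask` is 1 wherever
    a ground-truth value exists and 0 where the annotation is missing.

    All three tensors have shape (batch, num_labels, H, W). Multiplying the
    elementwise loss by the mask zeroes the missing terms, so autograd
    produces zero gradient for unannotated pixels/labels automatically.
    """
    per_pixel = F.binary_cross_entropy_with_logits(
        logits, targets, reduction='none')
    masked = per_pixel * mask
    # normalise by the number of labelled entries, not the full tensor size
    return masked.sum() / mask.sum().clamp(min=1)
```

Because the masking happens inside the loss, the rest of the net and training loop stay completely standard.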
So we should be able to take partially annotated pixel data and apply this technique?
I think the original intention behind "annotate all" is that you get a complete annotation for the whole image, so you get a training value of 1 or 0 per pixel for the specific label.
With this custom loss-weighting idea, hopefully we can use all other annotations as a 0 for a specific label, in cases where you have not yet verified that all instances are annotated.
For multi-label training this becomes even more useful, IMO. I was just thinking about this w.r.t. the number of part annotations for a person - up to 40, of which say 10 may be done - but it will also apply to street scenes and so on. So you might have 10 labels of interest and can find images with a few of those annotated; with this technique, hopefully all the images and annotations become useful.
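For the image-level multi-label case, building the target/mask pair per image might look like this (hypothetical label list and helper, just to show how unknown labels get mask 0 and so carry no loss):

```python
# Illustrative label list -- stand-in for whatever label set we'd train on.
LABELS = ['person', 'car', 'road', 'dog']

def make_target_and_mask(annotated):
    """Build per-image target and mask vectors from a dict of the labels
    that HAVE been checked (True = present, False = verified absent).
    Labels missing from the dict are unknown: target stays 0 but mask is
    also 0, so a masked loss ignores them entirely.
    Returns plain lists; these would be wrapped in tensors for training.
    """
    target = [0.0] * len(LABELS)
    mask = [0.0] * len(LABELS)
    for i, name in enumerate(LABELS):
        if name in annotated:
            target[i] = 1.0 if annotated[name] else 0.0
            mask[i] = 1.0  # this entry is known, so it contributes to the loss
    return target, mask
```

So an image where only 'person' and 'car' have been checked still trains those two outputs, without poisoning the 'road' and 'dog' outputs with fake negatives.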