ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

free labeling (again) #254

Open bbernhard opened 4 years ago

bbernhard commented 4 years ago

I know we (@dobkeratops) have had this topic a few times before, but I wanted to make another attempt at tackling this issue.

What's the problem with the current approach?

What should the ideal solution look like?

I would like to see the unified mode become more like a traditional annotation tool, i.e. draw annotation + assign label, draw another annotation + assign label, etc.

Ok, so why don't we just do exactly that?

Without any label moderation it's likely that we end up with:

I think there's no way around content moderation..no matter how hard we try, at some point a human has to decide whether something is valid or not. Of course we can use computer power/algorithms/logic/neural nets etc to help us here, but I think in the end it needs human input.

But instead of requiring that the label is valid before annotating, I would propose that we also allow checking that afterwards. This should create a more natural workflow which allows label+annotate in one go (even if the label is not yet known to the system).

Together with the requirement that this only works when authenticated, I think we can tick off the first two points (wrongly spelled labels + spam/hate speech) as solved from our checklist.

Ok, that was the "easy" part, the two remaining points are definitely harder to solve. Personally I think we do have two options here. Either:

1) flat labels + label graph
2) labels + properties + label graph

Before we look at those two options in detail, I think we should talk a bit about label parsing. No matter which option we prefer, we probably need a (semi) automated way to semantically parse combined labels.

I think projects like NLTK or WordNet are probably gold here, but I am wondering whether we can use something simpler (like a regex) to start with.

In its simplest form, I think a color regex could look like this (untested, just for illustration):

((red|blue|green|orange|black|violet)[[:space:]])?[a-zA-Z]+

The idea would be to parse labels like red apple, green cup, white collar, etc. It of course wouldn't be bulletproof, but I think it could be a starting point.
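A minimal sketch of how that regex could be tried out in Python. Some assumptions here: Python's `re` module has no POSIX `[[:space:]]` class, so `\s` is used instead; the whole label is required to match (`fullmatch`), which the original regex does not state; and the function name `parse_color_label` is made up for illustration.

```python
import re

# Same color list as the regex above; \s replaces [[:space:]],
# which Python's re module does not support.
COLOR_LABEL = re.compile(r"(?:(red|blue|green|orange|black|violet)\s)?([a-zA-Z]+)")


def parse_color_label(label):
    """Return (color, base_label), or None if the label does not match.

    color is None when the label has no recognised color prefix.
    """
    match = COLOR_LABEL.fullmatch(label.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_color_label("red apple"))
print(parse_color_label("apple"))
print(parse_color_label("white collar"))
```

With this color list, "white collar" is not parsed at all, because "white" is not among the known colors and the whole label has to match. That is one of the cases where the color list (or the grammar) would need to be extended.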

Ok, let's assume for now we have solved the label parsing and found a way to (semi) automatically parse labels semantically. Next, we need to store the data somehow, which brings us back to the two options above.

I'll start with the first one, the flat labels + label graph approach. Here, we treat each label as a string and offload the ordering/grouping/semantic interpretation completely to the label graph. So the label graph needs to know how to order/align short girl, tall girl, pretty girl, etc.

As we are treating each label as a string blob in the database, annotations do not share any information. I.e. if there is a picture with a dog that is already labeled+annotated with tall dog and you want to add the information that the dog also has brown fur, you would need to create another label tall brown dog and annotate the dog again.

Ok, let's look now at the second approach, the label+properties+label graph one. Here, the label is not just a dumb blob in the database, but it can also have properties assigned to it.

The label graph will still be used to hierarchically structure/order the labels, but things like color, material, appearance, etc. (see also https://www.paperrater.com/page/lists-of-adjectives for more examples) are stored together with the concrete annotation. The big advantage I see here is that, on the one hand, it would allow us to write more complex search queries (using boolean logic) and, on the other hand, it would allow us to re-use existing annotations (e.g. in the example above, the dog wouldn't need to be annotated again just because the color property was added).
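To make the difference to the flat approach a bit more concrete, here is a rough sketch of what "label + properties" could look like in memory. Everything in it is an assumption for illustration (the `Annotation` class, the bounding-box tuple, the property names `size` and `color`); it is not how ImageMonkey actually stores annotations.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    label: str
    geometry: tuple  # hypothetical bounding box: (x, y, width, height)
    properties: dict = field(default_factory=dict)


def matches(annotation, label, **required):
    """True if the annotation has the label and every required property value."""
    return annotation.label == label and all(
        annotation.properties.get(key) == value for key, value in required.items()
    )


# The dog is annotated once, as a "tall dog".
dog = Annotation("dog", (10, 20, 200, 150), {"size": "tall"})

# Later someone adds the fur color: the existing annotation is re-used,
# no second "tall brown dog" annotation is needed.
dog.properties["color"] = "brown"

annotations = [dog, Annotation("dog", (0, 0, 50, 40), {"color": "black"})]

# A boolean (AND) query: dogs that are both tall and brown.
tall_brown_dogs = [a for a in annotations if matches(a, "dog", size="tall", color="brown")]
print(len(tall_brown_dogs))
```

With flat labels, the same query would have to rely on the label graph knowing that tall brown dog is related to both tall dog and brown dog.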

Ok, so how does that really work now?

I've tried to sketch the second option (label+properties+label graph) here a bit:

(image: transformation)

Of course, we could still add synonyms on top of the "normalized" form, so that one can still use the string "red apple" for querying.

What do you think about that? Does that make sense?

dobkeratops commented 4 years ago

Right if it could parse a string and split it into properties retroactively, that would be great. A reversible process .. properties = prefixes. There’s the problem of ambiguous words eg orange, glass. Let me think if this can be resolved by the position - prefix vs the final word. (I think I would even prefer to use “glass cup” rather than “glass”)
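A rough sketch of the position idea, assuming the final word is always the base label and only the words before it are looked up as properties. The `PROPERTY_KINDS` table and the function name are made up for illustration.

```python
# Hypothetical lookup table: prefix word -> kind of property it describes.
PROPERTY_KINDS = {
    "orange": "color",
    "red": "color",
    "glass": "material",
    "metal": "material",
}


def split_label(label):
    """Split a label into (base_label, properties, unknown_prefixes).

    The last word is always treated as the base label, so an ambiguous
    word is only read as a property when it appears as a prefix.
    """
    words = label.lower().split()
    if not words:
        raise ValueError("label must contain at least one word")
    *prefixes, base = words
    properties = {}
    unknown = []
    for word in prefixes:
        kind = PROPERTY_KINDS.get(word)
        if kind is None:
            unknown.append(word)  # left for manual review / a translation table
        else:
            properties[kind] = word
    return base, properties, unknown


print(split_label("glass cup"))
print(split_label("orange glass"))
print(split_label("glass"))
```

Under that assumption "glass cup" reads as a cup made of glass, while a bare "glass" stays a base label, and prefixes that are not in the table are collected rather than guessed at.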

Worst comes to worst, a table of manual translations could be used: this seems to be what they describe having been done for LabelMe

One thing I have tried to do is “/” based combinations for verbs in label suggestions eg “person/sitting” “person/reading” “man/running” .. I think the verbs are unambiguous but it feels awkward that they could be prefixes or postfixes .. “sitting person” and “person sitting” both make sense. I like the slash giving you a stronger hint that the words are independent

The other thing to mention.. Would it be possible to hint that one of the words is a simplifiable base label e.g. “hatchback car” can be reduced to “car”, “sitting person” can be reduced to “person” .. could “car” or “person” here be displayed in bold when you enter it, to let you know that it’s recognised a base/primary label?
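Both ideas (the slash form and spotting the base label) could be sketched roughly like this. The set of known base labels and both function names are assumptions for illustration; in practice the base labels would presumably come from the already unlocked labels.

```python
# Hypothetical set of labels that are already known to the system.
BASE_LABELS = {"car", "person", "man"}


def split_slash_label(label):
    """Split "person/sitting" into ("person", "sitting").

    The second element is None when the label contains no slash part.
    """
    base, _, modifier = label.partition("/")
    return base.strip(), (modifier.strip() or None)


def find_base_label(label):
    """Return the first word that is a known base label, or None.

    Works for prefix, postfix and slash forms alike, so a UI could use
    the result to display that word in bold.
    """
    for word in label.lower().replace("/", " ").split():
        if word in BASE_LABELS:
            return word
    return None


print(split_slash_label("person/sitting"))
print(find_base_label("hatchback car"))
print(find_base_label("sitting person"))
```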

bbernhard commented 4 years ago

> Worst comes to worst, a table of manual translations could be used: this seems to be what they describe having been done for LabelMe

yeah, right. I think this will be an ongoing process anyhow. We probably need to adjust/extend our "label parsing grammar" multiple times until we can parse most of the labels automatically.

> One thing I have tried to do is “/” based combinations for verbs in label suggestions eg “person/sitting” “person/reading” “man/running” .. I think the verbs are unambiguous but it feels awkward that they could be prefixes or postfixes .. “sitting person” and “person sitting” both make sense. I like the slash giving you a stronger hint that the words are independent

sounds good to me (I guess that's something we could even make configurable).

> Would it be possible to hint that one of the words is a simplifiable base label e.g. “hatchback car” can be reduced to “car”, “sitting person” can be reduced to “person” .. could “car” or “person” here be displayed in bold when you enter it, to let you know that it’s recognised a base/primary label?

I think that should be doable. Is there a particular use case you have in mind here?

bbernhard commented 4 years ago

short update:

I am working on this now (and making quite good progress, see this branch here). I expect that I will need another 2-3 weeks and then (hopefully) the first draft should be available :)

bbernhard commented 4 years ago

short update: The first version of "free labeling" is now live and can be found in the unified mode view. It's now possible to

The only restriction is: you need to be logged in, in order to use that feature (that's mainly to prevent (spam) bots from messing with the dataset).

btw: I think I've finally fixed the performance issues in the labels dropdown (the one that caused the browser to freeze for a few seconds during the autocomplete). If it still happens, please let me know.

dobkeratops commented 4 years ago

That’s awesome, I will give it a try.

I think your caution about free labelling is justified (even without malicious spam, there’s ambiguity and spelling mistakes) but this will let you gather examples which can then be considered. I have been content to submit the suggestions in pure label addition mode and continue scraping images.

The fact it’s only open to logged in users means you can hide them until curated? (And possibly translate from personal vocabulary to the consensus .. The personal vocabulary idea could prevent conflict)

bbernhard commented 4 years ago

> The fact it’s only open to logged in users means you can hide them until curated? (And possibly translate from personal vocabulary to the consensus .. The personal vocabulary idea could prevent conflict)

Theoretically, we could. But at the moment it's visible to everybody. The only restriction is that you need to log in if you want to add new (i.e. not yet unlocked) labels or if you want to add annotations to labels that are not yet unlocked.

So it's basically like this:

I've tried to integrate the free labeling as seamlessly as possible into the existing concept, i.e. ideally you shouldn't notice whether you are working on a label that's already productive or a new one. That should be completely transparent to the user.

As for the annotations, I've tried to implement a two-stage approach similar to the one we already have for labels (trending/productive). That means all annotations that belong to non-productive labels are tagged separately in the database. This (hopefully) gives us the flexibility to change labels+annotations in bulk (in a scripted manner).

e.g:

Imagine someone creates a few dozen annotations with metal pot. Let's imagine further that we've decided that we don't want to have the material in the label name. What we could do now is write a translation rule, e.g. something like this:

metal pot -> pot (material: metal)

So whenever a user adds a metal pot annotation, the system would automatically translate that in the background to label: pot with the property material: metal. So, if the user opened the same image again later, they would see that the metal pot label is gone and has been replaced by a label pot with the property metal.
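Just to illustrate how such a rule could be applied in a scripted manner, here is a rough sketch. The `TRANSLATION_RULES` table and the `translate` function are hypothetical and only mirror the metal pot example; they are not the actual implementation.

```python
# Hypothetical rule table: free label -> (base label, properties to attach).
TRANSLATION_RULES = {
    "metal pot": ("pot", {"material": "metal"}),
}


def translate(label, properties=None):
    """Apply a translation rule, if one exists, and return (label, properties).

    Labels without a rule are returned unchanged; properties that are
    already present are kept and merged with the ones from the rule.
    """
    merged = dict(properties or {})
    base, extra = TRANSLATION_RULES.get(label, (label, {}))
    merged.update(extra)
    return base, merged


print(translate("metal pot"))
print(translate("pot"))
```

Because the rule table is just data, the same rules could be run once over all annotations that are still tagged as belonging to non-productive labels, and again whenever a new annotation comes in.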

Not sure if this is useful, but the two-stage approach would give us the possibility to do such things.

dobkeratops commented 4 years ago

IMO that kind of aliased translation will be perfect .. every example will serve to document a valid combination.. and translation into an internal representation will give the best of both worlds (search, pure material training..)