ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

free labels with graph hints #192

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

Imagine if you could specify labels such as `waste container->skip`, `component->nut`, `food->nut`, `car->convertible`, i.e. showing the label's position in a graph.

... these would allow using trending labels to give hints for the label graph structure, and perhaps an opportunity to disambiguate as well, if people aren't aware of potential ambiguities (e.g. in the above example there would really be separate graph nodes `nut (component)` and `nut (food)`). There might be a way of retroactively identifying and correcting ambiguities, i.e. `ambiguous word->nut`, `nut->nut (food)`, `nut->nut (component)`, and you've got a chance to convert `component->nut` into `nut (component)` directly.

It would need extra parsing when converting labels, e.g. if you make `skip` productive, a parser would have to split `waste container->skip` to enable `skip`.
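A minimal sketch of that splitting step, assuming the hint is written with `->` and that only the last segment is the label to make productive. The names `LabelHint` and `split_graph_hint` are hypothetical and not part of imagemonkey-core:

```python
from typing import NamedTuple, Optional

ARROW = "->"  # assumed separator, borrowed from dot file syntax


class LabelHint(NamedTuple):
    label: str             # the label that would be made productive
    parent: Optional[str]  # suggested parent node, if a hint was given


def split_graph_hint(raw: str) -> LabelHint:
    """Split 'parent->label' into the plain label plus its graph hint."""
    parts = [part.strip() for part in raw.split(ARROW)]
    if any(not part for part in parts):
        raise ValueError(f"malformed label: {raw!r}")
    if len(parts) == 1:
        # No hint given, the label is used as-is.
        return LabelHint(label=parts[0], parent=None)
    # For a chain such as a->b->c, the last segment is the label
    # and the segment before it is its direct parent.
    return LabelHint(label=parts[-1], parent=parts[-2])


hint = split_graph_hint("waste container->skip")
print(hint.label, "| parent:", hint.parent)
```

The plain label and the hint stay separate, so the label can be enabled without committing to the suggested graph position.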

Q1 would it be worth the hassle ... extra complication in the label system, when we could just debate the graph in the forum here - or would it drop out neatly of other work e.g. parsing label expressions for queries

Q2 is using the `->` from the dot file graph intuitive enough alongside the other syntax in the label system? `&`, `~`, `|`, `=` look more programming-language inspired; `.` and `::` might be better guesses for grouping. However, if people are familiar with dot file editing, `->` should be clear.

Q3 would it be possible to understand both disambiguation syntaxes, e.g. `foo->bar` and `bar (foo)` (Wikipedia style) both refer to the same label?

Q4 ..or would it just be better to rely on the Wikipedia style syntax and consider those as graph hints... `nut (component)` is a request for graph nodes: `component->nut (component)`, `nut->nut (component)` and so on
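Q3 and Q4 could be sketched together as below. This assumes that every `->` is read as a disambiguating qualifier, which would not hold for plain hierarchy hints such as `car->convertible`. The function names are hypothetical:

```python
import re
from typing import List, Optional, Tuple

# Matches the Wikipedia style form 'label (qualifier)'.
_WIKI_STYLE = re.compile(r"^(?P<label>[^()]+?)\s*\((?P<qualifier>[^()]+)\)$")


def parse_disambiguated(raw: str) -> Tuple[str, Optional[str]]:
    """Return (label, qualifier) for 'foo->bar', 'bar (foo)' or plain 'bar'."""
    raw = raw.strip()
    if "->" in raw:
        qualifier, label = (part.strip() for part in raw.rsplit("->", 1))
        return label, qualifier
    match = _WIKI_STYLE.match(raw)
    if match:
        return match.group("label").strip(), match.group("qualifier").strip()
    return raw, None


def graph_hints(raw: str) -> List[Tuple[str, str]]:
    """Expand a disambiguated label into the (parent, child) edges from Q4."""
    label, qualifier = parse_disambiguated(raw)
    if qualifier is None:
        return []
    node = f"{label} ({qualifier})"
    return [(qualifier, node), (label, node)]


# Both syntaxes resolve to the same (label, qualifier) pair.
print(parse_disambiguated("component->nut") == parse_disambiguated("nut (component)"))
print(graph_hints("nut (component)"))
```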

Once again I haven't relied on this - I've just added a few potential examples.

bbernhard commented 6 years ago

Interesting idea! I am personally a huge fan of automation; everything that helps us to get rid of repetitive work is definitely a win in the long run. I think the whole label graph update/maintenance procedure is for sure something we could improve. What I am missing most at the moment is the ability to inform label graph maintainers that a new label is now productive. It would be great to have some sort of channel that label graph maintainers could subscribe to, to get a notification when a new label is productive (such a notification system could be based on email, slack, ...).

If there is activity on the site, then there will always be new labels that need to be added to the label graph(s). I would say it's pretty similar to the filterlists and ad blocking rules that are maintained and curated by various people out there - no matter how good the lists are, there is always something to improve. I think a good notification system is the absolute minimum we need - if we also have the possibility to express the label graph's position in the label, that's even better.

In general I really do like your suggestion, but I am a bit unsure if it also works in case there are multiple label graph implementations. Not sure if there will ever be a use case for that, but the whole label graph concept is designed in a way that it supports multiple label graph implementations. The idea is that people can create label graphs that focus on different aspects of the dataset (self driving car, food calories, ...) or are targeted at a specific user group (architects, biologists, ...).

I guess that the following label fruit->banana could be a good label graph hint for the "general purpose label graph", but might not be the right one for the "tropical fruit label graph".

But I like the idea - it's definitely something worth thinking about. Maybe we can change it in a way that it also works for multiple label graphs..?

> ... and perhaps an opportunity to disambiguate as well, if people aren't aware of potential ambiguities (e.g. in the above example there would really be separate graph nodes `nut (component)` and `nut (food)`)

That's a really good point - I haven't thought about that until now. But you are right, that would be interesting information for label graph maintainers. What we already have is a description field in the labels.json list (it's currently just not used). I am wondering if it makes sense to let users set a description on first label use, e.g. `nut #a fruit composed of a hard shell and a seed` (everything after the `#` will be removed and interpreted as description).
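The `#` convention could be handled roughly like this. It is only a sketch: `split_description` is a hypothetical helper, and it assumes that the first `#` starts the description and that labels themselves never contain `#`:

```python
from typing import Optional, Tuple


def split_description(raw: str) -> Tuple[str, Optional[str]]:
    """Split 'label #description' into (label, description or None)."""
    # partition() splits on the first '#' only, so later '#' characters
    # stay part of the description.
    label, _, description = raw.partition("#")
    description = description.strip()
    return label.strip(), (description or None)


label, description = split_description("nut #a fruit composed of a hard shell and a seed")
print(label)
print(description)
```

The description could then be stored in the currently unused description field, while the label itself is processed as before.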