add label graph - Githubissues

bbernhard commented 6 years ago

requirements:

We need a structured data format which is easy to read and flexible enough for us to describe our label graph. Which format do we want to use here? graphml, graphviz, ...?
Some nodes in the label graph will be mapped to existing labels in our database. What should the mapping look like? Can we use the label's uuid for that? (i.e. every node in the label graph that is mapped to a label entry in the database needs to reference its uuid.)
It should be possible for users to write their own label graph. That means we need to add an API integration for that.
Ideally, the label graph will be an in-memory implementation (I guess it's overkill to use a graph database just for a few hundred labels). What's a good data structure to represent such an in-memory graph?
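
One possible answer to the data structure question, as a minimal sketch only: adjacency sets keyed by node name, plus an optional uuid per node for the database mapping. The class and method names are hypothetical and are not taken from the ImageMonkey code base.

```python
class LabelGraph:
    """Directed label graph kept in memory as adjacency sets (hypothetical sketch)."""

    def __init__(self):
        self.children = {}  # node name -> set of child node names
        self.parents = {}   # node name -> set of parent node names
        self.uuids = {}     # node name -> uuid of the mapped database label

    def add_node(self, name, uuid=None):
        # Nodes without a uuid are purely structural (e.g. "vehicle").
        self.children.setdefault(name, set())
        self.parents.setdefault(name, set())
        if uuid is not None:
            self.uuids[name] = uuid

    def add_edge(self, parent, child):
        # Adding an edge creates missing nodes but never overwrites a uuid.
        self.add_node(parent)
        self.add_node(child)
        self.children[parent].add(child)
        self.parents[child].add(parent)


graph = LabelGraph()
graph.add_node("car", uuid="94bbd2ff-8a8e-4d1c-9ac5-f9506aa20e43")
graph.add_edge("vehicle", "car")
graph.add_edge("vehicle", "truck")
print(sorted(graph.children["vehicle"]))
```

Keeping both directions makes "children of" and "parents of" lookups cheap, at the cost of storing every edge twice.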

bbernhard commented 6 years ago

Here is a simple graphviz demo:

digraph G { 
    { 
        car [label="car" id="94bbd2ff-8a8e-4d1c-9ac5-f9506aa20e43"] 
        vehicle [color=green style=filled label="vehicle"] 
        truck [label="truck"] 
    } 

    vehicle -> truck
    vehicle -> car
}

In order to link/map the label car to the one that's stored in the database, one needs to reference the uuid of the label car.

I really like the simplistic syntax.

(In case you want to see the output, just paste the snippet here: http://www.webgraphviz.com/ )

bbernhard commented 6 years ago

Here is a (more or less) comprehensive list of all the available graph description languages: https://medium.com/@alexadam/representing-graphs-c8eedfd1b0cb

Just by looking at the examples I definitely like DOT / graphviz the most - it looks easy to read and write while still being pretty expressive.

dobkeratops commented 6 years ago

Ideally, the label graph will be an in-memory implementation (I guess it's an overkill to use a graph database just for a few hundred labels)

I hope that's ok. Getting to 100 is a big step forward over the existing list; I also hope 'in-memory' will be ok for 1000s. 1000 labels at 64 bytes of string each, plus 10 connections x 4 bytes (32-bit indices), would fit in 128k of memory
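
The estimate above can be checked with a few lines of arithmetic. The constant names are only illustrative; the numbers are the ones given in the comment.

```python
LABELS = 1000
NAME_BYTES = 64    # bytes of string per label
CONNECTIONS = 10   # links per label
INDEX_BYTES = 4    # one 32-bit index per link

per_label = NAME_BYTES + CONNECTIONS * INDEX_BYTES  # 64 + 40 = 104 bytes
total = LABELS * per_label                          # 104,000 bytes
print(f"{total} bytes = {total / 1024:.1f} KiB")
```

That is roughly 102 KiB, so it does fit in 128k, ignoring any per-object overhead of the language runtime.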

Just by looking at the examples I definitely like DOT / graphviz the most

happens to be the one I've heard of as well, so it seems a good bet for familiarity. (If its symbols are unquoted, spaces could just be converted to underscores?)

Given the existence of graphviz, even if defining a graph by any other means (some simple JSON, whatever) it would always be useful to write something to convert it to those dotfiles as well
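
A converter of that kind could look roughly like this. The JSON layout (a "nodes" object and an "edges" list) is an assumption made for the sketch, not an agreed format, and attribute values containing double quotes are not escaped.

```python
import json


def dot_id(name):
    # Unquoted DOT identifiers cannot contain spaces, so use underscores.
    return name.replace(" ", "_")


def to_dot(graph, name="G"):
    """Render {"nodes": {...}, "edges": [[parent, child], ...]} as DOT text."""
    lines = [f"digraph {name} {{"]
    for node, attrs in graph["nodes"].items():
        attr_text = " ".join(f'{key}="{value}"' for key, value in attrs.items())
        lines.append(f"    {dot_id(node)} [{attr_text}]")
    for parent, child in graph["edges"]:
        lines.append(f"    {dot_id(parent)} -> {dot_id(child)}")
    lines.append("}")
    return "\n".join(lines)


source = json.loads("""
{
  "nodes": {
    "food": {"label": "food"},
    "whole food": {"label": "whole food"}
  },
  "edges": [["food", "whole food"]]
}
""")
dot_text = to_dot(source)
print(dot_text)
```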

dobkeratops commented 6 years ago

for reference here's the experiment I did a while back, just Go source code literals, which means having '[]string{..}' prefixes. it let you specify the label then either forward ('examples') or backward ('isa') links - then it filled out the whole graph from that. I think I made it dump some JSON of the result. I guess JSON would be a better starting point e.g. for working with a python script.

https://github.com/dobkeratops/label_list/blob/master/labelgraph.go

bbernhard commented 6 years ago

for reference here's the experiment I did a while back, just Go source code literals, which means having '[]string{..}' prefixes. it let you specify the label then either forward ('examples') or backward ('isa') links - then it filled out the whole graph from that. I think I made it dump some JSON of the result. I guess JSON would be a better starting point e.g. for working with a python script.

Many thanks, that's definitely useful!

At the moment I am thinking about a hybrid approach. That means the main data (label name, sublabel(s), quiz question(s), quiz answers, etc.) will be defined within the json file. I am really in favor of json here, as it's a great format when it comes to representing structured data.

As json is pretty bad when it comes to representing hierarchies, I would like to decouple the label data from the actual hierarchical representation. That means on top of all that will be the graph definition - ideally that's just one (graphviz?) file that only contains the hierarchical relationship between the labels. (x is parent of y, y is parent of z...). The graph definition file is meant to be really lightweight, so that it's easy for other users to write their own label graph. I have the following (fictional) use case in mind here:

Imagine that we have a label graph in place where we have ordered and grouped the labels somehow (based on some criteria). Now, there might be someone who wants to query the dataset based on the object's size (e.g. car is a big object, bike is a medium sized object, apple is a small one, ...). So they could easily write their own graph definition where they structure/group and order the labels the way they want.

After they have written their custom graph definition they can run their requests through the label graph (e.g. give me all medium sized objects) and the label graph would return a list of ImageMonkey labels based on the user's custom graph definition.
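
To make the use case concrete, here is a small sketch of such a lookup. The size graph and the function name are made up for illustration; only the car/bike/apple grouping comes from the example above.

```python
def labels_below(edges, start):
    """Return every leaf reachable from `start` in a list of (parent, child) edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    leaves, stack, seen = [], [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in children:
            stack.extend(children[node])  # grouping node: keep descending
        else:
            leaves.append(node)           # leaf: an actual label
    return sorted(leaves)


# A user-defined graph that groups labels by object size.
size_graph = [
    ("size", "big"), ("size", "medium"), ("size", "small"),
    ("big", "car"), ("medium", "bike"), ("small", "apple"),
]
print(labels_below(size_graph, "medium"))
```

Asking for "medium" yields only bike, while asking for the top-level "size" node yields all three labels.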

I hope that's ok. Getting to 100 is a big step forward over the existing list; I also hope 'in-memory' will be ok for 1000s. 1000 labels at 64 bytes of string each, plus 10 connections x 4 bytes (32-bit indices), would fit in 128k of memory

good point! I guess we should stress-test this with thousands of labels before we settle on that. I hope we won't need a graph database here, as that would mean that users would need to install another tool before they could write their own custom graph definition.

bbernhard commented 6 years ago

I am currently experimenting a bit with the representation of the label graph.

At the moment I am not sure which representation is better suited/more visually appealing for the label graph.

graphviz layout engine

label_graph_layout1

pros:

hierarchy is nicely represented

cons:

pretty much static
will probably consume a lot of space once we hit a decent amount of labels

d3 layout engine

label_graph_layout2

pros:

interactive
consumes less space

cons:

it's harder to see the hierarchical relationship between nodes

I guess it would help a bit if we make it a directed graph and use the same node color for all related nodes. btw: here is an interactive example: https://bl.ocks.org/BTKY/raw/6c282b65246f8f46bb55aadc322db709/

dobkeratops commented 6 years ago

interesting comparison..

Might want to check how it copes with multiple ancestors rather than just a tree structure, the latter certainly looks more suited to modern touch screens, whilst the former looks more 'ordered'. I suppose it would really come into its own with interactivity, e.g. clicking on one to expand its graph further (perhaps displaying faded out links or "..." to indicate expandability..) .. you could divide it up into manageable pages as a means of exploration

On another note I've always been a huge fan of the 'miller-columns' filebrowser on the mac, and wondered if there was a way of adapting that to work with a graph (remembering that there would be multiple paths to the same element)

It might be that actually displaying a graph is for debug.. checking that the labels are sanely arranged.. but having the potential seems worthwhile. interactive anything and graphical visualisations makes the site more engaging

bbernhard commented 6 years ago

Might want to check how it copes with multiple ancestors rather than just a tree structure,

good point!

the latter certainly looks more suited to modern touch screens, whilst the former looks more 'ordered'. I suppose it would really come into its own with interactivity, e.g. clicking on one to expand its graph further (perhaps displaying faded out links or "..." to indicate expandability..) .. you could divide it up into manageable pages as a means of exploration

totally agreed. I guess d3.js is probably the better option here when it comes to interactivity. Another interesting example I just found (also based on d3.js) is this one: https://bl.ocks.org/mbostock/4339083 If you click on a node, it expands. I think this one goes more in the direction of the miller columns you mentioned.

It might be that actually displaying a graph is for debug.. checking that the labels are sanely arranged.. but having the potential seems worthwhile. interactive anything and graphical visualisations makes the site more engaging

totally agreed!

dobkeratops commented 6 years ago

i was just thinking .. could the foo/bar syntax generalise, assisting label discovery (or description) through a label graph - if interpreted as mixed meaning.

If you don't quite know what something is but you can specify some obvious properties or components, the system could get a better guess as to what something is. imagine making "a/b/c.." mean 'search the overlap of the possible refinements and parts of a, b, c..'.

e.g. smartphone/tablet computer == phablet, keyboard/trackpad == media keyboard, spoon/fork == spork
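
Read that way, an "a/b/c" query is a set intersection over whatever is reachable from each term. A rough sketch, assuming the graph is a plain list of (parent, child) edges and that the edges below exist (both are assumptions for illustration):

```python
def reachable(edges, node):
    """All nodes reachable from `node`, i.e. its refinements and parts."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def overlap(edges, query):
    """Interpret 'a/b/c' as the intersection of what a, b and c can refine into."""
    terms = [term.strip() for term in query.split("/")]
    return set.intersection(*(reachable(edges, term) for term in terms))


edges = [
    ("smartphone", "phablet"), ("tablet computer", "phablet"),
    ("spoon", "spork"), ("fork", "spork"), ("spoon", "teaspoon"),
]
print(overlap(edges, "smartphone/tablet computer"))
print(overlap(edges, "spoon/fork"))
```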

I did experiment with a 'part of..' / 'has..' link in my label graph experiment; maybe it would be useful to store them as combined labels (a list of IDs) to avoid label explosion (separate labels for 'leg of wooden chair' .. 'leg of metal chair' .. etc)

under this picture, there'd be a label "head", sub-types "head of dog", "head of person", etc.. but perhaps the actual combinations could be inferred ..

EDIT: relating the 'head of dog', 'head of person' etc might be useful r.e. generalizing (train a net to figure out what's common between the 'head' of different organisms, i.e. eyes and mouth)

bbernhard commented 6 years ago

just a short update:

I think the basic integration is now almost done - I'll push that to production in the next few days. For now I decided to go with d3.js, as it's a really powerful beast that supports a lot of different graph types (and hopefully makes it possible to customize the label graph more easily, so that it fits our needs).

For the label graph representation, I decided to go with the graphviz (dot file) language, as I think it's easier to read and write than graphml (but that's just a personal opinion).

A sample graph.dot file for example could look like this:

digraph G { 
    { 
        rootnode [label="root" size=250 fontsize=25 color=red]

        vehicle [label="vehicle" size=150 fontsize=20 color=blue]
        tree [label="tree" id="de9c51d5-b633-4a92-be3f-2e09a7ed5dc4"]
        building [label="building" id="3619dc01-f1e2-4791-9ddd-56550c2a6b7d"]
        car [label="car" id="94bbd2ff-8a8e-4d1c-9ac5-f9506aa20e43"] 
        bicycle [label="bicycle" id="c8cfc6a0-1a20-4e89-b879-d7378b882939"]
        road [label="road" id="26ba089d-d11b-46ff-8f40-e292ba0e7624"]
        person [label="person", id="64766828-a943-433f-8800-1901cebf959d"]

        food [label="food" size=150 fontsize=20 color=blue]
        wholefood [label="whole food"]
        processedfood [label="proc. food"]
        apple [label="apple" id="f81cf567-4798-4e4d-95f9-b430cf04ee55"]
        orange [label="orange" id="5ca7ccad-3b8c-4c9a-ac27-44cddc96d4fa"]
        pizza [label="pizza" id="be8270fd-2c5c-47ff-b938-0555e5201a18"]

        food -> wholefood
        food -> processedfood
        processedfood -> pizza
        wholefood -> apple
        wholefood -> orange

        vehicle -> car
        vehicle -> bicycle

        rootnode -> food
        rootnode -> vehicle
        rootnode -> building
        rootnode -> road
        rootnode -> person
    } 
}

And would result in the following graph:

label_graph

The graphviz file basically consists of nodes and relationships. Each node can be customized (node size, font size, node color..) and optionally linked to an existing label. The linking is done with the id property in the graphviz file. In the id property you put the uuid of the label you want to reference (the appropriate uuids can be found in the labels.json file: https://github.com/bbernhard/imagemonkey-core/blob/develop/wordlists/en/labels.json).
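
For illustration, the nodes, attributes and relationships of a file in this shape can be pulled out with a few regular expressions. This is only a sketch for the restricted one-statement-per-line subset used in the sample; it is not a general DOT parser and not how ImageMonkey itself reads the file.

```python
import re

NODE = re.compile(r'^\s*(\w+)\s*\[(.*)\]\s*$')
EDGE = re.compile(r'^\s*(\w+)\s*->\s*(\w+)\s*$')
# An attribute value is either "quoted" or a bare token such as 250 or red.
ATTR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^\s,\]]+))')


def parse_dot(text):
    """Return ({node: {attr: value}}, [(parent, child), ...])."""
    nodes, edges = {}, []
    for line in text.splitlines():
        edge = EDGE.match(line)
        node = NODE.match(line)
        if edge:
            edges.append((edge.group(1), edge.group(2)))
        elif node:
            nodes[node.group(1)] = {
                key: quoted if quoted else bare
                for key, quoted, bare in ATTR.findall(node.group(2))
            }
    return nodes, edges


sample = '''
digraph G {
    {
        vehicle [label="vehicle" size=150 fontsize=20 color=blue]
        car [label="car" id="94bbd2ff-8a8e-4d1c-9ac5-f9506aa20e43"]
        person [label="person", id="64766828-a943-433f-8800-1901cebf959d"]
        vehicle -> car
    }
}
'''
nodes, edges = parse_dot(sample)
print(nodes["car"]["id"], edges)
```

Nodes that carry an id attribute are the ones linked to a database label; nodes without one, such as vehicle, are only grouping nodes.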

When you now click a node in the graph (e.g. vehicle) the following query would be created: car | bicycle. This query is then used to query the dataset (similar to what you can already do with https://imagemonkey.io/explore ).
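
How the query for a clicked node might be assembled, as a sketch under two assumptions that the thread does not confirm: only nodes linked to a database label (those with an id) contribute a term, and the terms are joined in declaration order. The function name is hypothetical.

```python
def build_query(node, children, mapped):
    """Join `node` and its descendants that are mapped to a database label."""
    terms, stack, seen = [], [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in mapped:
            terms.append(current)
        # Reversed so that children are visited in declaration order.
        stack.extend(reversed(children.get(current, [])))
    return " | ".join(terms)


children = {
    "vehicle": ["car", "bicycle"],
    "food": ["wholefood", "processedfood"],
    "wholefood": ["apple", "orange"],
    "processedfood": ["pizza"],
}
mapped = {"car", "bicycle", "apple", "orange", "pizza", "tree"}

print(build_query("vehicle", children, mapped))
print(build_query("food", children, mapped))
```

With the sample graph, vehicle gives car | bicycle as described above, and food would give apple | orange | pizza.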

As such graphviz files are easy to write, it's possible for users to write their own label graph representation. They can evaluate the graph locally on their own machine, or (even better) make their label graph representation available to other users by uploading it to the imagemonkey-core repository.

I am hoping that at some point in time, users will step up and become maintainers of specific label graph representations, so that they continuously monitor the available labels in the dataset and add them to their label graph representation (if necessary). Over time I am hoping that we end up with some community driven label graphs

e.g. I could think of the following graphs:

self driving car label graph
food graph
artists/actors graph (in case we will be hosting that type of image some day)
...

Each label graph representation could then be served like this: https://imagemonkey.io/graph?name=self-driving-car

The idea is to give the label graph maintainers as much power as possible - they should be able to style their graph the way they want it. I am a big fan of reddit here, as they let users style a subreddit the way they want it. I have something similar in mind for the label graph concept.

dobkeratops commented 6 years ago

nice! a whole new dimension to the data. It will be pretty cool to have the graph visualisable from the outset

bbernhard commented 6 years ago

Another interesting thing would be to add Wikipedia's recently introduced Page Previews feature (see https://www.mediawiki.org/wiki/Page_Previews).

I think it could be really nice, if one could hover over a label and a short summary would appear. I guess that could make browsing the label tree way more interesting and entertaining.

I could imagine adding additional parameter(s) to the graphviz label definition file which make it possible for label graph maintainers to specify whether some information should be shown when one hovers over a label node. One could then specify whether to show a custom message or to use the wikipedia page preview API.

dobkeratops commented 6 years ago

(imagine an animated view with a slide show of a few examples showing inside the graph node, and I remember your 'image galaxy' experiment..)

bbernhard commented 6 years ago

(imagine an animated view with a slide show of a few examples showing inside the graph node, and I remember your 'image galaxy' experiment..)

that's a nice one!

btw: I pushed the first label graph draft to production: https://imagemonkey.io/graph It's far from being perfect, but I thought it might be a good idea to upload a first version early in the process so that we can iteratively improve it.

Some things I definitely want to improve:

node alignment: I am not really happy with the current node alignment; it's pretty hard to see the connection/hierarchy between individual nodes. I have to study d3's documentation in more detail to find out how to align the nodes properly.
page redirect: if you click on a node (e.g. tree), it redirects you to the explore page. I went with that approach, as it was the fastest way to get the whole thing going. From a user experience point of view, I think the page redirect isn't the nicest solution, as it slows you down when exploring the label graph (you always have to use the browser's back button to go back to the label tree). So hopefully we find a solution that doesn't require a page redirect (maybe show the images in a popup dialog? or, as you already suggested, use an image slideshow?)
wikipedia page preview: I also integrated the wikipedia page preview API as an experimental feature. You can check it out by hovering over the car node in the label graph (it may take some time until it shows up, depending on how fast the wikipedia API is).

It's possible to enable the wikipedia page preview on a per-node basis in the graphviz file - i.e. label graph maintainers can decide whether they want to make that available for their nodes. Currently, I only enabled it for the car node.

For the wikipedia page preview feature I would see the following improvements:

add a button to the webpage to disable it completely: I think it could get pretty distracting if those popup windows are constantly appearing while one hovers over a node.
move popup window to a different position: I find it pretty annoying that the popup window overlaps the label graph.
make response time faster: Sometimes the wikipedia API is pretty slow; an additional caching layer probably won't hurt.
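
The caching layer from the last point could be as small as a dictionary with expiry times. A sketch, with the actual Wikipedia request left out: `fetch` stands for whatever function performs the lookup, and the class name, TTL value and stand-in fetch function are assumptions.

```python
import time


class PreviewCache:
    """Cache page preview summaries for a limited time (hypothetical sketch)."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # function: title -> summary (e.g. an HTTP call)
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # title -> (expiry time, summary)

    def get(self, title):
        now = self._clock()
        entry = self._entries.get(title)
        if entry is not None and entry[0] > now:
            return entry[1]                 # still fresh: no request needed
        summary = self._fetch(title)        # missing or expired: fetch again
        self._entries[title] = (now + self._ttl, summary)
        return summary


calls = []


def fake_fetch(title):
    # Stand-in for the real (slow) request; records how often it is called.
    calls.append(title)
    return f"summary of {title}"


cache = PreviewCache(fake_fetch, ttl_seconds=60)
cache.get("car")
cache.get("car")
print(calls)
```

The second lookup is served from memory, so only one request reaches the slow backend until the entry expires.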

If you have some recommendations or ideas, please let me know. Always happy to hear those :)

bbernhard commented 6 years ago

just updated the graph once more to add some more nodes and make some small changes to the positioning of the nodes; I think it now looks a bit better.

I have to admit it's pretty hard to come up with a label hierarchy that makes sense... I guess it's now the perfect time to have a look at your label experiment (https://github.com/dobkeratops/label_list/blob/master/labelgraph.go). As far as I have seen, you already put a lot of thought into it - so having a look at it will definitely be helpful :)

dobkeratops commented 6 years ago

I have to admit it's pretty hard to come up with a label hierarchy that makes sense

indeed, but that's an ok starting point there. easy to tweak. I see 'nature'-> tree .. perhaps organism -> {plant->{tree,grass,bush,flower}, animal->{mammal->{person, quadrupedal_mammal->{cat,dog}}} } I see what you're trying to do with the 'architecture' node.. we could find a better phrase that covers building & road.

One obvious place it gets awkward: food - many types of food are plants...
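
If the structure is treated as a graph rather than a strict tree, a label can simply have more than one parent. A small sketch of that idea; the plant -> apple edge is made up for illustration, the rest follows the node names used earlier in the thread.

```python
def ancestors(edges, node):
    """All nodes from which `node` can be reached, following every parent."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


edges = [
    ("food", "wholefood"), ("wholefood", "apple"),
    ("organism", "plant"), ("plant", "apple"),  # second parent for apple
]
print(sorted(ancestors(edges, "apple")))
```

Here apple is found both under food and under organism, so neither hierarchy has to give it up.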

I guess it's now the perfect time to have a look at your label experiment

i did have something to spit out JSON after compiling it - maybe I could try converting it to the chosen DOT format too, it would be nice to visualise for an overview.

the whole list might be overkill.. going back over it I started encoding specific species lists etc. but I did try to follow the 'tree of life' to organize that (e.g. animal/plant kingdom.. vertebrates/invertebrates.. etc etc).

I'd guess aiming for 100 fairly universal labels would be a good intermediate step.. see what it actually looks like in the graph view. Then again, part of the point of a label-graph is we can choose any organisation and easily get a more sensible subset for a specific task

bbernhard commented 6 years ago

indeed, but that's an ok starting point there. easy to tweak. I see 'nature'-> tree .. perhaps organism -> {plant->{tree,grass,bush,flower}, animal->{mammal->{person, quadruped->{cat,dog}}} } I see what you're trying to do with the 'architecture' node.. we could find a better phrase that covers building & road.

totally agreed! And the fact that English is not my first language doesn't make it easier :D So any help is VERY much appreciated! :)

i did have something to spit out JSON after compiling it - maybe I could try converting it to the chosen DOT format too, it would be nice to visualise for an overview.

that would be really awesome! In case you want to give it a try, here is the current graph.dot: https://github.com/bbernhard/imagemonkey-core/blob/develop/wordlists/en/graphdefinitions/graph.dot. The size, color, fontsize, id and URL parameters are totally optional, so I guess you could omit them in a first iteration.

btw: I wrote a Dockerfile a while ago, which sets up the ImageMonkey infrastructure in a docker container. Unfortunately, I haven't maintained the file for a few months, so the Dockerfile is for sure outdated. But I should be able to make that work again within the next few days.

With the docker container in place, you would be able to quickly test your dot file.

I'd guess aiming for 100 fairly universal labels would be a good intermediate step.. see what it actually looks like in the graph view. Then again, part of the point of a label-graph is we can choose any organisation and easily get a more sensible subset for a specific task

that sounds great!

dobkeratops commented 6 years ago

With the docker container in place, you would be able to quickly test your dot file.

ok I haven't messed with docker in a while, but I did recently actually sign up for some basic web hosting (with the intention of experimenting) so I should have another go at all that

bbernhard commented 6 years ago

ok I haven't messed with docker in a while, but I did recently actually sign up for some basic web hosting (with the intention of experimenting) so I should have another go at all that

awesome! Just out of interest: Are you using a cloud provider, a dedicated server or a bare metal machine?

btw: In case you want to give Docker a try, the Dockerfile should now be up to date. (here is a small Howto: https://github.com/bbernhard/imagemonkey-core/tree/develop/env/docker ) Some things (like the Quiz) do not work yet, but most of the functionality should be fine now.

I think with the docker image it should be possible to write your own graph.dot file. If you want to give that a try, please let me know, then I can give you a short introduction on how to do this.

dobkeratops commented 6 years ago

Are you using a cloud provider

cloud provider. I gather you need a static IP to actually physically host something yourself, and lots of guides talk about greater security hazards r.e. doing this from home

ImageMonkey / imagemonkey-core

add label graph #118