1.3.1 legend related scales in viz + title in meta

paulgirard commented 2 years ago

This 1.3.1 proposal aims at adding data to GEXF to document a graph drawing with a legend and a title. GEXF viewer tools (such as https://gitlab.com/ouestware/retina/) so far can't indicate the logic behind the visual aspects without reverse engineering the viz parameters and node attribute sets with some heuristic magic. A good graph drawing often also starts with a good title on top of a description.

Therefore this proposal has two main parts:

add a title element in the GEXF meta
extend the viz module to add ways to describe how the viz parameters were calculated from node/edge attributes. It adds ways to store the ranking/partition parameters and layout settings used in Gephi or in other GEXF producers. Its primary objective is documentation, but the current spec looks complete enough to allow drawing tools not only to draw a legend but also to recompute the viz parameters from attributes.

To get a more precise idea of the proposal see:

the new 1.3.1 primer section about scales: gexf-131-primer-legend.pdf
the extended viz rnc file: https://github.com/gephi/gexf/blob/legend/specs/1.3.1/_viz.rnc#L82-L158

ping @duncdrum and @gvegayon for comments.

paulgirard commented 2 years ago

In the layout documentation, the current spec uses a layoutalgorithm attribute, which is a string. This supposes that GEXF-related tools agree on a naming convention, not only for the layout algorithm but also for its parameters. Since the primary objective is only documentation, that's probably fine. Recomputing the layout would require recognizing layout algorithm and parameter names...

paulgirard commented 2 years ago

Thank you @jacomyma

Scales

I added the scalelabel for documentation (I mean for human readers). I agree the scalepoint elements are not optimal for recreating the curve, but they would work for any curve, even a complex one that is not based on a function. An alternative would be to agree on a function expression language or on a finite list of frequently used methods. In my opinion, the former could be an option; the latter feels too limited.

layout

Good point. I would propose extending the layout element so that it can host a list of layouts rather than just one. The order is important, but I guess we can use the order of the XML children.

<viz:positions>
   <viz:layout algorithm="forceatlas2" referenceURL="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679">
       <viz:param name="scale" type="integer" value="10"/>
       <viz:param name="stronger gravity" type="boolean" value="true"/>
   </viz:layout>
   <viz:layout algorithm="nooverlap">
       <viz:param name="speed" type="integer" value="3"/>
       <viz:param name="ratio" type="float" value="1.2"/>
       <viz:param name="margin" type="float" value="5.0"/>
   </viz:layout>
</viz:positions>

What do you think?

duncdrum commented 2 years ago

@paulgirard while we could rely on sequence position, the actual order of steps is kind of important, so why not allow an optional @step attribute that takes xs:positiveInteger values? E.g.:

<viz:positions>
   <viz:layout algorithm="forceatlas2" referenceURL="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679" step="1">
       <viz:param name="scale" type="integer" value="10"/>
       <viz:param name="stronger gravity" type="boolean" value="true"/>
   </viz:layout>
   <viz:layout algorithm="nooverlap" step="2">
       <viz:param name="speed" type="integer" value="3"/>
       <viz:param name="ratio" type="float" value="1.2"/>
       <viz:param name="margin" type="float" value="5.0"/>
   </viz:layout>
</viz:positions>
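
As a rough illustration of how a consumer might combine the two ideas above (explicit step when present, document order otherwise), here is a minimal Python sketch. The namespace URI is a placeholder and the function name `ordered_layouts` is hypothetical; neither comes from the proposal.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for this sketch only; use the URI declared in the real file.
VIZ_NS = "urn:example:gexf-viz"

SAMPLE = f"""
<viz:positions xmlns:viz="{VIZ_NS}">
  <viz:layout algorithm="nooverlap" step="2">
    <viz:param name="ratio" type="float" value="1.2"/>
  </viz:layout>
  <viz:layout algorithm="forceatlas2" step="1">
    <viz:param name="scale" type="integer" value="10"/>
  </viz:layout>
</viz:positions>
""".strip()


def ordered_layouts(xml_text):
    """Return (algorithm, params) pairs sorted by step, then by document order."""
    root = ET.fromstring(xml_text)
    layouts = root.findall(f"{{{VIZ_NS}}}layout")
    # Document position (1-based) is the fallback when step is absent.
    indexed = list(enumerate(layouts, start=1))
    indexed.sort(key=lambda pair: (int(pair[1].get("step", pair[0])), pair[0]))
    result = []
    for _, layout in indexed:
        params = {
            p.get("name"): p.get("value")
            for p in layout.findall(f"{{{VIZ_NS}}}param")
        }
        result.append((layout.get("algorithm"), params))
    return result


for algorithm, params in ordered_layouts(SAMPLE):
    print(algorithm, params)
```

Parameter values are kept as strings here; converting them according to the type attribute is left out of the sketch.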

paulgirard commented 2 years ago

Indeed, explicit is always better than implicit. I am adding this too.

gvegayon commented 2 years ago

@paulgirard, thanks for this; it is looking very useful. I like @duncdrum's idea about step, especially for reproducible research. Now, the challenge will be on the Gephi side: which sequence of steps to store. Thinking out loud here, before saving GEXF files, Gephi could show the user the last n layout changes and let them select which ones to store; but that's a problem for later, I guess.

I also like @duncdrum's idea about using a common math language for the scalelabel attribute. Since all is going web, I would suggest something like JavaScript's Math. In such a case, it could be beneficial to perhaps define attributes as functions, for example, instead of:

<viz:sizes scale="quantitative" scalelabel="square-root">
  <viz:scalepoint forratio="0" factor="0" />
  <viz:scalepoint forratio="0.1" factor="0.316227766"/>
  <viz:scalepoint forratio="0.2" factor="0.447213595"/>
  <viz:scalepoint forratio="0.3" factor="0.547722558"/>
  <viz:scalepoint forratio="0.4" factor="0.632455532"/>
  <viz:scalepoint forratio="0.5" factor="0.707106781"/>
  <viz:scalepoint forratio="0.6" factor="0.774596669"/>
  <viz:scalepoint forratio="0.7" factor="0.836660027"/>
  <viz:scalepoint forratio="0.8" factor="0.894427191"/>
  <viz:scalepoint forratio="0.9" factor="0.948683298"/>
  <viz:scalepoint forratio="1.0" factor="1"/>
  <viz:range min="1" max="10" default="1" />
</viz:sizes>

do:

<viz:sizes scale="function" scalelabel="square-root">
    function(x) {
      return Math.sqrt((x-1)/(10-1));
    }
</viz:sizes>

I am no expert on XML, but having something like this would be super. Is this something worth implementing?
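
To make the trade-off concrete, here is a small Python sketch comparing the discretized scalepoints with the closed-form square root. It assumes a consumer would interpolate linearly between consecutive scalepoints, which this thread does not specify; the name `interpolate` is hypothetical.

```python
import math

# (forratio, factor) pairs copied from the scalepoint example above.
SCALEPOINTS = [
    (0.0, 0.0),
    (0.1, 0.316227766),
    (0.2, 0.447213595),
    (0.3, 0.547722558),
    (0.4, 0.632455532),
    (0.5, 0.707106781),
    (0.6, 0.774596669),
    (0.7, 0.836660027),
    (0.8, 0.894427191),
    (0.9, 0.948683298),
    (1.0, 1.0),
]


def interpolate(ratio, points=SCALEPOINTS):
    """Piecewise-linear interpolation between scalepoints (an assumption)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= ratio <= x1:
            t = (ratio - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("ratio must be within [0, 1]")


# Compare the discretized curve with the exact function at a few ratios.
for ratio in (0.05, 0.25, 0.5):
    approx = interpolate(ratio)
    exact = math.sqrt(ratio)
    print(f"ratio={ratio}: interpolated={approx:.4f} exact={exact:.4f}")
```

Under this assumption the two agree at the listed points and drift apart between them, most visibly near 0 where the square root is steepest.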

paulgirard commented 2 years ago

Thank you @gvegayon. I don't think accepting plain JavaScript is a good idea, as it opens code injection security risks and it means choosing/promoting one programming language in a neutral data format.

Ideally such an expression should be mathematics only. Taking your example, it would reduce to:

sqrt((x-1)/(10-1))

I can't find a standard for mathematical expression syntax targeting evaluation and not rendering (MathML is for rendering). (note: the shunting yard algo is for parsing and ordering tokens and not a math language standard https://en.wikipedia.org/wiki/Shunting_yard_algorithm)

In my opinion such a mathematical expression should be easy to evaluate in: Java, Python and JavaScript worlds. It looks like every math expression evaluation library is using its own syntax without pointing to a standard:

But the main math functions seem to have the same names in those examples. This means that we should check and document which math functions are supported in this expression, after having checked that they are common to the most frequently used implementations... Doable but not exactly exciting. Or, to put it differently, it looks like a more complicated solution that does not bring much more than a set of common mathematical functions added to the GEXF format.

We should also keep in mind that this would require GEXF producers/consumers such as Gephi to implement the production/evaluation of math expressions. So we should evaluate the ease of use of our representation choice in this regard.
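
To give a feel for the implementation cost on the consumer side, here is a minimal Python sketch of a restricted evaluator. It borrows Python's own expression syntax and a small whitelist of functions purely as assumptions for illustration; nothing here is part of the proposal, and it is not a hardened sandbox.

```python
import ast
import math
import operator

# Hypothetical whitelist; the thread has not agreed on any function list here.
ALLOWED_FUNCS = {
    "sqrt": math.sqrt,
    "log": math.log,
    "log10": math.log10,
    "exp": math.exp,
    "pow": math.pow,
}
ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, x):
    """Evaluate a math-only expression of one variable x without using eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "x":
            return x
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_BINOPS:
            return ALLOWED_BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in ALLOWED_FUNCS
                and not node.keywords):
            return ALLOWED_FUNCS[node.func.id](*(walk(arg) for arg in node.args))
        # Anything else (attribute access, other names, other calls) is rejected.
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


print(evaluate("sqrt((x-1)/(10-1))", 10))
```

Each producer/consumer language (Java, JavaScript, ...) would need an equivalent parser and whitelist, which is the cost discussed above.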

To wrap up, here are the possible ways encountered so far to describe a non-linear function for a quantitative scale in GEXF:

a finite list of common mathematical functions (log, sqrt, pow...) to add in GEXF format
the Gephi spline solution: two control points in the [0,1] x [0,1] space defining a Bézier curve from (0,0) to (1,1)
a discrete version of the curve (what is in the current proposal): a finite list of normalization curve points
a mathematical expression as discussed in this comment

At this point my personal feeling is to choose a finite list of common math functions (D3 does that: https://github.com/d3/d3-scale#continuous-scales) or splines (already implemented in Gephi and flexible).
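
For readers unfamiliar with the spline option, here is a minimal Python sketch of evaluating such a curve. It assumes a cubic Bézier with endpoints fixed at (0,0) and (1,1) and both control points inside the unit square; whether this matches Gephi's implementation exactly is not confirmed here, and the function names are hypothetical.

```python
def bezier_component(t, c1, c2):
    """One coordinate of a cubic Bezier whose endpoints are fixed at 0 and 1."""
    return 3 * (1 - t) ** 2 * t * c1 + 3 * (1 - t) * t ** 2 * c2 + t ** 3


def spline_transform(x, control1, control2, tolerance=1e-9):
    """Return the curve's y for a normalized x in [0, 1].

    control1 and control2 are (x, y) tuples in the unit square. With control
    x values in [0, 1], x(t) is non-decreasing, so bisection on t is enough.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be within [0, 1]")
    low, high = 0.0, 1.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if bezier_component(mid, control1[0], control2[0]) < x:
            low = mid
        else:
            high = mid
    t = (low + high) / 2
    return bezier_component(t, control1[1], control2[1])


# Control points on the diagonal give the identity (linear) scale.
print(spline_transform(0.4, (1 / 3, 1 / 3), (2 / 3, 2 / 3)))
```

Two control points are enough to cover ease-in, ease-out and S-shaped curves, which is the flexibility argument made above.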

duncdrum commented 2 years ago

@paulgirard since we are talking about gexf as data format, is there anything missing from Xpath math functions? https://www.w3.org/2005/xpath-functions/math/#fo-math-summary I'd say these would be a more natural fit than Java or a custom syntax. Any xpath processor would be able to handle these already.

just to note not having math expressions is not a showstopper for me.

Yomguithereal commented 2 years ago

I tend to agree with @paulgirard personally and would be happy with only well-known, parametrizable scale options following d3 etc., such as pow, log, lin and sqrt. I would go as far as using the splines, for Gephi compatibility and for cases where you need more complexity, but I draw the line at custom math expressions as they would introduce too much complexity and potential hurdles. I am not very fond of curve discretization with points (but it could be helpful with colors and their strange spaces).

gvegayon commented 2 years ago

Good points, @paulgirard! The thing about personalized math functions, @Yomguithereal, is mostly about flexibility. In general, I like building tools/standards that provide some wiggle room for things I have not thought of. Nevertheless, I also appreciate having a well-encapsulated file format! On a related note, the NeXML file format (for phylogenetics) includes a meta tag that allows adding arbitrary annotations.

That said, I agree with your last comment, @paulgirard,

At this point my personal feeling is to choose a finite list of common math functions (D3 does that: https://github.com/d3/d3-scale#continuous-scales) or splines (already implemented in Gephi and flexible).

mbastian commented 2 years ago

+1 on supporting a finite set of common functions, in addition to the splines for compatibility. If we have this, do we really need to support the discrete version?

paulgirard commented 2 years ago

Thank you all. As we converged on a solution, I updated the proposal:

added pow, sqrt, log10, log, exp, exp10 transform functions
added spline
removed discretized solution

      <attribute id="degree" title="Degree" type="integer">
        <default>0</default>
        <viz:sizes scale="quantitative" scalelabel="square-root">
          <viz:transform>
            <viz:sqrt />
          </viz:transform>
          <viz:range min="1" max="10" default="1" />
        </viz:sizes>
      </attribute>
      <attribute id="size" title="Size" type="integer">
        <default>0</default>
        <viz:sizes scale="quantitative" scalelabel="square">
          <viz:transform>
            <viz:pow exponent="2"/>
          </viz:transform>
          <viz:range min="1" max="25" default="1" />
        </viz:sizes>
      </attribute>
      <attribute id="pagerank" title="Page Rank" type="integer">
        <default>0</default>
        <viz:sizes scale="quantitative" scalelabel="spline">
          <viz:transform>
            <viz:spline>
              <viz:origin-control-point x="0.6" y="0.01"/>
              <viz:destination-control-point x="0.8" y="0.9" />
            </viz:spline>
          </viz:transform>
          <viz:range min="1" max="5" default="1" />
        </viz:sizes>
      </attribute>
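
To illustrate how a drawing tool might recompute sizes from such a description, here is a minimal Python sketch. It assumes the attribute value is first normalized to [0, 1] using the attribute's observed min and max, then transformed, then mapped linearly onto viz:range; that pipeline is an assumption of this sketch, not something stated in the thread, and `scaled_size` is a hypothetical name. Only sqrt and pow are covered.

```python
import math

# Transform names mirror the viz:sqrt and viz:pow elements above.
TRANSFORMS = {
    "sqrt": lambda ratio, **_: math.sqrt(ratio),
    "pow": lambda ratio, exponent=1.0, **_: ratio ** exponent,
}


def scaled_size(value, attr_min, attr_max, transform, range_min, range_max, **params):
    """Map an attribute value to a size (assumed pipeline: normalize, transform, rescale)."""
    ratio = (value - attr_min) / (attr_max - attr_min)
    factor = TRANSFORMS[transform](ratio, **params)
    return range_min + factor * (range_max - range_min)


# Degree 25 on an observed 0..100 scale, sqrt transform, range 1..10.
print(scaled_size(25, 0, 100, "sqrt", 1, 10))
# Size 50 on an observed 0..100 scale, pow with exponent 2, range 1..25.
print(scaled_size(50, 0, 100, "pow", 1, 25, exponent=2))
```

The log transforms are left out because their behaviour at a normalized ratio of 0 would need a convention that is not discussed here.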

What do you think?

I am waiting for some approvals before updating the primer.

paulgirard commented 2 years ago

ps: I couldn't find a way to reuse the XPath math function definitions, as XML specs are very new to me. If anyone thinks there is a better way to specify math transform functions, please let me know :pray:

gvegayon commented 2 years ago

Thank you, @paulgirard! Question: How do <default>0</default> and <viz:range min="1" max="5" default="1" /> coexist (honest question)?

duncdrum commented 2 years ago

@paulgirard just saw this, I'll try to have a fork of your PR ready with XPath math before the weekend.

mbastian commented 2 years ago

@paulgirard One thought about degree columns. Normally, a GEXF file wouldn't include degree, in-degree, out-degree or edge kind columns, as those depend directly on the graph and so don't really need to be stored as attributes. We wouldn't plan to export those columns in GEXF via Gephi, for instance. But if a legend is based on the degree column, we should still include it somehow, right? What do you suggest?

gephi / gexf

1.3.1 legend related scales in viz + title in meta #18
