SemanticMediaWiki / SemanticResultFormats

Provides additional visualizations (result formats) for Semantic MediaWiki
https://www.semantic-mediawiki.org/wiki/Extension:Semantic_Result_Formats

Dygraphs format, Graph extension and other chart/visualization formats #583

Open krabina opened 4 years ago

krabina commented 4 years ago

Goal

I'd like to discuss the future of data visualization formats. Some, like the Timeseries format or the d3 chart format, use external components that are still updated, while others, like the Dygraphs format, use external components that have not been worked on for years.

Graph extension

The MediaWiki extension Graph (not to be confused with the Graph result format) seems to be a very actively developed extension, also used in Wikimedia projects.

So one goal could be to provide a "Semantic Graph" extension that gives the Graph extension the ability to display data annotated in SMW (just like "Semantic Maps" did this for the "Maps" extension, or "Semantic Glossary" for the "Lingo" extension).

This way, the SMW ecosystem could profit from an actively maintained extension. Such an integration would also reduce the burden of upgrading several other graph/chart result formats.

Features

These are missing in most result formats:

  1. the ability to display data from an uploaded file (alongside SMW data). The concept is outlined in the Dygraphs format, which can use data from an uploaded CSV file.
  2. using an external CSV file (or other resource, e.g. a JSON file)
  3. putting raw data in wiki pages (which the Graph extension can do, as can Maps with the GeoJSON namespace)

Discussion

What are your thoughts on that?

akuckartz commented 4 years ago

Can http://vega.github.io/ be used / integrated?

mwjames commented 4 years ago

The MediaWiki extension Graph (not to be confused with the Graph result format) seems to be a very actively developed extension, also used in Wikimedia projects.

The person who originally developed and maintained that extension has left the WMF, and we have no good track record of bundling SMW with WMF-related things, which in most cases ends up creating more work than anticipated.

Furthermore, the Graph extension only makes use of vega/vega-lite, which seems like a better integration platform than relying on extra WMF middleware.

Can http://vega.github.io/ be used / integrated?

I made a prototype [0] in the past, so, yes it is possible and it only relies on native Vega functionality.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/3431#issuecomment-423744785

mwjames commented 4 years ago
  1. the ability to display data from an uploaded file (alongside SMW data). The concept is outlined in the Dygraphs format, which can use data from an uploaded CSV file.
  2. using an external CSV file (or other resource, e.g. a JSON file)
  3. putting raw data in wiki pages (which the Graph extension can do, as can Maps with the GeoJSON namespace)

I understand the sentiment, but please don't forget that the data we display should be available through SMW and be part of the data repository (which provides context). Displaying some random graph via SMW/SRF just because it would be convenient isn't part of what SMW should be doing. The information has to come from the store; otherwise, using SMW as a simple graph engine brings no insight into the data you display, its analysis, or its context.

krabina commented 4 years ago

Well, your Dygraphs extension gave me the idea. And @JeroenDeDauw shows in the Maps extension how useful it can be to have both: to use GeoJSON from an external source and display SMW data as points over it. The same can be true for many other data types...

mwjames commented 4 years ago

Well, your Dygraphs extension gave me the idea. And @JeroenDeDauw shows in the Maps extension how useful it can be to have both: to use GeoJSON from an external source and display SMW data as points over it. The same can be true for many other data types...

I see those as the exception rather than the rule. Data is only useful in context, and context is what you describe via SMW annotations.

mwjames commented 4 years ago

Another problem with external sources is that you never know whether the source will still be available in a month or a year. Relying on "dead" source material you have no control over is a common problem for data repositories that assume that anything with a URL will be available forever.

Wikipedia faces the same issue with dead citation links, where something like the web archive helps but also demonstrates the problem of relying on externally linked material.

krabina commented 4 years ago

btw. an interesting read regarding the downside of the graph extension: https://www.mediawiki.org/wiki/User:Bawolff/Reflections_on_graphs

I'm impressed with [0] (SemanticMediaWiki/SemanticMediaWiki#3431 (comment)). What can be done to make it happen?

krabina commented 4 years ago

Another problem with external sources is that you never know whether the source will still be available in a month or a year. Relying on "dead" source material you have no control over is a common problem for data repositories that assume that anything with a URL will be available forever. Wikipedia faces the same issue with dead citation links, where something like the web archive helps but also demonstrates the problem of relying on externally linked material.

While this is true in general, you often do have governance over SMW "external" files. They can be an uploaded CSV, JSON data in a MediaWiki namespace, or an external resource you control yourself...

mwjames commented 4 years ago

While this is true in general, you often do have governance over SMW "external" files. They can be an uploaded CSV, JSON data in a MediaWiki namespace, or an external resource you control yourself...

If you own them and can make them available as "raw" content in some MediaWiki namespace, then yes. But beware: just uploading them and producing a "nice" graph isn't the job of SMW or SRF.

krabina commented 4 years ago

Well, I tend to disagree. For data visualizations there are a lot of use cases where you want to describe different data sources (with SMW) and visualize them without every data item being stored in SMW, just as you mentioned for Dygraphs. Not every measuring point needs to be stored if the data is coming from an outside source anyway. My current use case is handling and visualizing spending data from municipalities. These datasets very easily get very big. However, being able to visualize them (interactively) AND being able to describe the data on a more general level could be done very smoothly in SMW.

Of course, this is not the regular use-case, so in general I agree with you :-)

mwjames commented 4 years ago

Well, I tend to disagree. For data visualizations there are a lot of use cases where you want to describe different data sources (with SMW) and visualize them without every data item being stored in SMW, just as you mentioned for Dygraphs.

Yes, that data scenario exists, but again, displaying some random data (here "random" refers to non-SMW source data) isn't the job of SMW or its extensions. SMW's responsibility lies in making the data it has stored accessible through its interface.

You have to tread carefully here regarding what SMW should be, can be, and should not be, because you don't want to make SMW a jack of all trades. I recall: " ... main task and objective of Semantic MediaWiki is to extend the MediaWiki platform with an ability to store structured information and make them retrievable either as part of a query result or as data for external federation ...".

So, external sources can be a part of it, but only to a limited extent and with a very narrow definition of how the integration should happen. I don't think that loading some CSV data from somewhere as a source to be displayed via SMW/SRF has anything to do with SMW, aside from defining a URL that holds the data definition, which can then be loaded and displayed via an SMW-related extension. I see that scenario as outside the scope of SMW/SRF, because you could build an extension X that uses the same approach of loading URL-referenced data into a page via a parser function, with no connection to SMW or SRF.

IRA1777 commented 3 years ago

General thoughts on graph representation of semantic data: I use it a lot, especially through graphviz. To me, the SRF "Graph format" is unusable because the customization available through query parameters is too limited; you can exploit only a few of graphviz's features.

A better solution that I use is to query SMW data in array format, store the results in an array (Arrays extension), then print the array inside a "tagged" graphviz, with full control over data formatting: {{#tag:graphviz | <my script>}}. The same applies to the Pchar4MW extension or the Graph extension (which I never got to work properly, by the way).

One typical "use case" comes from the page name, which is generally assumed to be representative of the content (in SRF as in most semantic extensions). But I tend to drop this practice and store content in specific namespaces, with an auto-generated number as the page name (through the Page Forms extension): i.e. in a "Person" namespace, page names are like Person:0357. The human-readable name is a generic SMW property like Has display name, holding what should be displayed: Has display name::Wilhelm Klink.

This allows great flexibility in data management and solves homonym problems and many others. But it rules out many result formats that assume the page name is the display name.

So as a general rule, I don't see the need for a specific semantic wrapper around an extension, provided that the extension has a scripting language that I can generate by grabbing raw data from an SMW query.
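
The workflow described above (query SMW, keep the raw rows, emit the graph script yourself) can be sketched roughly as follows. This is a minimal Python illustration, not the wiki-side array code: `rows_to_dot` and `dot_quote` are hypothetical helpers, `Person:0358` and its label are made-up placeholders, and the `URL="[[...]]"` attribute assumes a renderer setup that converts wikilinks inside DOT into real links.

```python
def dot_quote(text):
    """Wrap text in double quotes for DOT, escaping embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def rows_to_dot(rows, edges, name="people"):
    """Build a DOT digraph from (page_name, display_name) rows and (src, dst) edges.

    The page name is used as the node ID and the display name as the label,
    so numeric page names such as Person:0357 never show up in the picture.
    """
    lines = [f"digraph {name} {{", "  node [shape=plaintext];"]
    for page, label in rows:
        link = "[[" + page + "]]"  # assumption: wikilinks are resolved by a preprocessing step
        lines.append(
            f"  {dot_quote(page)} [label={dot_quote(label)}, URL={dot_quote(link)}];"
        )
    for src, dst in edges:
        lines.append(f"  {dot_quote(src)} -> {dot_quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


# Placeholder rows standing in for a query result; only Person:0357 comes from the post above.
rows = [("Person:0357", "Wilhelm Klink"), ("Person:0358", "Example Person")]
edges = [("Person:0357", "Person:0358")]
print(rows_to_dot(rows, edges))
```

The resulting text is what would go between the graphviz tags; because the script is generated, any graphviz attribute can be set without waiting for a result format to expose it as a query parameter.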

alex-mashin commented 2 years ago

I have managed to revive the graph format with the ExternalData extension, which is now able to emulate GraphViz using the new and as yet undocumented parser function {{#get_program_data:}} in tag emulation mode. This function has been tested to successfully embed GraphViz, mscgen, ploticus, gnuplot, PlantUML and LilyPond output, including processed results of {{#ask:}} queries.

It might be worth trying to use this functionality in SRF. Of course, there are obvious security concerns, so testing for possible vulnerabilities is always welcome.

P.S. Speaking specifically of vega, I was unable to install it (let alone the graphoid service) in reasonable time; I gave up while trying to manually install with npm the dependencies required to run yarn build. I am afraid this solution is deliberately made overcomplicated and practically usable only by the Wikimedia Foundation, other examples of such a microservice approach being the VisualEditor and Math extensions. That can reduce the number of users of a possible new family of formats based on Graph.

alex-mashin commented 2 years ago

P.S. Speaking specifically of vega, …

I was able to install Vega (just don't use yarn and use reasonably recent node.js and npm) and bind it to the External Data extension to display at least simple visualisations. Two examples of such visualisations can be found here.

The limitations of this approach are, so far, these:

Update: I was able to make Vega visualisations interactive, but the solution is far from elegant and is still limited (e.g., && has to be replaced with ! and || due to some strange bug). The example, as well as the necessary settings, long enough to make an extension of their own, can be found here.

Update 2: Here is an example of horizontal bar charts showing participants' numbers of edits per namespace using Vega and {{#ask:}}.
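
To give an idea of the input a chart like the one in Update 2 needs, here is a rough Python sketch that turns (namespace, edit count) rows into a bar chart specification. It is not the spec used on the linked page: it targets Vega-Lite rather than full Vega, `bar_chart_spec` is a hypothetical helper, and the sample rows are placeholders, not real query results.

```python
import json


def bar_chart_spec(rows, category="namespace", measure="edits"):
    """Return a Vega-Lite spec (as a dict) for a horizontal bar chart.

    rows is an iterable of (category_value, number) pairs, for example
    the flattened result of a query.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": [{category: c, measure: m} for c, m in rows]},
        "mark": "bar",
        "encoding": {
            # Categories on the y axis and numbers on the x axis give horizontal bars.
            "y": {"field": category, "type": "nominal"},
            "x": {"field": measure, "type": "quantitative"},
        },
    }


# Placeholder values, for illustration only.
sample_rows = [("Main", 120), ("Talk", 30)]
print(json.dumps(bar_chart_spec(sample_rows), indent=2))
```

Keeping the data inline under `data.values` means the chart only shows what the query returned, with no external URL involved.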

IRA1777 commented 2 years ago

able to emulate GraphViz using new and yet undocumented parser function {{#get_program_data:}} in tag emulation mode.

Unfortunately, #get_program_data is not implemented for MW versions < 1.35. Anyway, I tried using it with graphviz (dot) in MW 1.35, but failed. For instance, I don't know what to pass for the "format" parameter. Searching the code didn't help me. Could you provide the configuration and a simple code example?

alex-mashin commented 2 years ago

able to emulate GraphViz using new and yet undocumented parser function {{#get_program_data:}} in tag emulation mode.

Unfortunately, #get_program_data is not implemented for MW versions < 1.35. Anyway, I tried using it with graphviz (dot) in MW 1.35, but failed. For instance, I don't know what to pass for the "format" parameter. Searching the code didn't help me. Could you provide the configuration and a simple code example?

Settings:

// GraphViz:
$edgExeName         ['graphviz']     = 'GraphViz';
$edgExeUrl          ['graphviz']     = 'https://graphviz.org/';
$edgExeCommand      ['graphviz']     = 'dot -K$layout$ -Tsvg';
$edgExeParams       ['graphviz']     = ['layout' => 'dot'];
$edgExeParamFilters ['graphviz']     = ['layout' => '/^(dot|neato|twopi|circo|fdp|osage|patchwork|sfdp)$/'];
$edgExeInput        ['graphviz']     = 'dot';
$edgExePreprocess   ['graphviz']     = 'EDConnectorExe::wikilinks4dot';
$edgExePostprocess  ['graphviz']     = 'EDConnectorExe::innerXML';
$edgExeTags         ['graphviz']     = 'graphviz';

Wikitext (a static example):

<graphviz>digraph example3 {
  node [shape=plaintext];
  Mollusca [URL="[[w:Mollusca]]"];
  Neomeniomorpha [URL="[[w:Neomeniomorpha]]"];
  X1 [shape=point,label=""];
  Caudofoveata [URL="[[w:Caudofoveata]]"];
  Testaria [URL="[[w:Testaria]]"];
  Polyplacophora [URL="[[w:Polyplacophora]]"];
  Conchifera [URL="[[w:Conchifera]]"];
  Tryblidiida [URL="[[w:Tryblidiida]]"];
  Ganglioneura [URL="[[w:Ganglioneura]]"];
  Bivalvia [URL="[[w:Bivalvia]]"];
  X2 [shape=point,label=""];
  X3 [shape=point,label=""];
  Scaphopoda [URL="[[w:Scaphopoda]]"];
  Cephalopoda [URL="[[w:Cephalopoda]]"];
  Gastropoda [URL="[[w:Gastropoda]]"];
  Mollusca->X1->Testaria->Conchifera->Ganglioneura->X2->Gastropoda
  Mollusca->Neomeniomorpha
  X1->Caudofoveata
  Testaria->Polyplacophora
  Conchifera->Tryblidiida
  Ganglioneura ->Bivalvia
  X2->X3->Cephalopoda
  X3->Scaphopoda
}</graphviz> 

Yes, tested only under MediaWiki 1.35. You will need to get the latest version of External Data, because the tag emulation mode was broken for several days, and then a security issue was found and fixed.

Two working examples (among others, nearer the end), as well as the necessary settings, can be found here.
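
A side note on filter patterns like the `layout` one in the settings above: in a regular expression, alternation binds more loosely than the `^` and `$` anchors, so a pattern of the form `^dot|neato|...|sfdp$` anchors only its first and last alternatives, while `^(dot|neato|...|sfdp)$` anchors all of them. The sketch below shows the difference in Python rather than PHP; the `accepts` helper is hypothetical, and it assumes the filter is applied as an ordinary regex search, which this thread does not confirm.

```python
import re

LAYOUTS = "dot|neato|twopi|circo|fdp|osage|patchwork|sfdp"

# Without a group, ^ applies only to "dot" and $ only to "sfdp".
ungrouped = re.compile(rf"^{LAYOUTS}$")
# With a (non-capturing) group, both anchors apply to every alternative.
grouped = re.compile(rf"^(?:{LAYOUTS})$")


def accepts(pattern, value):
    """Return True if the pattern matches anywhere in value (search semantics)."""
    return pattern.search(value) is not None


candidate = "neato; echo unexpected"
print(accepts(ungrouped, candidate))  # the bare "neato" alternative matches mid-string
print(accepts(grouped, candidate))    # rejected: the whole value must be a layout name
print(accepts(grouped, "neato"))      # a plain layout name still passes
```

Since the filtered value ends up in a command line (`dot -K$layout$ -Tsvg`), the grouped form is the safer one to use.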

IRA1777 commented 2 years ago

Thanks a lot, traditio.wiki has very interesting resources. I tested multiple #get_program_data examples, and they worked well. Unfortunately, the graphviz one doesn't.

Internal error [e2224648d63d47534332843a] /mediawiki-1.35.3/index.php?title=Test&action=submit ArgumentCountError from line 49 of /var/www/html/mediawiki-1.35.3/extensions/ExternalData/includes/connectors/EDConnectorExe.php: Too few arguments to function EDConnectorExe::__construct(), 1 passed in /var/www/html/mediawiki-1.35.3/extensions/ExternalData/includes/connectors/EDConnectorExe.php on line 310 and exactly 2 expected

Backtrace:

0 /var/www/html/mediawiki-1.35.3/extensions/ExternalData/includes/connectors/EDConnectorExe.php(310): EDConnectorExe->__construct()

Etc... I'm trying to figure out what's happening. Comparing the versions and software on traditio.wiki (Special:Version) with my own wiki doesn't reveal anything missing... It looks like a "Title" class is not provided. I'll investigate further, because the latest version of the GraphViz extension doesn't work either...

alex-mashin commented 2 years ago

Etc... I'm trying to figure out what's happening,

This error existed for several days but was corrected two days ago, so you need to update to the current master branch.

IRA1777 commented 2 years ago

Worked. Thx so much. As far as I understand the mechanism, you can't switch from dot to neato by adding a parameter to the graphviz tag. Still, one can make a specific tag for each renderer.

alex-mashin commented 2 years ago

you can't switch from dot to neato by adding a parameter to the graphviz tag. Still, one can make a specific tag for each renderer.

You can. Use the layout="neato" attribute. Check out this page for examples, including rendering with neato.

IRA1777 commented 2 years ago

The page diagram (Неориентированный граф на neato) isn't a neato graph. It's dot (btw, fdp, circo etc graphs are all the same in the page). I copied the graph code and switched the

$edgExeParams ['graphviz'] = ['layout' => 'dot'];

to

$edgExeParams ['graphviz'] = ['layout' => 'neato'];

Then I get a real neato layout, which is much more "circle like"

alex-mashin commented 2 years ago

Then I get a real neato layout, which is much more "circle like"

Then it's another bug with the tag emulation mode, which I have already corrected, but the correction was lost in a later complicated story of patches.

Now the template page shows how it works when the bug is corrected (patch). UPDATE: change merged.

IRA1777 commented 2 years ago

Superb! It works great, seems much faster to me, compared to old Extension:GraphViz. Probably because it doesn't upload files? Anyway, I see endless possibilities for this ED extension.

alex-mashin commented 2 years ago

Superb!

Thank you.

Probably because it doesn't upload files?

It injects SVG into the web page. The SVG is cached in the MW database.

IRA1777 commented 2 years ago

It injects SVG into the web page. The SVG is cached in the MW database.

It's a big improvement over the latest Extension:GraphViz versions. The file upload mechanism was a pain in the a*s when you generate graphs dynamically. Each new graph was added to the file history, crowding file lists and eating disk space. An obscure cache mechanism was not updating the displayed pictures...

Thanks again for your contribution to saving this marvellous feature in MW. The only functional regression is that you can't save the graph as a picture file anymore.