:CONTENTS:
- [[#introduction][Introduction]]
- [[#installation][Installation]]
- [[#debian-repo][Debian Repo]]
- [[#change-log][Change Log]]
- [[#to-do-tasks][To Do Tasks]]
:END:
* Introduction
See these publications:
- RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. Vladimir Alexiev. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016: [[http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/index.html][Presentation]], [[http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/index-full.html][HTML]], [[http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/RDF_by_Example.pdf][PDF]], [[https://youtu.be/4WoYlaGF6DE][Video]]
- Generation of Declarative Transformations from Semantic Models. Vladimir Alexiev. In European Data Conference on Reference Data and Semantics (ENDORSE 2023), Mar 2023: [[https://drive.google.com/open?id=1Cq5o9th_P812paqGkDsaEomJyAmnypkD][paper]], [[https://docs.google.com/presentation/d/1JCMQEH8Tw_F-ta6haIToXMLYJxQ9LRv6/edit][presentation]], [[https://youtu.be/yL5nI_3ccxs][video]]
RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required.
If the example instances include embedded source field names, they can describe a mapping precisely. I've implemented a few more tools to generate transformations:
See http://twitter.com/hashtag/rdfpuml for news, diagrams and announcements.
** License and Citation
This work is covered by the [[https://www.perlfoundation.org/artistic-license-20.html][Artistic-2.0]] license.
If you use this software, please cite it as shown above.
** Documentation
** Related Work
The following works use or mention this software:
- V. Alexiev, A. Kiryakov, P. Tarkalanov (2017). [[https://www.researchgate.net/profile/Plamen-Tarkalanov/publication/342956150_euBusinessGraph_Company_and_Economic_Data_for_Innovative_Products_and_Services/links/5f0efda445851512999b206b/euBusinessGraph-Company-and-Economic-Data-for-Innovative-Products-and-Services.pdf][euBusinessGraph: Company and economic data for innovative products and services]]. 13th International Conference on Semantic Systems (Semantics 2017)
- L. Zhuhadar, M. Ciampa (2017). [[https://www.sciencedirect.com/science/article/abs/pii/S0747563217306933?via%3Dihub][Leveraging learning innovations in cognitive computing with massive data sets: Using the offshore Panama papers leak to discover patterns]]. Computers in Human Behavior. doi:10.1016/j.chb.2017.12.013
- C. Debruyne, D. Lewis, D. O’Sullivan (October 2018). [[https://link.springer.com/chapter/10.1007/978-3-030-02671-4_21][Generating Executable Mappings from RDF Data Cube Data Structure Definitions]]. In Confederated International Conferences "On the Move to Meaningful Internet Systems" (OTM 2018), pages 333-350. doi:10.1007/978-3-030-02671-4_21
- V. Alexiev (2018). [[http://dipp.math.bas.bg/images/2018/019-050_32_11-iDiPP2018-34.pdf][Museum Linked Open Data: Ontologies, Datasets, Projects (invited report)]]. In Digital Presentation and Preservation of Cultural and Scientific Heritage (DIPP 2018). Volume 8, pages 19-50. Burgas, Bulgaria, September 2018
- A.D. Junior (2019). [[http://www.tara.tcd.ie/bitstream/handle/2262/86157/AdemarCrotti-thesis_final.pdf][A Jigsaw Puzzle Metaphor for Representing Linked Data Mappings]]. PhD Thesis, Knowledge and Data Engineering Group (KDEG), Trinity College, Dublin, Ireland
- V. Alexiev, P. Tarkalanov, N. Georgiev, L. Pavlova (2020). [[https://dipp.math.bas.bg/images/2020/045-064_1.2_iDiPP2020-24_v.1c.pdf][Bulgarian Icons in Wikidata and EDM]]. Digital Presentation and Preservation of Cultural and Scientific Heritage (DIPP 2020).
- Matjaz Rihtar. https://github.com/mrihtar/rdfgraph: inspired by ~rdfpuml~, written in Python 2.7, uses Redland's ~librdf~ library. I worked with Matjaz in the euBusinessGraph project.
* Installation
Check out this repo and add ~rdf2rml/bin~ to your path. Install the following prerequisites:
- both tools: Perl. Tested with version 5.22 on Windows (Cygwin and Strawberry).
- rdfpuml:
- rdf2rml:
** Docker Image
If you prefer to work with Docker so you don't need to install software manually, you can use this [[https://docker-registry.ontotext.com/#browse/search=keyword%3Drdf2rml][rdf2rml image]] from the public Nexus (Docker Registry) of Ontotext. To run it, use:
: docker run -v
Where ~
Note: [[https://github.com/VladimirAlexiev/rdf2rml/pull/7][pull request 7]] of 17 Sep 2019 by Jem Rayfield (~@jazzyray~) dockerizes the installation and makes extra changes related to input/output and configuration. However, it has not been merged yet.
* Debian Repo
Jonas Smedegaard (~@jonassmedegaard~, dr at jones fullstop dk) has volunteered for some of the tasks below. His development is at https://salsa.debian.org/debian/rdf2rml/branches. To adopt changes, do something like this.
To merge all commits in the ~salsa/develop~ branch:
: cd rdf2rml  # i.e. your local clone of your Github project
: git remote add salsa https://salsa.debian.org/debian/rdf2rml.git
: git fetch salsa
: git merge salsa/develop
To adopt only single commits from the ~salsa/develop~ branch, issue ~remote~ and ~fetch~ as above, then issue:
: git cherry-pick $commit1 $commit2 $commit3
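Before cherry-picking, it can help to see which commits exist on ~salsa/develop~ but not on your own branch. The following is a minimal sketch, not part of this repo: ~list_new_commits~ is a hypothetical helper name, and the local branch name ~master~ in the usage comment is an assumption (substitute your own).

```shell
#!/bin/sh
# Hypothetical helper (not part of rdf2rml): list the commits that are
# reachable from branch $2 but not from branch $1, oldest first,
# one "abbreviated-hash subject" pair per line.
list_new_commits() {
    git log --reverse --oneline "$1..$2"
}

# Usage, assuming "git fetch salsa" has already been run and that the
# local branch is called master (an assumption):
#   list_new_commits master salsa/develop
```

The abbreviated hashes in the first column can then be passed to ~git cherry-pick~ in the order shown.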
* Change Log
** 2024-07-10 clarify licensing
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/31][Issue 31]]: settle on Artistic-2.0 license
** 2024-07-10 rdfpuml.pl: handle complex types
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/10][Issue 10]], [[https://github.com/VladimirAlexiev/rdf2rml/issues/14][Issue 14]]
- See [[https://github.com/VladimirAlexiev/rdf2rml/tree/master/test/blank-types#readme][test/blank-types]]
** 2023-06-07 rdf2sparql.pl: minimize binds in ~delete~ clause
[[https://github.com/VladimirAlexiev/rdf2rml/issues/27][Issue 27]]: minimize the ~delete~ clause to include only necessary binds:
- ~--filterColumn~ variable prebind
- templated GRAPH URL and its constituent variables
** 2023-06-06 rdf2sparql.pl: global ~--filter~ options
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/26][Issue 26]]: add command-line options ~--filterColumn, --filter~ that are useful for handling both initial loading and data updates.
- See [[https://github.com/VladimirAlexiev/rdf2rml/blob/master/doc/rdf2sparql.md#global-filtering][global filtering]] and ~test/graphs-crunchbase~
** 2023-06-01 rdfpuml.pl: remove Carp::Always
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/2][Issue 2]]: remove ~Carp::Always~ since it produces a stack trace that's too verbose
** 2023-05-17 rdf2sparql.pl: Conditional Nodes
- Support "Conditional Nodes", i.e. URLs that are conditional on the existence of some fields.
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/22][issue 22]] fixed (2023-05-31)
** 2023-05-05 rdfpuml.pl: don't mangle round brackets
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/21][issue 21]]: Round brackets in fields (eg "(name)") and URLs (eg <type/(type)>) are not mangled to square brackets anymore
** 2023-04-29 rdfpuml.pl: puml:option
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/18][issue 18]] Add ~puml:option~ for ~left to right direction~ etc
** 2023-04-19 rdf2sparql.pl: per-model filter, dynamic graph
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/19][issue 19]] Implement filter function, see ~test/filter-content~
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/20][issue 20]] Allow dynamic graph (computed from a data column), see ~test/graphs-crunchbase~
** 2022-08-23 rdf2sparql.pl: add datatype to var name instead of UPPERCASING
Datatype attachment eg ~strdt(?var,xsd:date)~ now outputs to ~?var_xsd_date~ to avoid conflict with input field names in ALL_UPPERCASE
** 2022-08-23 rdfpuml.pl: handle blank-node types; add shell scripts
- [[https://github.com/VladimirAlexiev/rdf2rml/issues/10][issue 10]] Handle blank-node types that occur on owl:Restriction (see ~test/blank-node~)
- Duplicate ~rdfpuml.bat, puml.bat~ as shell scripts ~rdfpuml, puml~ for use in Makefiles across Linux and Windows
** 2022-08-15 rdf2sparql.pl: merge to one tool
Merge ~rdf2tarql~ and ~rdf2ontorefine~ to one tool ~rdf2sparql~
** 2022-04-08 rdf2ontorefine.pl: generate OntoRefine Update queries
Add script to generate OntoRefine SPARQL Update queries from model.
** 2021-09-02 rdfpuml.pl: Unicode Processing
Use Perl option ~-C~ when invoking for proper Unicode processing. See doc section ~rdfpuml.html#Unicode~
** 2020-09-17 rdf2rml: logicalTable
Use URL for logicalTable instead of blank node, so that R2RML generated from different models for different tables can be merged more easily. Warning: this assumes that all instances of one subjectMap use the same query.
** 2020-06-01 rdf2tarql.pl: generate TARQL scripts
Add rdf2tarql.pl script to generate TARQL script (CSV-RDF conversion) from model.
** 2020-06-01 rdf2rml: improve scripts, SQL query/table propagation
- Improve script to abort if the first pipeline step ("update") fails
- Improve script to work on Cygwin (invokes the Jena tools as ~riot.bat~ and ~update.bat~)
- Filter out harmless warnings from Jena update's error log for datatypes like ~xsd:integer, xsd:date~ etc since the mention of a source field doesn't match the syntax of such literals.
- If a node has single outgoing link and no SQL query/table (~puml:label~), propagate that property backward across the link into the node (previously that was done only for incoming links)
** 2020-05-30 rdf2rml: handle inverse edge
When an edge ~Y-P-X~ is recorded in the RDB table of ~X~ (as foreign key) or in an association table, it is awkward to specify that table in the node ~Y~. So I added this SPARQL UPDATE clause:
- If a node ?y has no SQL, is not Inlined, has a single outgoing edge, then add the SQL of its counterparty ?x as default
** 2018-11-14 rdfpuml.pl: avoid puml:stereotype class node
I often define ~puml:stereotype~ for some classes in prefixes.ttl. If the class is not used in some particular turtle, it should avoid emitting a disconnected puml class.
- ~stereotypes()~: Avoid emitting
- ~has_statements_different_from()~: Check that a node has statements other than puml:stereotype
** 2018-06-29 rdfpuml.pl bug: class and puml:InlineProperty
When a type is also used with ~puml:InlineProperty~, it caused this error:
: Can't locate object method "uri_value" via package "RDF::Trine::Node::Literal" at rdfpuml.pl line 261.
: main::puml_qname(RDF::Trine::Node::Literal=ARRAY(0x4fd0920)) called at rdfpuml.pl line 279
: main::puml_node2(RDF::Trine::Node::Literal=ARRAY(0x4fd0920)) called at rdfpuml.pl line 128
An inline is converted to a literal, but rdf:type is always assumed to be a URL.
Test: [[./test/regression/type-inlineProperty.ttl]]
** 2018-04-05 rdfpuml.pl: Arrow Attributes
Add arrow attributes (dotted, dashed, bold) and length.
Test: [[./test/regression/arrowLen.ttl]]
** 2018-02-25 rdfpuml.pl: Arrow Color
Support arrow color (named or hex)
** 2017-08-25 rdfpuml.pl: decorative arrows
Fix unicode of "decorative arrows" on links going to a Reified Relation:
: left => "←", right => "→", up => "↑", down => "↓"
** 2016-02-10 rdfpuml.pl: blank nodes, hidden links
- support blank nodes
- support new puml "hidden" links that can sometimes help the layout: http://plantuml.com/class-diagram#layout
* To Do Tasks
Help needed for the following tasks. Post bugs and enhancement requests to this repo!
** Near-term
*** Modularize and Package Better
*** Regression Tests
*** rdf2rml: disentangle inverse edge
In the case ~Y-P-X~ described above:
*** Release on CPAN
*** Add Unicode tests
Add ttl with non-ASCII chars: Accented, Cyrillic, French, etc.
*** Prefixes
**** Allow specifying the prefixes file
See https://github.com/VladimirAlexiev/rdf2rml/pull/7
**** Eliminate Curie.pm
[[./lib/RDF/Prefixes/Curie.pm]] remembers ~@base~ and uses that for URL shortening. Once [[https://github.com/kasei/perlrdf/issues/131][perlrdf#131]] is fixed, eliminate this dependency (local module).
**** Remember prefixes from input file
~rdfpuml~ shortens URLs using prefixes only from ~prefixes.ttl~, but should also use prefixes defined in the individual input file.
**** Support more RDF Formats
Now it only supports Turtle, because it concatenates ~prefixes.ttl~ to the main file. If it can collect all prefixes from RDF files, such concatenation won't be needed.
*** Batch Processing
Issue [[https://github.com/VladimirAlexiev/rdf2rml/issues/1][#1]]: plantuml is slow to start up, so we'd like to process a bunch of ~puml~ files at once. The best way is to have a smarter script or ~Makefile~ that uses the following http://plantuml.com/command-line features:
**** "Manual" Batching
Before I discovered the ~-checkmetadata~ option, I had the idea that ~rdfpuml~ could put several diagrams in one ~puml~ file:
: @startuml file1.png
: @enduml
: @startuml file2.png
: @enduml
However, this interferes with ~make~ processing that regenerates only ~png~ for changed ~ttl~ files, and makes things less modular overall.
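The per-file regeneration that ~make~ performs can be approximated in a plain shell script. The following is a minimal sketch, not part of this repo: ~stale_puml~ is a hypothetical helper, and it assumes that each ~foo.puml~ renders to ~foo.png~ in the same directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of rdf2rml): print every .puml file in
# directory $1 whose .png is missing or older than the .puml source.
# Note: the -nt file test is not strictly POSIX, but bash, dash and ksh
# all support it.
stale_puml() {
    for f in "$1"/*.puml; do
        [ -e "$f" ] || continue        # no .puml files: the glob stayed literal
        png="${f%.puml}.png"
        if [ ! -e "$png" ] || [ "$f" -nt "$png" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage, assuming a "plantuml" wrapper script is on the PATH and that
# file names contain no whitespace: render all stale diagrams in one
# invocation, so the slow startup is paid only once.
#   stale_puml . | xargs plantuml
```

This keeps one diagram per ~puml~ file, so it stays compatible with ~make~ rules that track individual ~ttl~ files.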
** Mid-Term
*** Upgrade to use Attean
[[https://github.com/kasei/perlrdf][Trine (Perl RDF)]] is end of life. [[https://github.com/kasei/attean][Attean]] is the new generation.
*** Integrate in Emacs ~org-mode~
Write Turtle, see diagram (easy to do)
*** Node colors, icons, tooltips
See [[./ideas]]
*** More arrow types and styles
[[./ideas/arrows.png]] [[./ideas/arrows-2.png]]
Arrow styles and colors (bold, dashed etc): https://mrhaki.blogspot.com/2016/12/plantuml-pleasantness-get-plantuml.html
~plantuml -pattern~ regexes:
: dotted|dashed|plain|bold|hidden|norank|single|thickness
*** Extra Layout Options
Local layout options are described in [[http://wiki.plantuml.net/site/class-diagram#help_on_layout][Help on Layout]]:
: skinparam Linetype ortho
: skinparam NodeSep 80
: skinparam RankSep 80
: skinparam Padding 5
: skinparam MinClassWidth 40
: skinparam SameClassWidth true
And there are a lot more undocumented features: https://forum.plantuml.net/7095
*** Custom Reification
Ability to describe custom reification situations using the Property Reification Vocabulary (PRV)
*** Use MindMap/WBS for Hierarchies
Plantuml now has [[http://plantuml.com/mindmap-diagram][MindMap]] and [[http://plantuml.com/wbs-diagram][WBS (or OBS)]] diagrams that use a simple bulleted syntax to draw hierarchies.
It would be nice to use this to draw hierarchies of individuals, in particular taxonomies.
Here are examples of the two styles:
** Long-Term
*** rdf2soml to Generate Semantic Object Models
A new tool ~rdf2soml~ to generate Ontotext Platform SOML from RDF examples.
What's missing? Most importantly: property cardinality and virtual inverses.
PlantUML can show arrow cardinalities, and this simple and natural [[http://www.plantuml.com/plantuml/uml/SoWkIImgAStDuSh8J4bLICuiIiv9XR1JSmjAAXLoKqioybEAaOKIIqgACfDAIrABkI8Kb0oi39KKT7DIqqfqxHIK3ArobHGY5QmK2eho2_HZyZBpoWA0B2w7rBmKe2q0][PlantUML code]]:
: X "0:1" -left-> "1:m" Y : prop/\ninvProp
Is depicted as follows:
[[./ideas/cardinality-and-inverse.png]]
There are two options for expressing this in triples:
**** Cardinality With RDF-star
: :X :prop :Y.
: << :X :prop :Y >>
:   puml:arrow puml:left;           # direction
:   puml:min 1; puml:max puml:inf;  # cardinality
:   puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"]. # virtual inverse
**** Cardinality With Blank Node
: :X :prop :Y.
: :X puml:left :Y. # direction
: :X :prop [
:   # a puml:Cardinality; # may need this marker class to skip the node from the diagram
:   puml:min 1; puml:max puml:inf; # cardinality
:   puml:object :Y; # only needed if X has several relations "prop" and they need different annotations
:   puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"] # virtual inverse
: ].
*** rdf2shape to Describe & Generate RDF Shapes
Visualize RDF Shapes (SHACL and ShEx).

Issue [[https://github.com/VladimirAlexiev/rdf2rml/issues/8][#8]]: discussion with Thomas Francart of Sparna
I developed this SHACL to PlantUML converter, in Java, based on TopQuadrant SHACL lib, and the result is at https://shacl-play.sparna.fr/play/draw and code at https://github.com/sparna-git/shacl-play/tree/master/shacl-diagram
I don't have a strong opinion on the example you provide, an alternative idea that comes to my mind is
:node1 :link [ rdf:value :node2; puml:min 1 ; puml:max 2 ; ]
But this changes the structure of the example graph itself, which might not be convenient
*** Generate transformations for other than relational sources
R2RML works great for RDBMS, but how about other sources? Extend rdf2rml to generate: