Open henrieglesorotos opened 12 months ago
Reckon it's something we could work on @anuzzolese? Also are there any tests?
Hi @henrieglesorotos, if I understood the problem you are referring to correctly, I would say it is already implemented (maybe not the best solution, but we can discuss improvements). In fact, pyrml supports the parametrisation of RML mapping files by relying on Jinja2.
RML files processed by pyrml can accept parameters written in Jinja2 syntax, e.g.:
rml:logicalSource [
rml:source {{ source_file }};
rml:referenceFormulation ql:CSV
]
Then, when you instantiate your mapper in the Python code, you can do something like this:
from pyrml import RMLConverter
from rdflib import Graph
rml_map_file: str = '/path_to_your_rml'
# Map each parameter defined in the RML file (here 'source_file') to its actual value.
# Named template_vars rather than vars to avoid shadowing the Python builtin.
template_vars = {'source_file': './examples/artists/Artist.csv'}
rml_mapper: RMLConverter = RMLConverter.get_instance()
g: Graph = rml_mapper.convert(rml_map_file, template_vars=template_vars)
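To make the substitution step concrete, here is a minimal sketch of what rendering such a mapping with Jinja2 looks like on its own. It only illustrates the templating idea and is not pyrml's internal code: the `render_mapping` helper is hypothetical, and quoting the placeholder in the template (so the rendered value is a valid Turtle string literal) is an assumption about how you would want to write the mapping.

```python
from jinja2 import Template

# Mapping fragment with a Jinja2 placeholder.
# Assumption: the placeholder is quoted so the rendered output is a Turtle string literal.
RML_TEMPLATE = """\
rml:logicalSource [
    rml:source "{{ source_file }}";
    rml:referenceFormulation ql:CSV
]
"""


def render_mapping(template_text: str, template_vars: dict) -> str:
    """Hypothetical helper: fill the Jinja2 placeholders in a mapping template."""
    return Template(template_text).render(**template_vars)


rendered = render_mapping(
    RML_TEMPLATE,
    {"source_file": "./examples/artists/Artist.csv"},
)
print(rendered)
```

The rendered text is what would then be parsed as an ordinary RML mapping, so the same mapping file can be reused with different input files.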
This is excellent news! Can we add this to the docs? Also, shall we create some simple tests if they don't exist?
Yes, contributing documentation and how-to guides would be extremely helpful.
@anuzzolese
Having some issues. See example below:
We have some pre-existing RML rules in mapping.ttl:
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#>.
@prefix fno: <https://w3id.org/function/ontology#>.
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#>.
@prefix void: <http://rdfs.org/ns/void#>.
@prefix dc: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix : <http://mapping.example.com/>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix industries: <https://data.beamery.com/naics/2022/industries/>.
:rules_000 a void:Dataset.
:source_000 a rml:LogicalSource;
rml:source "input.json";
rml:iterator "$";
rml:referenceFormulation ql:JSONPath.
:rules_000 void:exampleResource :map_Concept_000.
:map_Concept_000 rml:logicalSource :source_000;
a rr:TriplesMap;
rdfs:label "Concept".
:s_000 a rr:SubjectMap.
:map_Concept_000 rr:subjectMap :s_000.
:s_000 rr:template "https://data.beamery.com/naics/2022/industries/{NAICS22}#this";
rr:graphMap :gm_000.
:gm_000 a rr:GraphMap;
rr:template "https://data.beamery.com/naics/2022/industries/{NAICS22}".
:pom_000 a rr:PredicateObjectMap.
:map_Concept_000 rr:predicateObjectMap :pom_000.
:pm_000 a rr:PredicateMap.
:pom_000 rr:predicateMap :pm_000.
:pm_000 rr:constant skos:example.
:pom_000 rr:objectMap :om_000.
:om_000 a rr:ObjectMap;
rml:reference "Index Item Description";
rr:termType rr:Literal;
rml:languageMap :language_000.
:language_000 rr:constant "en".
Input file: input.json
{"NAICS22":"315990","Index Item Description":"Hats, cloth, cut and sewn from purchased fabric (except apparel contractors)"}
I am getting:
python converter.py -o test.ttl mapping.ttl
Traceback (most recent call last):
File "/Users/henrieglesorotos/repos/pyrml/converter.py", line 65, in <module>
PyrmlCMDTool().do_map()
File "/Users/henrieglesorotos/repos/pyrml/converter.py", line 34, in do_map
g = rml_converter.convert(self.__args.input, self.__args.m)
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_mapper.py", line 131, in convert
triple_mappings = RMLParser.parse(rml_mapping)
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_mapper.py", line 46, in parse
return TripleMappings.from_rdf(g)
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 1586, in from_rdf
return set([TripleMappings.__build(g, row) for row in qres])
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 1586, in <listcomp>
return set([TripleMappings.__build(g, row) for row in qres])
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 1594, in __build
predicate_object_maps = PredicateObjectMap.from_rdf(g, row.tm)
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 752, in from_rdf
return list(map(lmbd(g), qres))
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 751, in <lambda>
lmbd = lambda graph : lambda row : PredicateObjectMap.__build(graph, row)
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 758, in __build
predicates = PredicateBuilder.build(g, row.pom)
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 669, in build
predicates += PredicateMap.from_rdf(g, predicate_ref)
File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 629, in from_rdf
pm = PredicateMap(row.tripleMap, row.map, row.termType, row.predicateMap)
File "/Users/henrieglesorotos/repos/pyrml/venv/lib/python3.9/site-packages/rdflib/query.py", line 124, in __getattr__
raise AttributeError(name)
AttributeError: tripleMap
Any ideas?
Btw - we generally work in yarrrml so it's simpler, and then convert using https://github.com/RMLio/yarrrml-parser
FYI:
python --version
== 3.9.0
pip freeze
click==8.1.7
decorator==5.1.1
Flask==2.2.2
importlib-metadata==6.8.0
isodate==0.6.1
itsdangerous==2.1.2
Jinja2==3.1.2
jsonpath-ng==1.5.3
lark-parser==0.12.0
MarkupSafe==2.1.3
numpy==1.23.4
pandas==1.5.1
ply==3.11
pyparsing==3.1.1
pyrml==0.3.0
python-dateutil==2.8.2
python-slugify==7.0.0
pytz==2023.3.post1
rdflib==6.2.0
shortuuid==1.0.9
six==1.16.0
SPARQLWrapper==2.0.0
text-unidecode==1.3
Unidecode==1.3.7
werkzeug==3.0.1
zipp==3.17.0
Did you manage to replicate this @anuzzolese ?
Currently the input file can't be parameterised via the CLI or the API; it is hardcoded into the mapping file. E.g.:
It would be more flexible to be able to provide this as a parameter.
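As a rough sketch of what this could look like on the command line, the snippet below collects repeated `KEY=VALUE` options into a dictionary that could then be handed to `convert(..., template_vars=...)` as shown earlier in the thread. The `--var` flag, the `parse_template_vars` helper and the parser layout are all hypothetical; they are not options that converter.py is known to provide.

```python
import argparse


def parse_template_vars(pairs):
    """Turn ['key=value', ...] into a dict, rejecting malformed entries."""
    result = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        result[key] = value
    return result


def build_parser():
    # Hypothetical CLI layout, loosely modelled on the call shown in this issue.
    parser = argparse.ArgumentParser(description="Sketch of a parameterised mapping CLI")
    parser.add_argument("mapping", help="path to the RML mapping file")
    parser.add_argument("-o", "--output", help="path of the output file")
    parser.add_argument(
        "--var",
        action="append",
        default=[],
        metavar="KEY=VALUE",
        help="template variable; may be given several times",
    )
    return parser


# Example invocation, passed as a list so the sketch runs without a shell.
args = build_parser().parse_args(
    ["-o", "test.ttl", "--var", "source_file=input.json", "mapping.ttl"]
)
template_vars = parse_template_vars(args.var)
print(template_vars)
```

Keeping the variables generic (`--var source_file=...`) rather than adding a dedicated input-file flag would let the same mechanism cover any other placeholder used in a mapping.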