Open MBueschelberger opened 2 days ago
Coverage Report
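File | Stmts | Miss | Cover |
---|---|---|---|
`data2rdf/__init__.py` | 5 | 0 | 100% |
`data2rdf/config.py` | 19 | 0 | 100% |
`data2rdf/utils.py` | 33 | 5 | 85% |
`data2rdf/warnings.py` | 2 | 0 | 100% |
`data2rdf/models/__init__.py` | 3 | 0 | 100% |
`data2rdf/models/base.py` | 47 | 4 | 91% |
`data2rdf/models/graph.py` | 150 | 35 | 77% |
`data2rdf/models/mapping.py` | 40 | 1 | 98% |
`data2rdf/modes/__init__.py` | 4 | 0 | 100% |
`data2rdf/parsers/__init__.py` | 6 | 0 | 100% |
`data2rdf/parsers/base.py` | 134 | 11 | 92% |
`data2rdf/parsers/csv.py` | 168 | 20 | 88% |
`data2rdf/parsers/excel.py` | 175 | 17 | 90% |
`data2rdf/parsers/json.py` | 188 | 29 | 85% |
`data2rdf/parsers/utils.py` | 79 | 11 | 86% |
`data2rdf/pipelines/__init__.py` | 2 | 0 | 100% |
`data2rdf/pipelines/main.py` | 82 | 9 | 89% |
`data2rdf/qudt/__init__.py` | 0 | 0 | 100% |
`data2rdf/qudt/utils.py` | 42 | 12 | 71% |
TOTAL | 1179 | 154 | 87% |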
Tests | Skipped | Failures | Errors | Time |
---|---|---|---|---|
114 | 0 :zzz: | 0 :x: | 0 :fire: | 2m 56s :stopwatch: |
Seems to make sense for JSON, but would that also work for CSV or Excel files? Is the old format still supported?
Looks good to me. Do the changes also distinguish whether the object is going to be a literal or a URIRef object? For example, if the data has an attribute hasOrganization and the value will be an IRI of a kitem.
Seems to make sense for JSON, but would that also work for CSV or Excel files? Is the old format still supported?
It is also supported for Excel. However, the wildcard through `source` does not work there, since you cannot apply JSONPath to Excel.
Implementing it for CSV is a bit more complicated since the overall parser works differently. Hence CSV is currently not supported.
The old schema is still supported. The only difference is that if `custom_relations` is set, the other fields such as `value_location`, `value_relation`, `unit_location` and `unit_relation` are disabled.
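To make the precedence rule above concrete, here is a minimal sketch in plain Python. It is not data2rdf's actual implementation: the helper `effective_mapping` and the constant `LEGACY_FIELDS` are hypothetical names, and the choice of a warning (rather than an error) is an assumption made only for illustration.

```python
import warnings

# Fields of the old schema that are disabled once custom_relations is set
# (field names taken from the comment above; the constant name is hypothetical).
LEGACY_FIELDS = ("value_location", "value_relation", "unit_location", "unit_relation")


def effective_mapping(mapping: dict) -> dict:
    """Return the part of a mapping entry that would still be honored.

    Hypothetical helper: if custom_relations is set, the legacy fields are
    dropped and a warning names the ones that were ignored.
    """
    if not mapping.get("custom_relations"):
        # Old schema: nothing is disabled.
        return dict(mapping)
    ignored = [field for field in LEGACY_FIELDS if field in mapping]
    if ignored:
        warnings.warn(f"custom_relations is set; ignoring: {', '.join(ignored)}")
    return {key: value for key, value in mapping.items() if key not in LEGACY_FIELDS}
```

An entry that only uses the old fields passes through unchanged, while an entry mixing both styles keeps `custom_relations` and loses the four legacy fields.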
Looks good to me. Do the changes also distinguish whether the object is going to be a literal or a URIRef object? For example, if the data has an attribute hasOrganization and the value will be an IRI of a kitem.
As already mentioned in the docs linked above, you can set the XSD type with the `object_data_type` field:
...
{
"object_location": "lab_no",
"relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
"object_data_type": "anyUri",
},
...
Is it then the case that data2rdf throws an error or a warning when a user tries it in a way that is not supported?
Yes it does!
Previously, the mapping schema for individuals with custom relations was inefficient and very repetitive when an individual needed, e.g., multiple data properties from a data file.
In order to produce a graph like this...
... a mapping like this would have needed to be applied:
... on a dataset shaped like this:
However, with this PR, the schema can now be simplified:
Please note that the dataset can now have as many individuals as needed, since we are now able to apply a wildcard (`data[*]`). The `suffix` of the individual is also retrieved from the dataset once `suffix_from_location` is set to `True`. If it is set to `False`, the value provided under the `suffix` key is simply used as-is.

If `source` is set, the `object_location` is treated as a path relative to each root object iterated from `data[*]`.

If `source` is not set, the `object_location` is treated as an absolute path. The same applies to the `suffix` when `suffix_from_location` is set to `True`.

See the updated docs here: https://github.com/MI-FraunhoferIWM/data2rdf/blob/enh/mapping-for-multiple-individuals/docs/examples/abox/6_custom_relations.md
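The relative versus absolute resolution described above can be sketched in plain Python. This is an illustration only, not the code in this PR: `resolve` and `expand` are hypothetical helpers, `resolve` handles just a tiny dotted-path subset with a `[*]` wildcard instead of real JSONPath, and the dataset values are made up. The mapping keys are the ones named in this description.

```python
from typing import Any


def resolve(path: str, root: Any) -> list:
    """Resolve a dotted path such as 'data[*].lab_no' against root.

    Hypothetical stand-in for JSONPath: only plain keys and 'key[*]' are handled.
    """
    nodes = [root]
    for part in path.split("."):
        wildcard = part.endswith("[*]")
        key = part[:-3] if wildcard else part
        nodes = [node[key] for node in nodes]
        if wildcard:
            # Flatten each matched list into its items.
            nodes = [item for node in nodes for item in node]
    return nodes


def expand(mapping: dict, dataset: dict) -> list:
    """Return (suffix, relation, value) tuples for one mapping entry."""
    source = mapping.get("source")
    # With source set, every matched object becomes a root for relative paths;
    # without it, the whole dataset is the single root (absolute paths).
    roots = resolve(source, dataset) if source else [dataset]
    rows = []
    for root in roots:
        if mapping.get("suffix_from_location"):
            suffix = resolve(mapping["suffix"], root)[0]
        else:
            suffix = mapping["suffix"]
        for relation in mapping["custom_relations"]:
            for value in resolve(relation["object_location"], root):
                rows.append((suffix, relation["relation"], value))
    return rows


# Made-up dataset and mapping, for illustration only.
dataset = {"data": [{"name": "Jane", "lab_no": 123}, {"name": "John", "lab_no": 456}]}
has_lab = "https://w3id.org/steel/ProcessOntology/hasLaboratory"

relative_mapping = {
    "source": "data[*]",
    "suffix": "name",
    "suffix_from_location": True,
    "custom_relations": [{"object_location": "lab_no", "relation": has_lab}],
}
absolute_mapping = {
    "suffix": "lab",
    "suffix_from_location": False,
    "custom_relations": [{"object_location": "data[*].lab_no", "relation": has_lab}],
}

relative_rows = expand(relative_mapping, dataset)
absolute_rows = expand(absolute_mapping, dataset)
```

With `source` set, each object under `data[*]` yields its own individual whose suffix is read from that object. Without `source`, the same values are reached through the absolute path but all attach to the single fixed suffix.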