joepio opened 2 years ago
The way it would optimally work, in my head, is like this:
somewhere under an AD sub-domain, there are AD proxies for .. the 100 most common RDF/OWL ontologies out there. These would have to be auto-generated as much as possible, and the rest should be done in a semi-automated way. For example:

https://github.com/schemaorg/schemaorg/blob/main/data/schema.ttl

would be fed into a script (`rdf2ad`), together with another file which contains a list of `propertyName -> dataType` mappings. If any `propertyType` is missing in that mapping, `rdf2ad` will print an error message and `exit 1`. Then that mapping has to be added manually. Doing it this way, we need to do relatively little manual work, and yet can still deal with changes/different versions in the RDF ontologies pretty well.
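The missing-mapping check could look roughly like this. A minimal sketch, assuming hypothetical names: `check_mapping`, the property lists, and the example mapping are all illustrative, not part of any existing `rdf2ad` tool.

```python
import sys

# Hypothetical sketch of the rdf2ad mapping check described above.
# `ontology_properties` would come from parsing the .ttl file;
# `datatype_mapping` from the user-maintained propertyName -> dataType file.

def check_mapping(ontology_properties, datatype_mapping):
    """Return the datatype for each property, or print an error and exit 1."""
    missing = [p for p in ontology_properties if p not in datatype_mapping]
    if missing:
        for prop in missing:
            print(f"error: no dataType mapping for property '{prop}'", file=sys.stderr)
        sys.exit(1)
    return {p: datatype_mapping[p] for p in ontology_properties}

mapping = {"schema:name": "string", "schema:birthDate": "date"}
print(check_mapping(["schema:name"], mapping))
# -> {'schema:name': 'string'}
```

The hard exit on a missing entry is the point: it forces the mapping file to be completed by hand before any conversion output is produced.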
... after converting the RDF ontology to AD, it will be hosted under a URL resembling the original URL, for example:
https://github.com/schemaorg/schemaorg/blob/main/data/schema.ttl
-- converts to -->
# using the source-file URL:
https://rdf-mirror.atomicdata.dev/ontologies/github.com/schemaorg/schemaorg/blob/main/data/schema.ttl.ad
# or the original schema IRI (makes more sense, I think -> easier conversion)
https://rdf-mirror.atomicdata.dev/ontologies/schema.org.ad
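The second (IRI-based) scheme could be sketched as below. The mirror base URL and the `.ad` suffix come from the examples above; the exact normalization rules are an assumption.

```python
from urllib.parse import urlparse

# Assumed mirror base URL, taken from the examples in this thread.
MIRROR_BASE = "https://rdf-mirror.atomicdata.dev/ontologies/"

def iri_to_mirror_url(ontology_iri: str) -> str:
    """Map an ontology IRI like 'https://schema.org/' to its AD mirror URL."""
    parsed = urlparse(ontology_iri)
    # Keep host + path, drop the scheme and any trailing slash.
    path = (parsed.netloc + parsed.path).rstrip("/")
    return f"{MIRROR_BASE}{path}.ad"

print(iri_to_mirror_url("https://schema.org/"))
# -> https://rdf-mirror.atomicdata.dev/ontologies/schema.org.ad
```

Using the schema IRI rather than the source-file URL keeps the mirror URL stable even if the upstream repository moves the `.ttl` file.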
To start brainstorming ideas for how to practically go about getting there, I will outline the roadmap I have in my head right now:
1. Write a script (BASH/Python/Rust?) that creates a table of the most commonly used RDF/OWL(2) ontologies, with one line per released version of each of these ontologies, each line containing at least: IRI, version, raw-data-download-URL.
2. Write a script that syncs the raw-data-download-URLs to the local file-system.
3. Write a script that collects statistical data over all these ontologies, e.g.:
4. Start writing a tool (Rust) that converts an RDF/OWL(2) ontology into an AtomicData one. At first, it will only contain classes, properties and their connections.
5. Test the tool in that state on all the ontologies.
6. Write a tool/script to convert a "user" data-set (i.e. the OKH-LOSH data) to AtomicData, in a very much simplified form.
7. ... and back. -> PoC done!
8. Improve the tools from steps 4, 6 and 7.
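The table from step 1 could start as small as a CSV file. A minimal sketch — the file layout, column names, and the example row are assumptions, nothing here is decided yet:

```python
import csv
import io

# Assumed CSV layout for the ontology table from step 1:
# one row per released version, with at least IRI, version, and download URL.
TABLE = """iri,version,raw_data_download_url
https://schema.org/,15.0,https://github.com/schemaorg/schemaorg/raw/main/data/schema.ttl
"""

def load_ontology_table(text: str) -> list[dict]:
    """Parse the ontology table into one dict per released version."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_ontology_table(TABLE)
print(rows[0]["iri"], rows[0]["version"])
# -> https://schema.org/ 15.0
```

The sync script from step 2 would then just iterate over `raw_data_download_url` and fetch each file.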
Some ideas on how to tackle step 4 (assuming you're using schema.org as ttl source, and prefer writing stuff in Rust). Consider the following as sub-steps:
- Generate `JSON-AD` strings, but perhaps better is to use the `atomic_lib::Resource` struct with the `.set_propval` + `save` methods. See `example.rs` or browse through the tests for inspiration.
- Convert the `URLS` to something like `atomicdata.dev/ontologies/schema/something/ID`.
- Host a `JSON-AD` file with these ontologies.

I did some initial research for lists of ontologies, and there seem to be some good options! :-)
Perl-based RDF libraries commonly use one of these methods for stable references to ontologies: a) a dump of http://prefix.cc/popular/all at a fixed point in time b) a manually curated subset of a)
Since it sounds like you want to restrict by certain qualities (e.g. "OWL-based", or "reasonably popular"), I suggest that you do b). If it then turns out that prefix.cc does not cover some ontologies you fancy, there is nothing stopping you from changing the rules of your curation to include non-prefix.cc ontologies (but you might also consider simply registering your pet ontologies at prefix.cc and bumping your fixed time to a moment after your registration).
All of prefix.cc is currently ~3000 ontologies.
The perl module RDF::NS::Curated provides a curated set of ~65 ontologies (as I recall it is simply "the most popular at prefix.cc at the time" but if curious you/I can simply ask Kjetil).
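Option b) (a manually curated subset of a fixed prefix.cc dump) could be sketched as a simple filter. The tab-separated `prefix<TAB>iri` dump format and the curated prefix list below are assumptions for illustration:

```python
# Assumed curated allow-list of prefixes to keep from the dump.
CURATED = {"schema", "foaf", "dcterms"}

def curate(dump_text: str, curated=CURATED) -> dict:
    """Keep only the curated prefixes from a prefix<TAB>iri dump."""
    result = {}
    for line in dump_text.splitlines():
        prefix, _, iri = line.partition("\t")
        if prefix in curated and iri:
            result[prefix] = iri
    return result

dump = "schema\thttps://schema.org/\nex\thttp://example.org/\n"
print(curate(dump))
# -> {'schema': 'https://schema.org/'}
```

Keeping the dump itself pinned to a fixed point in time makes the curated set reproducible, as suggested above.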
ohhh perfect, thank you Jonas! :-) Sounds like I'd try that ... maybe those same ~65 then.
I applied for funding to do this on my own about a year ago with NLnet and was refused sometime in Q1 this year. Since then, no new attempts from my side.
I still would very much like to have this mapping capability. Right now, I am starting a new project with Lynn from VF, creating an ontology for OSH. I would love to do it in AD instead of RDF, but can't, because this is missing.
@joepio Do you have an idea for how to write an RDF ontology so that it would be easily mappable to AD in a future-proof way, once this mapping is implemented?
Thinking of: things to avoid using in RDF, and maybe extra properties to favor, or to use on all classes/properties in the ontology. I guess the main thing would be the validation/data-type part... right? Asking, of course, so we could take it into consideration now that we are starting to write our ontology. (We already did start, but it is still very small and completely mold-able.) It would also be good to know so that we eventually have a few such AD-mapping-ready RDF ontologies, ready to test a mapping implementation once development starts on it. In the best case, such extra properties for AD would even be usable/make sense disregarding AD, but that is less important.
Good question @hoijui!
I think most RDF ontologies / shacl shapes should be mappable to Atomic Data.
Some things to keep in mind:
what about the data validation... would AD data validation map to SHACL, or to an RDF property specially made for this (e.g. `admapping:datatype`)?
(something went wrong with the link in your comment)
This gives some hints, I guess: https://docs.atomicdata.dev/interoperability/rdf.html?highlight=language%20tags#convert-atomic-data-to-rdf
So language tags working differently... is that really an issue when they are used, when there is software (the code doing the mapping) in-between? I am not talking about making RDF valid AD, just of it having the necessary data to map it to AD.
Fixed the link!
Yeah gotcha.
I think we can probably map pretty much everything at some point, like, we can always fall back to the 'string' datatype.
ok.. I guess.. I'll not do anything special for now then. thanks!
gotcha :)
Existing RDF ontologies have some problems that Atomic Data solves:
Read more about atomic & rdf.
But there are many ontologies in existence, and these describe various domains quite accurately. It would be great if we could still get the benefits of atomic data, without losing the information stored in these existing ontologies.
Some thoughts / challenges:
- `subClassOf` or `distinctFrom`.

Implementation
Add `original-url` property to `Property` class

This `original-url` would be the URL of the RDF predicate. When serializing to RDF, we could opt in to use this URL. Inversely, when importing RDF, we could search for Properties having that predicate as original URL, and conform to the atomic data constraints (namely, they must resolve to JSON-AD properties).

However, this would come with a challenge. If a server has multiple `Properties` with the same `original-url` value, the server can't decide which one should be used. Malicious agents might even inject resources in the Server to mess up mappings. If we have an explicit mapping resource, we can prevent this.
Mapping resource
A resource that contains a bunch of mappings. This can be referred to while importing RDF.
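The mapping-resource idea above could be sketched like this. A minimal illustration only: the `MappingResource` shape and all URLs are assumptions, not an existing atomic_lib API.

```python
# Sketch of the explicit mapping resource described above: one trusted
# resource mapping RDF predicate URLs to AD Property URLs, so the server
# never has to choose between multiple Properties sharing an original-url.

class MappingResource:
    def __init__(self, predicate_to_property: dict):
        self._map = dict(predicate_to_property)

    def resolve(self, rdf_predicate: str) -> str:
        """Return the AD Property URL for an RDF predicate, or raise."""
        try:
            return self._map[rdf_predicate]
        except KeyError:
            raise KeyError(f"no mapping for RDF predicate {rdf_predicate!r}")

mapping = MappingResource({
    "https://schema.org/name": "https://atomicdata.dev/properties/name",
})
print(mapping.resolve("https://schema.org/name"))
# -> https://atomicdata.dev/properties/name
```

Because lookups go through one explicitly referenced resource, an injected Property with a duplicate `original-url` elsewhere on the server has no effect on the import.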
see lenses #102
Credits to @hoijui for sharing many ideas on this topic