Atomic Data for vocabularies

fosterlynn commented 2 years ago

In discussion with @joepio about how Valueflows might make use of this very interesting application, some things came up. Valueflows would be using it as a vocabulary/ontology, so more for meta-data than data.

Valueflows currently has a turtle file as its "system of record" / "source of truth" for the vocabulary. It needs to provide the vocabulary in various formats, both rdf-based and non-rdf-based. Currently, that requires a lot of manual work and is error-prone.

With apologies for not knowing enough about Atomic Data yet, features that we would use in this situation:

- ability to import text rdf files into Atomic Data, in our case turtle (there would be a lot to think about to consider Atomic Data as our system of record or even just where we maintain the vocabulary as a practical matter - for example where it would fit into our current git-based workflow)
- ability to export to other formats, like the ones mentioned in the venn diagram here: SHACL, ShEx, JSON-LD, JSON Schema; and also turtle, if we used Atomic Data to maintain the vocabulary
- ability to keep elements not part of the Valueflows namespace, but which are needed for the model to function; and the ability to tell the difference

But, almost everything we would do would be about the vocabulary itself, not actual data that uses the vocabulary - so is a (fundamentally?) different use case. Although, we do have a little bit of "hard-coded" data that we consider part of the vocabulary.

Note: These are thoughts as input to the project, not a request to do any work. :) But if you do decide to do some work, we can talk through this more, it is atm very hand-wavy, without enough knowledge of Atomic Data, so at least would need a lot more detail, and might not even make sense.

hoijui commented 2 years ago

I am strongly connected with VF, and we have pretty much the same use-case, issues and needs with Open Know-How (OKH): https://github.com/OPEN-NEXT/OKH-LOSH/blob/master/OTRL.ttl

For us too, exporting to RDF and importing from RDF would be paramount. You said importing from RDF is very hard. I do not know too much about RDF and only have ~2 years of sporadic practical experience, and not too practical either. I do not understand the issues with importing from RDF as well as you do, but I know you are right. ;-) I imagine a CLI tool for importing automatically, which spits out errors and warnings about things in the RDF ontology that are not compatible with AD, kind of like a compiler. If I had this, I would try to fix our RDF ontology manually until there are no errors anymore, and give whatever additional info is required (like data-types) in some way (preferably directly in the ontology).
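To make the "kind of like a compiler" idea a bit more concrete, here is a minimal sketch of such a check. It is not an existing Atomic Data tool. It assumes the ontology has already been parsed into plain `(subject, predicate, object)` string triples, and that an importer would want an `rdfs:range` (to derive a data-type) and an `rdfs:comment` (to use as a description) for every property. `Diagnostic`, `check_ontology` and the `example.org` terms are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


@dataclass
class Diagnostic:
    level: str    # "error" or "warning"
    subject: str  # IRI of the offending term
    message: str


def check_ontology(triples):
    """Report properties that lack information an importer would need."""
    triples = list(triples)
    properties = [s for s, p, o in triples if p == RDF_TYPE and o == RDF_PROPERTY]
    diagnostics = []
    for prop in properties:
        predicates = {p for s, p, o in triples if s == prop}
        if RDFS_RANGE not in predicates:
            # Assumed rule: without a range, no data-type can be derived.
            diagnostics.append(
                Diagnostic("error", prop, "no rdfs:range, so no data-type can be derived")
            )
        if RDFS_COMMENT not in predicates:
            diagnostics.append(
                Diagnostic("warning", prop, "no rdfs:comment to use as a description")
            )
    return diagnostics


# Hypothetical example terms, not taken from any real ontology.
EX = "https://example.org/vocab#"
triples = [
    (EX + "name", RDF_TYPE, RDF_PROPERTY),
    (EX + "name", RDFS_RANGE, "http://www.w3.org/2001/XMLSchema#string"),
    (EX + "name", RDFS_COMMENT, "A human-readable name."),
    (EX + "size", RDF_TYPE, RDF_PROPERTY),
]

diagnostics = check_ontology(triples)
for d in diagnostics:
    print(f"{d.level}: <{d.subject}> {d.message}")
```

The workflow would then be to edit the ontology and re-run the check until no errors remain.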

I imagine that exporting to RDF (which you said is easier) could be done in such a way that re-importing that exported RDF back into Atomic Data would work perfectly (no errors or warnings). So it would, for example, contain all the data-type info that AD requires, which is of little use in RDF, but helps for re-importing. If I do not want that data in my RDF, leaving it out should be trivial. I think having full import & export support for RDF would be the single best thing for spreading adoption of AD.

Projects like VF or OKH would have to invest some time initially to get their RDF into a state where import succeeds, but after that there are only pros, and one would also learn how to structure data models more cleanly in the process of doing so. ... I would actually really like to go through such a process, and Lynn also said she does not mind doing manual work.

I see no reason for OKH to not use AD for storage as well, once we have import and export (and thus no risk for lock-in), because we kind of host and control our data-storage and all our tech.

joepio commented 2 years ago

Thanks for taking the time to share your wishes and considerations, @fosterlynn and @hoijui!

Exporting Atomic Schema to RDF (JSON-LD, Turtle, etc)

This is already possible in Atomic-Server! It has great JSON-LD serialization, and it supports various other RDF formats. But there is no mapping to OWL / RDFS / SHACL / ShEx or something like that.
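As a rough illustration, a resource could be requested as JSON-LD along these lines. This is a sketch under assumptions: it assumes Atomic-Server does content negotiation on the `Accept` header and recognises `application/ld+json`. The helper names `build_jsonld_request` and `fetch_jsonld` are hypothetical, and the URL is only an example resource.

```python
import json
import urllib.request


def build_jsonld_request(resource_url):
    """Build a GET request that asks the server for a JSON-LD representation."""
    return urllib.request.Request(
        resource_url,
        headers={"Accept": "application/ld+json"},  # assumed to be honoured by the server
    )


def fetch_jsonld(resource_url):
    """Send the request and parse the response body as JSON (needs network access)."""
    with urllib.request.urlopen(build_jsonld_request(resource_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Only builds the request; call fetch_jsonld(...) to actually contact the server.
req = build_jsonld_request("https://atomicdata.dev/classes/Class")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```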

Importing RDF ontologies, converting to Atomic Schema

This will be pretty hard to do. It is certainly possible, to some degree, but it is error-prone and will probably be pretty complicated. Importing RDF and converting it to a domain-specific in-memory model is complicated on its own, but what makes it really complicated is the low degree of consistency between RDF ontologies regarding their schematic descriptions. Some use RDFS, some OWL, some SHACL, some ShEx... I think it could be done, but I don't like doing it. I'd definitely want to help out anyone who's willing to try, though!
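A first step for any importer would probably be to find out which of these schema vocabularies a given ontology actually uses. The sketch below does that by matching the standard namespace IRIs against already-parsed `(subject, predicate, object)` string triples. The function name `detect_schema_vocabularies` and the `example.org` terms are hypothetical, and a real importer would have to tell IRIs apart from literals rather than treat every term as a plain string.

```python
SCHEMA_NAMESPACES = {
    "RDFS": "http://www.w3.org/2000/01/rdf-schema#",
    "OWL": "http://www.w3.org/2002/07/owl#",
    "SHACL": "http://www.w3.org/ns/shacl#",
    "ShEx": "http://www.w3.org/ns/shex#",
}


def detect_schema_vocabularies(triples):
    """Return the sorted names of the schema vocabularies whose terms occur in the triples."""
    found = set()
    for triple in triples:
        for term in triple:
            for name, namespace in SCHEMA_NAMESPACES.items():
                if term.startswith(namespace):
                    found.add(name)
    return sorted(found)


# Hypothetical ontology fragment mixing RDFS and OWL descriptions.
EX = "https://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
triples = [
    (EX + "Agent", RDF_TYPE, "http://www.w3.org/2002/07/owl#Class"),
    (EX + "name", "http://www.w3.org/2000/01/rdf-schema#domain", EX + "Agent"),
]

vocabularies = detect_schema_vocabularies(triples)
print(vocabularies)
```

Each detected vocabulary would then need its own mapping rules, which is where most of the effort would go.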

Exporting Atomic Schema to RDF ontologies / SHACL / JSON Schema

This is probably far easier. I've done it for TypeScript, and think it would not be that hard to do so for RDFS, SHACL, ShEx and perhaps other schema systems. I don't think I'd want to support OWL, as that would incorrectly give people the impression that it's a schema language - it's not, it should not be used for data validation or shape constraints!
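To give an idea of what a SHACL export might look like, here is a minimal sketch. It is not how Atomic-Server does anything today. It assumes a class is available as a simplified dict with `requires` and `recommends` lists of properties, each carrying a `subject` and a `datatype` URL. The data-type to XSD mapping, the `Shape` suffix, the function names and the `example.org` URLs are all assumptions made for illustration.

```python
# Assumed mapping from Atomic Data datatype URLs to XSD datatypes.
DATATYPE_TO_XSD = {
    "https://atomicdata.dev/datatypes/string": "xsd:string",
    "https://atomicdata.dev/datatypes/integer": "xsd:integer",
    "https://atomicdata.dev/datatypes/boolean": "xsd:boolean",
}


def property_shape(prop, required):
    """Render one property as an inline SHACL property shape."""
    parts = [f"sh:path <{prop['subject']}>"]
    xsd = DATATYPE_TO_XSD.get(prop["datatype"])
    if xsd is not None:
        parts.append(f"sh:datatype {xsd}")
    if required:
        # Required properties must occur at least once.
        parts.append("sh:minCount 1")
    return "[ " + " ; ".join(parts) + " ]"


def class_to_shacl(atomic_class):
    """Render a simplified class description as a Turtle-serialized sh:NodeShape."""
    shapes = [property_shape(p, True) for p in atomic_class.get("requires", [])]
    shapes += [property_shape(p, False) for p in atomic_class.get("recommends", [])]
    statements = [f"sh:targetClass <{atomic_class['subject']}>"]
    statements += [f"sh:property {shape}" for shape in shapes]
    header = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{atomic_class['subject']}Shape> a sh:NodeShape ;",
    ]
    body = " ;\n".join("    " + s for s in statements) + " ."
    return "\n".join(header) + "\n" + body


# Hypothetical class, not a real Atomic Data resource.
person = {
    "subject": "https://example.org/classes/Person",
    "requires": [
        {"subject": "https://example.org/properties/name",
         "datatype": "https://atomicdata.dev/datatypes/string"},
    ],
    "recommends": [
        {"subject": "https://example.org/properties/age",
         "datatype": "https://atomicdata.dev/datatypes/integer"},
    ],
}

shacl = class_to_shacl(person)
print(shacl)
```

Required properties get `sh:minCount 1`, recommended ones are left optional, and properties with an unmapped data-type simply get no `sh:datatype` constraint.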

Relevant:

atomicdata-dev / atomic-data-browser