common-workflow-language / schema_salad

Semantic Annotations for Linked Avro Data
https://www.commonwl.org/v1.2/SchemaSalad.html
Apache License 2.0

What is Schema Salad about? #540

Open bblfish opened 2 years ago

bblfish commented 2 years ago

Hi,

I am not quite clear, from looking at the documentation, what Schema Salad is about. My guess is that it could provide for Avro what the W3C CSV on the Web standard provides for CSV files. See the examples in the csv2rdf document.

Something like that would allow BigData engineers to work with Avro binary files while easily finding out what the URL for a specific relation or type is, or even how to construct the subject URL if available. That would not add any complexity to the Avro encoding, but it would let Avro data work with data from many other areas and make it easy to find definitions by dereferencing the relation or class URL over HTTP.

Am I guessing correctly that this is the main use case for SALAD?

bblfish commented 2 years ago

There is a discussion on the semantic-web list starting from json-ld, to yaml-ld to avro-ld. "GRDDL for BigData..." https://lists.w3.org/Archives/Public/semantic-web/2022Jun/

tetron commented 2 years ago

Salad is a schema language that ties Avro schema together with linked data in order to emit an Avro schema, json-ld context, and RDFS. It also makes Avro a bit easier to use by adding inheritance and template specialization. @VladimirAlexiev calls this "polyglot modeling".
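To make the "one schema, several outputs" idea concrete, here is a minimal sketch in plain Python. It does not use the schema_salad library; the `Person` schema and the helper names `to_avro_schema` and `to_jsonld_context` are hypothetical. The only Salad conventions assumed are the `$namespaces` block and the per-field `jsonldPredicate` annotation.

```python
# Hypothetical, simplified Salad-style record schema written as a Python dict.
# Real Salad schemas are normally written in YAML.
SCHEMA = {
    "$namespaces": {"foaf": "http://xmlns.com/foaf/0.1/"},
    "name": "Person",
    "type": "record",
    "fields": [
        {"name": "name", "type": "string", "jsonldPredicate": "foaf:name"},
        {"name": "age", "type": "int", "jsonldPredicate": "foaf:age"},
    ],
}


def to_avro_schema(schema):
    """Drop the linked-data annotations, leaving a plain Avro record schema."""
    return {
        "name": schema["name"],
        "type": schema["type"],
        "fields": [{"name": f["name"], "type": f["type"]} for f in schema["fields"]],
    }


def to_jsonld_context(schema):
    """Collect namespace prefixes and field-to-predicate mappings."""
    context = dict(schema.get("$namespaces", {}))
    for field in schema["fields"]:
        context[field["name"]] = field["jsonldPredicate"]
    return context


avro_schema = to_avro_schema(SCHEMA)
jsonld_context = to_jsonld_context(SCHEMA)
```

The real tool does considerably more (inheritance, specialization, RDFS output); this only illustrates that the Avro view and the JSON-LD view are two projections of the same annotated schema.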

For idiosyncratic historical reasons, Salad has mostly only been used to describe schemas for JSON and YAML files, but since it is built on Avro, you could use it for Avro binary files as well.

Parts of this discussion might be helpful:

https://github.com/json-ld/yaml-ld/issues/3

bblfish commented 2 years ago

Thanks @tetron for the help.

To make sure I understood, I developed a schema and model using Salad YAML and sent a mail to the semantic-web list explaining how it all works: https://lists.w3.org/Archives/Public/semantic-web/2022Jun/0011.html

I did not use inheritance, but that also looks very helpful.

So now I understand that Salad allows one to write Avro schemas in YAML and mark them up with RDF. The schema-salad-tool allows one to produce JSON-LD contexts that one can then add to the JSON representation of Avro data in order to produce RDF. Because there is an isomorphism between Avro JSON and Avro binary, one can think of the binary as also containing a JSON-LD context.
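The step from "record plus context" to RDF can be sketched as follows. This is a toy illustration, not a JSON-LD processor: the `CONTEXT` contents, the subject URL, and the helpers `expand` and `record_to_triples` are all assumptions made for the example. The `record` dict stands for a decoded Avro record; it would look the same whether it was read from the JSON or the binary encoding, which is the point of the isomorphism argument.

```python
# Hypothetical context of the kind a Salad schema could yield.
CONTEXT = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "name": "foaf:name",
    "age": "foaf:age",
}


def expand(term, context):
    """Expand a 'prefix:local' compact IRI using the prefixes in the context."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term  # already absolute, or an unknown prefix


def record_to_triples(record, context, subject):
    """Turn each key known to the context into a (subject, predicate, object) triple."""
    triples = []
    for key, value in record.items():
        if key in context:
            triples.append((subject, expand(context[key], context), value))
    return triples


# A decoded record: same dict whether it came from Avro JSON or Avro binary.
record = {"name": "Alice", "age": 42}
triples = record_to_triples(record, CONTEXT, "http://example.org/alice")
```

A real pipeline would hand the record and its context to a JSON-LD library instead of expanding terms by hand.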

Before that, we had a discussion on the semantic web mailing list about looking at the binary data as if it were JSON-LD data. Of course, one may then want to interpret the Avro data directly without going through JSON-LD. @ericprud wrote up an initial idea, which I still need to go through: https://lists.w3.org/Archives/Public/semantic-web/2022Jun/0009.html

If I can summarise what I learnt and why it took me a bit of time to understand:

Thinking about this, I was wondering how different in expressivity Avro is from SHACL or ShEx. Could one not just use SHACL directly to describe Avro binary data? What would be missing?

bblfish commented 2 years ago

Btw, I came across the EU FairPlus project's description of their use of Salad for writing forms (§9.4.2F Metadata profile validation in RDF) and of ShEx to validate them.

rob-metalinkage commented 2 years ago

How does this relate to JSON-LD framing?

mr-c commented 2 years ago

> How does this relate to JSON-LD framing?

Based upon https://json-ld.org/spec/latest/json-ld-framing/#introduction, I would say that neither schema-salad itself nor schema-salad documents specify deterministic layouts.

tetron commented 2 years ago

There's a de facto normalized form for a schema-salad document. But if you are starting from an RDF graph, there are likely to be multiple valid schema-salad JSON serializations (this depends on your schema), and schema-salad doesn't have features to say which one is preferred.
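A small sketch of why one graph can have several valid layouts, assuming a hypothetical schema that accepts the same field either as a list of records carrying an `id` or as a map keyed by that identifier. The helper names and sample data are made up for illustration, and the "triples" are simplified tuples rather than real RDF.

```python
def triples_from_list(items):
    """List layout: each item carries its own 'id' key."""
    return {
        (item["id"], key, value)
        for item in items
        for key, value in item.items()
        if key != "id"
    }


def triples_from_map(mapping):
    """Map layout: the identifier is the dictionary key."""
    return {
        (ident, key, value)
        for ident, body in mapping.items()
        for key, value in body.items()
    }


as_list = [{"id": "#a", "label": "first"}, {"id": "#b", "label": "second"}]
as_map = {"#a": {"label": "first"}, "#b": {"label": "second"}}

# Two different JSON layouts, one and the same set of statements.
same_graph = triples_from_list(as_list) == triples_from_map(as_map)
```

Going from documents to the graph is unambiguous here, but going back requires choosing a layout, which is the choice JSON-LD framing lets an author specify.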