Closed turbomam closed 1 year ago
Looking back on this, I think DH should input/output LinkML (JSON-LD) native JSON directly in the browser, so we need to understand the JavaScript required to do so. The existing "file -> Save as > .json" could be renamed to "file -> Save as > flat .json", and we could add a "file -> Save as > LinkML .json" option for the pure version. This avoids having to use command-line Python tools as an intermediary step.
@pkalita-lbl for comment.
(The LinkML data inlining options will come into play here later when we add 1-many data relations.)
Let me see if I understand Mark's concern correctly. If I have a schema that implements the typical LinkML container object pattern:
id: http://example.org/test
name: test
imports:
  - linkml:types
prefixes:
  linkml: https://w3id.org/linkml/
slots:
  s1:
    range: string
  s2:
    range: string
  entries:
    range: Entry
    multivalued: true
classes:
  Entry:
    slots:
      - s1
      - s2
  EntrySet:
    tree_root: true
    slots:
      - entries
I could point DataHarmonizer to the Entry class and it would show me an interface with two columns (for s1 and s2). I could enter some data and then export that data to JSON through the interface. It would look something like:
[
  {
    "s1": "row 1 col 1",
    "s2": "row 1 col 2"
  },
  {
    "s1": "row 2 col 1",
    "s2": "row 2 col 2"
  }
]
The issue is that I can't validate that file as-is, either using linkml-validate or using a generic JSON Schema validator with the JSON Schema derived from the LinkML schema. That's because LinkML doesn't really have a concept of an array at the root level, hence the container object pattern.
So what Mark is suggesting is that DataHarmonizer could instead produce JSON that looks like:
{
  "entries": [
    {
      "s1": "row 1 col 1",
      "s2": "row 1 col 2"
    },
    {
      "s1": "row 2 col 1",
      "s2": "row 2 col 2"
    }
  ]
}
Now we have an object at the root level. That object corresponds to the EntrySet class in the schema and could be validated as such.
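To make the difference between the two shapes concrete, here is a minimal sketch of the conversion in both directions. It is written in Python only for brevity; the same few lines should port directly to browser JavaScript. The function names wrap_rows and unwrap_rows are hypothetical, and the index slot name ("entries") is assumed to be known up front.

```python
import json


def wrap_rows(rows, index_slot):
    """Wrap a flat list of row objects in a container object.

    `index_slot` is the multivalued slot on the container class
    ("entries" in the example schema). Hypothetical helper, not DH API.
    """
    return {index_slot: list(rows)}


def unwrap_rows(container, index_slot):
    """Inverse of wrap_rows: recover the flat list of rows."""
    return list(container.get(index_slot, []))


# The flat export, as DataHarmonizer would produce it today.
flat = [
    {"s1": "row 1 col 1", "s2": "row 1 col 2"},
    {"s1": "row 2 col 1", "s2": "row 2 col 2"},
]

# The container form, with an object at the root level.
container = wrap_rows(flat, "entries")
print(json.dumps(container, indent=2))
```

The wrapping itself is trivial; the hard part is knowing which slot name to use.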
I don't have an exact proposal for how to resolve the situation, but it will probably involve a combination of logic to guess at the so-called container class and index slot (presumably via teaching DataHarmonizer to understand the tree_root metaslot), as well as ways to specify them manually (see also: https://linkml.io/linkml/data/csvs.html).
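As a rough illustration of what that guessing logic might look like, here is a sketch in Python (again, only for brevity; nothing here is DataHarmonizer or LinkML runtime API). It assumes the schema has already been parsed into a plain dictionary, and the function name guess_container is hypothetical. It deliberately ignores inheritance, slot_usage, and inline attributes, all of which a real implementation would need to consider.

```python
# The example schema from above, as a plain dict (as if parsed from YAML).
schema = {
    "slots": {
        "s1": {"range": "string"},
        "s2": {"range": "string"},
        "entries": {"range": "Entry", "multivalued": True},
    },
    "classes": {
        "Entry": {"slots": ["s1", "s2"]},
        "EntrySet": {"tree_root": True, "slots": ["entries"]},
    },
}


def guess_container(schema, row_class):
    """Guess (container_class, index_slot) for the given row class.

    Looks for a class marked tree_root that has a multivalued slot
    whose range is `row_class`. Returns None if no candidate is found,
    in which case the caller would fall back to manual configuration.
    """
    slots = schema.get("slots") or {}
    for class_name, class_def in (schema.get("classes") or {}).items():
        class_def = class_def or {}
        if not class_def.get("tree_root"):
            continue
        for slot_name in class_def.get("slots") or []:
            slot_def = slots.get(slot_name) or {}
            if slot_def.get("range") == row_class and slot_def.get("multivalued"):
                return class_name, slot_name
    return None


result = guess_container(schema, "Entry")
print(result)
```

Returning None rather than raising leaves room for the manual override: if the guess fails, the user would be asked to name the container class and index slot explicitly.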
DH is welcome to add the DH JSON -> LinkML JSON (and vice versa) converters that I wrote
see