cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

Add ability to import and export data in container object format #399

Closed pkalita-lbl closed 1 year ago

pkalita-lbl commented 1 year ago

This resolves #390.

Currently, the getDataObjects and loadDataObjects methods provide and accept arrays of objects. For various reasons, when working in the LinkML ecosystem we often want those arrays to be wrapped in a top-level or "container" object. That is:

{
  "items": [
    . . .
  ]
}

As opposed to:

[
  . . .
]

In the above example, items would be referred to as the "index slot" of the container object. In many cases the name of the index slot can be inferred.

These changes allow getDataObjects and loadDataObjects to work with both arrays (as they currently do) and arrays wrapped in a container object.

On the exporting side, I've added some interface elements to the Toolbar's "Save As" menu which allow you to choose whether you want to save as an array or an object and, for objects, provide an index slot name (the inferred one will be populated if possible). These values are then passed to new options on the getDataObjects method.
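To illustrate the export choice described above, here is a minimal sketch of wrapping grid rows in a container object. The helper name `wrapRows` and the option names `asContainer` and `indexSlot` are hypothetical, chosen for illustration only; they are not the actual options added to getDataObjects in this PR.

```javascript
// Hypothetical helper (not DataHarmonizer's real API): return the rows either
// as a bare array or wrapped in a container object under the index slot name.
function wrapRows(rows, options = {}) {
  // `asContainer` and `indexSlot` are assumed option names for this sketch.
  const { asContainer = false, indexSlot } = options;

  if (!asContainer) {
    // Array export: hand the rows back unchanged.
    return rows;
  }
  if (typeof indexSlot !== 'string' || indexSlot.length === 0) {
    throw new Error('An index slot name is required for container export');
  }
  // Container export: use a computed key so the slot name is configurable.
  return { [indexSlot]: rows };
}

// Usage: the same rows exported both ways.
const rows = [{ sample_id: 'A1' }, { sample_id: 'A2' }];
const asArray = wrapRows(rows);
const asObject = wrapRows(rows, { asContainer: true, indexSlot: 'items' });
```

Throwing when no slot name is given, rather than silently falling back to a default key, is a design choice of this sketch; the real UI pre-populates the inferred name where possible.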

On the loading side, loadDataObjects now detects whether an object is provided and, if so, attempts to pick the actual grid data out of the right key.
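The loading-side detection could look roughly like the following sketch. The function `extractRows` is hypothetical, and its inference rule (use the only key whose value is an array) is an assumption made for this example; the PR does not spell out how the index slot is inferred, and the real implementation may rely on the schema instead.

```javascript
// Hypothetical helper (not DataHarmonizer's real code): accept either a bare
// array of rows or a container object, and return the array of rows.
function extractRows(data, indexSlot) {
  // Bare array: nothing to unwrap.
  if (Array.isArray(data)) {
    return data;
  }
  if (data === null || typeof data !== 'object') {
    throw new TypeError('Expected an array or a container object');
  }

  // Explicit index slot: it must point at an array.
  if (indexSlot !== undefined) {
    if (!Array.isArray(data[indexSlot])) {
      throw new Error(`Key "${indexSlot}" does not hold an array`);
    }
    return data[indexSlot];
  }

  // Assumed inference rule: exactly one key holds an array.
  const candidates = Object.keys(data).filter((key) => Array.isArray(data[key]));
  if (candidates.length !== 1) {
    throw new Error('Could not infer the index slot; please specify it');
  }
  return data[candidates[0]];
}

// Usage: both input shapes yield the same rows.
const fromArray = extractRows([{ sample_id: 'A1' }]);
const fromContainer = extractRows({ items: [{ sample_id: 'A1' }] });
```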

cc: @turbomam

ddooley commented 1 year ago

I've approved. It would be good to add a paragraph or two to the docs describing the behaviour we recommend for this vis-à-vis tree_root (https://w3id.org/linkml/tree_root). I.e., shall we get in the habit of having people add tree_root="true" to a given class in their schema?

I was also wondering, in the case where DataHarmonizer finds multiple classes with dh_interface=true, whether each one is a candidate for parsing a JSON data file in a given template, i.e. multiple "tree_roots" actually used from a given schema. The next iteration of this could then simply have a data file indicate, via metadata, which schema, version, and "tree root" class it belongs to.

pkalita-lbl commented 1 year ago

Good points. I'll make a few issues to track that later today.

pkalita-lbl commented 1 year ago

Well, I fully intended to make some issues, but the more I thought about it the more I was unsure about what issues to make 😁. I think maybe a meeting sometime in the near future would be more productive. I think there are some requirements to be laid out and design decisions to be made, and I want to be sure they're not being made in isolation.

ddooley commented 1 year ago

K, I'll touch base in Slack on this!