cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

Add method for getting data as array of objects #330

Status: Closed (pkalita-lbl closed 1 year ago)

pkalita-lbl commented 2 years ago

Currently a DataHarmonizer object has a getTrimmedData() method that returns an array of arrays. For some applications it would be convenient to get an array of objects instead. This could be implemented in a new method (getDataObjects() or something like that) so as not to affect the current API (although we could also consider renaming the existing method to make it clearer what it returns, if compatibility isn't a huge concern).

If the spreadsheet contained something like this:

| first | second | third |
| --- | --- | --- |
| a | b | c |
| d |  | e |

I would expect this method to return:

[
  {
    "first": "a",
    "second": "b",
    "third": "c"
  },
  {
    "first": "d",
    "third": "e"
  }
]

A client could write their own method to transform the output of getTrimmedData() to produce that, but it seems like it would be common enough to add to DataHarmonizer.
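
For illustration, a minimal sketch of what such a client-side transform might look like. The function name `rowsToObjects` is hypothetical and not part of DataHarmonizer. The sketch assumes the column names are available separately from the data rows and that empty cells show up as `""`, `null`, or `undefined`; whether getTrimmedData() actually behaves that way is not confirmed here.

```javascript
// Hypothetical helper, not part of DataHarmonizer's API.
// Assumptions: `columnNames` is supplied by the caller, `rows` is an
// array of arrays, and empty cells are "", null, or undefined.
function rowsToObjects(columnNames, rows) {
  return rows.map((row) => {
    const obj = {};
    columnNames.forEach((name, i) => {
      const value = row[i];
      // Skip empty cells so the resulting objects are sparse.
      if (value !== "" && value !== null && value !== undefined) {
        obj[name] = value;
      }
    });
    return obj;
  });
}

// Example data mirroring the spreadsheet above.
const exampleColumns = ["first", "second", "third"];
const exampleRows = [
  ["a", "b", "c"],
  ["d", "", "e"],
];
console.log(JSON.stringify(rowsToObjects(exampleColumns, exampleRows), null, 2));
```

Dropping empty cells rather than emitting `"second": ""` keeps the output sparse, matching the expected result shown above.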

pkalita-lbl commented 2 years ago

@ddooley this came up in the context of using DataHarmonizer for a new project. Feel free to assign to me!

ddooley commented 2 years ago

This sounds like it aligns perfectly with being able to load and save a sparse JSON data format as you suggest. So happy to assign you to this one!

I believe the array of arrays was just used for Handsontable operations or loading/saving TSV and CSV data?

Cheers,

pkalita-lbl commented 2 years ago

Yeah, Handsontable provides an array of arrays as the output of its getData() method, and DataHarmonizer.getTrimmedData() is a thin wrapper around that. That format is definitely useful for producing TSV/CSV files. An array-of-objects format will be useful for working with JSON Schema validators and migrating older data to newer schema versions.
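
To show why the array-of-arrays shape suits TSV output, here is a rough sketch of serializing such rows. The function name `rowsToTsv` is hypothetical, and this is not how DataHarmonizer itself writes files. It assumes empty cells may be `null` or `undefined`, and it simply replaces embedded tabs and newlines with spaces instead of applying any real escaping scheme.

```javascript
// Hypothetical helper, not DataHarmonizer's actual export code.
// Assumption: tabs and line breaks inside a cell can be flattened
// to spaces; a real exporter may need proper escaping instead.
function rowsToTsv(rows) {
  return rows
    .map((row) =>
      row
        .map((cell) => String(cell ?? "").replace(/[\t\r\n]/g, " "))
        .join("\t")
    )
    .join("\n");
}

const tsv = rowsToTsv([
  ["first", "second", "third"],
  ["a", "b", "c"],
  ["d", null, "e"],
]);
console.log(tsv);
```

Each inner array maps directly onto one output line, which is the convenience the array-of-arrays format offers for delimited files.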