asw-v4 / cw-24-hackday-violet

An interactive Health Check of GitHub Repositories.
MIT License

Define output schema to provide the data to visualise #5

Open sadielbartholomew opened 2 months ago

sadielbartholomew commented 2 months ago

The data we pull from GitHub and FAIR will be used to create JSON capturing the relevant information. That JSON should then be consolidated into a final output data schema which provides all of the information the visualisation tool/script needs to display.

sadielbartholomew commented 2 months ago

I am thinking of this format for the JSON schema (written and validated as JSON with https://jsoneditoronline.org), where <> markers show what information is intended, with the data type first and then a description:

{
  "metadata": "<string, timestamp (standard format e.g. ISO8601)>",
  "repository": {
    "display name": "<string, name for vis. label, e.g. GH repo name>",
    "URL": "<string, GH repo link>"
  },
  "status": {
    "<numeric metric name>": {
      "valid range": ["<minimum>", "<maximum>"],
      "closed interval": "<Bool, True if range is closed, False if open or half open>",
      "direction of health": "<Bool, True means increasing is better i.e. maximum is healthiest>"
    },
    "<Boolean metric name>": "<Bool, corresponds to value that is healthy>"
  }
}

In short, we have some metadata to record when the information was collected, then basic info about the repo in question, and finally a dictionary recording all of the metrics, which includes the important context of which values are possible and which values represent the 'healthiest' state.
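To make the layout concrete, here is a minimal Python sketch of how a document in this format might be built and then read by a visualisation script. Several things here are assumptions and not part of the proposal: the function names (`build_output`, `in_valid_range`, `health_score`), the metric names and the repository URL are hypothetical, a non-closed range is treated as fully open (the schema does not distinguish open from half open), and the measured value is passed in separately because the schema above records only the context for each metric.

```python
import json
from datetime import datetime, timezone


def build_output(display_name, url, numeric_metrics, boolean_metrics):
    """Assemble a dict following the proposed layout (hypothetical helper)."""
    status = {}
    for name, (minimum, maximum, closed, increasing_is_better) in numeric_metrics.items():
        status[name] = {
            "valid range": [minimum, maximum],
            "closed interval": closed,
            "direction of health": increasing_is_better,
        }
    # Boolean metrics map straight to the value that counts as healthy.
    status.update(boolean_metrics)
    return {
        "metadata": datetime.now(timezone.utc).isoformat(),  # ISO 8601 timestamp
        "repository": {"display name": display_name, "URL": url},
        "status": status,
    }


def in_valid_range(value, metric):
    """Check a measured value against the metric's valid range."""
    low, high = metric["valid range"]
    if metric["closed interval"]:
        return low <= value <= high
    # Assumption: anything not closed is treated as fully open.
    return low < value < high


def health_score(value, metric):
    """Map a measured value to 0..1, where 1 is the healthiest end of the range."""
    low, high = metric["valid range"]
    fraction = (value - low) / (high - low)
    return fraction if metric["direction of health"] else 1 - fraction


# Hypothetical metrics and placeholder URL, purely for illustration.
doc = build_output(
    "example-repo",
    "https://github.com/example/example-repo",
    numeric_metrics={
        "test coverage": (0, 100, True, True),       # higher is healthier
        "stale issue ratio": (0, 100, True, False),  # lower is healthier
    },
    boolean_metrics={"has licence": True},
)
print(json.dumps(doc, indent=2))
```

With this in place the visualisation side only needs the measured value plus the metric entry, e.g. `health_score(25, doc["status"]["test coverage"])`, to place a marker on a common 0 to 1 scale regardless of which direction is healthy.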

JimCircadian commented 2 months ago

Drafted a basic schema, but need to converge on this layout in time!