glossarist / glossarist-ruby

Concept modeller in Ruby
BSD 2-Clause "Simplified" License
1 stars 1 forks source link

(URGENT) Perform strict validation and raise errors if Concept YAML files have a wrong structure #71

Closed ronaldtse closed 1 year ago

ronaldtse commented 1 year ago

Write out a JSON Schema for the Concept YAML file and validate it before reading it. This is causing silent crashes in downstream applications, such as the metanorma glossarist plugin.

Originally from:

HassanAkbar commented 1 year ago

@ronaldtse I’m a little confused here do we need to define something like below and validate according to it when reading the concept file or do we need to add validation for datatype in each model separately?

{
  "termed": "String",
  "term": "String",
  "language": {
  "terms": [
    "type": "String",
    "normative_status": "String"
    "designation": "String"
    ]
    "definition": [
      "content": "String",
    ],
    "examples": [
      "content": "String",
    ],
    "notes": [
      "content": "String",
    ],
    "language_code": "<three_digint_language_code>",
    "entry_status": "valid"
    "sources": [
      "type": "String",
      "origin": {
        "ref": "String",
        "clause": "String",
        "link": "String",
      }
    ]
  }
}
ronaldtse commented 1 year ago

There are few layers of validation, so structural validation is definitely the basic type of validation.

normative_status, type and entry_status are not just normal strings, but they accept only certain values. We will need to check for them in the model.

HassanAkbar commented 1 year ago

@ronaldtse every key in this config file is already being validated for the allowed values here. So we only need to add structural validation in this ticket. Right?

ronaldtse commented 1 year ago

@HassanAkbar technically yes. However, why does Glossarist not raise any errors when reading the Concept YAML files that are clearly violating the data definition? I want to see errors raised when a bad file is read.

HassanAkbar commented 1 year ago

@ronaldtse I've tested the glossarist gem and it is raising errors for the following

What other errors do we need to handle? Can you provide an example of an invalid file that was not raising errors?

ronaldtse commented 1 year ago

Can you raise errors / validate on the concept YAML files in these commits?

Thanks.

HassanAkbar commented 1 year ago

@ronaldtse I tried running Glossarist on the given commits and it raises the following error if the yaml file is not properly formatted.

Psych::SyntaxError ((<unknown>): could not find expected ':' while scanning a simple key at line 17 column 3)

Do we need to reword this so that it is better to understand?

Also by looking at the fix here the spaces in groups name are replaced by dashes do we need to add a validation for that as well?

The error is not raised for the files if they contains lines starting with # because they are considered comments and are ignored when parsing.

ronaldtse commented 1 year ago

@HassanAkbar for this error how come it's not causing the metanorma-glossarist-plugin to crash?

Can we override Psych error with a Glossarist error, e.g. Glossarist::Error::ParseError?

spaces in groups name are replaced by dashes

Probably not, the groups could be strings with spaces I guess?

HassanAkbar commented 1 year ago

@HassanAkbar for this error how come it's not causing the metanorma-glossarist-plugin to crash?

@ronaldtse In metanorma-plugin-glossarist we are reading the concepts inside the render method which is being called by the liquid and all the errors are caught by liquid and rendered as Liquid error: internal.

It does set the errors in the template variable that we can check after rendering is done and can raise the error again. I'm making that change as well.