BU-ISCIII / relecov-tools

Set of helper tools for assembling the different elements of the RELECOV platform (Spanish Network for genomic surveillance of SARS-CoV-2), such as data download, processing, validation and upload to public databases, as well as analysis runs and database storage.
GNU General Public License v3.0

New module `build_schema` #273

Closed Daniel-VM closed 3 weeks ago

Daniel-VM commented 3 months ago

PR Description

This pull request introduces a new module, build_schema.py, which parses a database definition in XLSX format and converts it into two files in a single run: the RELECOV schema definition (relecov_tools/schema/relecov_schema.json) and the Metadata-Lab Template (an XLSX template for entering lab metadata).

Additionally, after each execution the module generates and saves a report that compares the updated version of relecov_schema.json with its previous version, and it keeps track of the schema version.
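To illustrate the kind of comparison report described above, here is a minimal sketch that diffs two schema versions with the standard library. It is not the code from this PR: the function name `schema_diff` and both example schemas are hypothetical, and the real module may build its report differently.

```python
import difflib
import json


def schema_diff(old_schema, new_schema):
    """Return unified-diff lines between two schemas serialized as sorted JSON."""
    # Sorting keys keeps the diff stable regardless of dict insertion order.
    old_lines = json.dumps(old_schema, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new_schema, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="relecov_schema.json (previous)",
            tofile="relecov_schema.json (new)",
            lineterm="",
        )
    )


# Hypothetical schemas, for illustration only.
previous = {"version": "1.0.0", "properties": {"sample_id": {"type": "string"}}}
updated = {
    "version": "1.1.0",
    "properties": {
        "sample_id": {"type": "string"},
        "collection_date": {"type": "string", "format": "date"},
    },
}

report = schema_diff(previous, updated)
print("\n".join(report))
```

Serializing with sorted keys before diffing means that only real content changes show up, not reordering of properties.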

Major features and implementations:

  1. Schema draft template creation: Loads a JSON schema draft template for the specified draft version.
  2. Database definition parsing and schema construction: Reads the database definition (XLSX) and builds a new JSON schema from it and the draft template. It handles the different schema properties, distinguishing between 'standard' properties and 'complex' properties.
  3. Validation of the new schema: Ensures that the new schema conforms to the JSON Schema specification for the chosen draft version.
  4. Schema creation and diff: Compares the base schema with the new schema and prints the differences.
  5. Excel template creation: Creates the metadata lab template in XLSX format from the new JSON schema.
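The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation in this PR: the rows are plain dictionaries standing in for what would be read from the XLSX database definition, the column names (`property`, `type`, `description`, `required`, `enum`) and the functions `build_schema` and `template_header` are hypothetical, the draft 2020-12 URI is an assumed choice of draft version, and 'complex' properties are not handled.

```python
import json

# Assumed draft version; the real module loads a template for the requested draft.
DRAFT_URI = "https://json-schema.org/draft/2020-12/schema"


def build_schema(rows, title="Illustrative schema"):
    """Build a JSON-schema-like dict from database-definition rows.

    Each row is a dict with hypothetical column names:
    'property', 'type', 'description', 'required' and optional 'enum'.
    """
    schema = {
        "$schema": DRAFT_URI,
        "title": title,
        "type": "object",
        "properties": {},
        "required": [],
    }
    for row in rows:
        prop = {"type": row["type"], "description": row["description"]}
        # A semicolon-separated 'enum' cell becomes a list of allowed values.
        if row.get("enum"):
            prop["enum"] = [value.strip() for value in row["enum"].split(";")]
        schema["properties"][row["property"]] = prop
        if row.get("required"):
            schema["required"].append(row["property"])
    return schema


def template_header(schema):
    """Column headers for a lab-metadata template, one per schema property."""
    return list(schema["properties"])


# Hypothetical rows, standing in for the parsed XLSX database definition.
rows = [
    {"property": "sample_id", "type": "string",
     "description": "Sample identifier", "required": True},
    {"property": "host_gender", "type": "string",
     "description": "Gender of the host", "required": False,
     "enum": "Female; Male; Not Provided"},
]

schema = build_schema(rows)
header = template_header(schema)
print(json.dumps(schema, indent=2))
print(header)
```

Deriving the template header from the generated schema, rather than from the spreadsheet directly, keeps the schema and the lab template in step with each other.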

Utility functions

The folder assets/schema_utils was created to store utility functions related to the module.

Closes #259

saramonzon commented 1 month ago

Some minor stuff:

One thing I do not understand: what does this message in the standard output mean? It's not very informative:

Schema validation error: '' is not valid under any of the given schemas

Another thing to take into account: no log is written when using the core --log-file functionality. This can be a follow-up PR, but please open an issue, because it's important to save log information about the build schema process.

But it looks really nice, @Daniel-VM!! And the testing data you provided was really helpful for testing and checking everything out, thank you so much!
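On the missing log output mentioned in this comment, the general idea of sending a module's messages to a file can be sketched with Python's standard `logging` package. This does not show how relecov-tools wires its --log-file option: the helper `get_logger`, the logger name and the log messages are all hypothetical.

```python
import logging
import os
import tempfile


def get_logger(name, log_file=None):
    """Return a logger that also writes to log_file when one is given."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if log_file:
        handler = logging.FileHandler(log_file, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Hypothetical usage: write to a temporary file instead of a real --log-file path.
log_path = os.path.join(tempfile.mkdtemp(), "build_schema.log")
log = get_logger("build_schema_demo", log_file=log_path)
log.info("Schema build started")
log.warning("Schema validation error: example message")

with open(log_path, encoding="utf-8") as fh:
    log_text = fh.read()
print(log_text)
```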

saramonzon commented 1 month ago

Also, maybe we have some code overlap between json_validation and json_draft? Both can validate the schema, although it's true that json_validation also validates the data, creates the metadata Excel, and so on. I'm thinking we could keep them separate but call json_draft from json_validation, so we don't have duplicated code/functionality?

Daniel-VM commented 3 weeks ago

@saramonzon @Shettland