Vocabularies validation and table creation

rpwagner commented 4 years ago

The JSON schema used for ingest is being updated to simplify the data provided. The internal vocabularies (subject_role_taxonomy.role_id and subject.granularity) will be represented as enumerations in the schema.
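To illustrate what representing an internal vocabulary as an enumeration could look like, here is a minimal sketch. It assumes the ingest schema uses Frictionless Table Schema-style field descriptors with a `constraints.enum` list. The enum values and the helper `violates_enum` are hypothetical placeholders, not the actual C2M2 terms or tooling.

```python
# Hypothetical field descriptor in the Frictionless Table Schema style.
# The enum values are placeholders, not the real C2M2 granularity terms.
granularity_field = {
    "name": "granularity",
    "type": "string",
    "constraints": {"enum": ["example_term_a", "example_term_b"]},
}


def violates_enum(field, value):
    """Return True if value is not allowed by the field's enum constraint."""
    allowed = field.get("constraints", {}).get("enum")
    # A field without an enum constraint accepts any value.
    return allowed is not None and value not in allowed


# Two illustrative subject rows; the second uses a value outside the enum.
rows = [{"granularity": "example_term_a"}, {"granularity": "not_a_term"}]
bad_rows = [r for r in rows if violates_enum(granularity_field, r["granularity"])]
print(bad_rows)
```

With an enumeration in the schema, a check like this needs no separate vocabulary table or foreign key for the internal terms.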

The external vocabularies (biosample.anatomy, biosample.anatomy.assay_type, file.file_format, and file.data_type) will no longer have foreign keys. Instead, their tables will be created by code that collects the descriptive information.
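A rough sketch of that idea follows: gather the distinct term IDs actually used in a column, then build a term table from whatever descriptive information is available. This is not the real build_term_tables; the function names, the `EX:` term IDs, and the `lookup` dictionary (standing in for a downloaded ontology table) are all assumptions made for illustration.

```python
import csv
import io


def collect_term_ids(rows, column):
    """Return the sorted distinct non-empty values found in column."""
    return sorted({row[column] for row in rows if row.get(column)})


def build_term_table(term_ids, lookup):
    """Return one row per used term, filled from lookup where possible."""
    table = []
    for term_id in term_ids:
        # Terms missing from the lookup still get a row, with blank details.
        info = lookup.get(term_id, {})
        table.append({
            "id": term_id,
            "name": info.get("name", ""),
            "description": info.get("description", ""),
        })
    return table


# Hypothetical biosample rows; the EX: identifiers are placeholders.
biosample_tsv = "id\tanatomy\nB1\tEX:0001\nB2\tEX:0002\nB3\tEX:0001\nB4\t\n"
rows = list(csv.DictReader(io.StringIO(biosample_tsv), delimiter="\t"))

# Hypothetical descriptive information, e.g. parsed from an ontology file.
lookup = {"EX:0001": {"name": "example tissue A", "description": "placeholder"}}

term_ids = collect_term_ids(rows, "anatomy")
anatomy_table = build_term_table(term_ids, lookup)
print(anatomy_table)
```

Because the term table is derived from the data itself, every term referenced by a row has an entry, which is the guarantee the foreign keys would otherwise have enforced.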

u8sand commented 4 years ago

@rpwagner The way I'm currently doing it is in https://github.com/nih-cfde/FAIR/blob/master/Demos/FrictionlessDataclass/c2m2_tools.py, which basically just downloads the ontology tables and runs build_term_tables from https://github.com/nih-cfde/cfde-deriva/tree/master/extractors_and_metadata.HMP.Level_1

In practice: https://github.com/nih-cfde/FAIR/blob/master/LINCS/c2m2/scripts/convert_to_c2m2.py#L522-L525

...
  pkg = create_datapackage('C2M2_Level_1', convert_lincs_to_c2m2(), outdir)
  build_internal_CV('C2M2_Level_1', outdir)
  build_term_tables(outdir)
  validate_datapackage(pkg)
...

With all the data, it'd just be a matter of running build_internal_CV('C2M2_Level_1', outdir) again with the integrated datapackage.

jgaff commented 4 years ago

Is this part of the derived table creation transformations (https://github.com/nih-cfde/cfde-deriva/issues/47)? If not, in which part of ingest is the work needed? The Action Provider is schema-agnostic and does not have any schema-specific code right now, so I'm a little confused as to what changes are being requested.

rpwagner commented 4 years ago

Based on @karlcz's comments, this may be better handled in the client. The client can create the vocabularies and send them to the action provider.

fair-research / deriva-action-provider

Vocabularies validation and table creation #8