bmeg / iceberg-schema-tools

Create and maintain central iceberg schema.

Feature/data pfb #7

Closed bwalsh closed 1 year ago

bwalsh commented 1 year ago

This PR adds a data section:

$ iceberg schema
Usage: iceberg schema [OPTIONS] COMMAND [ARGS]...

  Manage bmeg or gen3 schemas from FHIR resources.

Options:
  --help  Show this message and exit.

Commands:
  generate  Generate from FHIR resources.
  compile   Create aggregated json file from individual yaml schemas
  publish   Copy dictionary to s3 (note:aws cli dependency)

$ iceberg data

Usage: iceberg data [OPTIONS] COMMAND [ARGS]...

  Project data (ResearchStudy, ResearchSubjects, Patient, etc.).

Options:
  --help  Show this message and exit.

Commands:
  simplify       Renders Gen3 friendly flattened records.
  validate       Check FHIR data for validity and ACED conventions.
  validate-gen3  Check Gen3 data for validity and ACED conventions.
  pfb            Write simplified FHIR files to a PFB.
  migrate        Migrate from FHIR R4B to R5.0.

bwalsh commented 1 year ago

@kellrott @lbeckman314 Adding you as additional reviewers on this PR, as it ties together the pfb_fhir paper and the ACED work.

matthewpeterkort commented 1 year ago

The CONTRIBUTING.md link is broken on the PyPI website but works on the GitHub README.md page.

I pip-installed the package and tested it with local FHIR files I had translated from OMOP.

The validate and simplify commands worked for me. The pfb command failed in its subprocess call: "pfb from -o output/obs.pfb dict iceberg/schemas/gen3/aced.json" returned -1. This is probably something on my end. I did not try the migrate command.

I looked over the code files and test coverage but did not have any comments.
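To narrow down a failure like the pfb subprocess call above, it can help to capture the child's stderr instead of only its return code. The following is a minimal sketch, not the tool's actual code: `run_and_report` is a hypothetical helper name, and how iceberg-schema-tools really invokes `pfb` is not shown in this thread.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr).

    Hypothetical helper: returncode is None when the executable is missing.
    """
    try:
        # capture_output keeps the child's messages so they can be shown on failure
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # The executable is not on PATH (for example, it is not installed)
        print(f"could not start {cmd[0]}: {exc}", file=sys.stderr)
        return None, "", str(exc)
    if result.returncode != 0:
        print(f"command failed ({result.returncode}): {' '.join(cmd)}", file=sys.stderr)
        print(result.stderr, file=sys.stderr)
    return result.returncode, result.stdout, result.stderr
```

Calling `run_and_report(["pfb", "from", "-o", "output/obs.pfb", "dict", "iceberg/schemas/gen3/aced.json"])` would then print whatever `pfb` wrote to stderr, which usually says more than the bare return code.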

bwalsh commented 1 year ago

Note: I will squash commits and merge when review is complete.

bwalsh commented 1 year ago

TODO: - Add "pluck" capability for embedded objects

TODO: - abstract references: done, see config.nested_objects

bwalsh commented 1 year ago

TODO: - add documentation for json schema vocabulary

Done on feature/hypermedia branch.

bwalsh commented 1 year ago

TODO: - add example of linking to BMEG vertices (Allele, GeneEffect)

Done on feature/hypermedia branch.

matthewpeterkort commented 1 year ago

I had some issues running the commands listed under "example" in the README docs. The README should make clear that you first have to run:

pfb_fhir schema generate simplified  

to generate the simplified schema needed by the pfb command. The example then works after running:

pfb_fhir data simplify --schema_path  iceberg/schemas/simplified/simplified-fhir.json tests/fixtures/simplify/study/ tmp/simplified

instead of:

pfb_fhir data simplify  tests/fixtures/simplify/study/ tmp/simplified

For some reason this flag needed to be stated explicitly. I also had to check out the fhir-pfb branch of the iceberg schema repository.
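One way to make a missing schema easier to diagnose is to resolve the path up front and fail with a hint. This is only a sketch under stated assumptions: `resolve_schema_path` and `DEFAULT_SCHEMA` are hypothetical names, and the default shown is simply the path from the command above, not necessarily the tool's real default.

```python
from pathlib import Path

# Assumed default, copied from the command above; the tool's real default may differ.
DEFAULT_SCHEMA = Path("iceberg/schemas/simplified/simplified-fhir.json")


def resolve_schema_path(schema_path=None, default=DEFAULT_SCHEMA):
    """Return the schema path to use, or raise with a hint if the file is missing."""
    # An explicit --schema_path value wins; otherwise fall back to the default
    path = Path(schema_path) if schema_path else Path(default)
    if not path.is_file():
        raise FileNotFoundError(
            f"schema not found at {path}; run `pfb_fhir schema generate simplified` "
            "or pass --schema_path explicitly"
        )
    return path
```

With a check like this, omitting the flag before the simplified schema has been generated would produce a message that names the missing file and the command that creates it.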

Also, the asserts at iceberg_tools/cli/data.py:L104-106 appear to be failing incorrectly. After commenting them out, the program ran without errors and generated a 31 MB PFB file with:

pfb_fhir data pfb tmp/simplified/ tmp/study.pfb
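The asserts at data.py:L104-106 are not reproduced in this thread, so what they check is unknown. As a purely hypothetical illustration, input preconditions for a command like the one above could be written as explicit checks with messages instead of bare `assert` statements, which also keeps them active when Python runs with `-O`. The names `InputError` and `check_pfb_inputs`, and the specific conditions, are assumptions made for this sketch.

```python
from pathlib import Path


class InputError(Exception):
    """Raised when CLI inputs fail a precondition (hypothetical name)."""


def check_pfb_inputs(input_dir, output_path):
    """Validate an input directory and an output file path; return both as Path objects."""
    input_dir, output_path = Path(input_dir), Path(output_path)
    if not input_dir.is_dir():
        raise InputError(f"{input_dir} is not a directory")
    # any() stops at the first entry, so large directories are not fully listed
    if not any(input_dir.iterdir()):
        raise InputError(f"{input_dir} is empty")
    if output_path.suffix != ".pfb":
        raise InputError(f"{output_path} should end in .pfb")
    return input_dir, output_path
```

Each failure then names the offending path, which makes it easier to tell a wrong precondition from a wrong input.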