bmeg / iceberg-schema-tools

Create and maintain central iceberg schema.

Simplifier: GRIP dialect #25

Open bwalsh opened 1 year ago

bwalsh commented 1 year ago

Please write up a use case that describes what this is and when / why to use it.

Use case

As a [role], when [this happens] in order to [describe outcome], the system needs to [feature description]

Implementation details

matthewpeterkort commented 1 year ago

As an ACED IDP platform developer, in order to import edges and vertices into GRIP, the system needs to translate source data files that adhere to the Iceberg schema into the GRIP format.

For example, a GRIP vertex has the keys {"label": str, "id": uuid, "data": dict} and an edge has the keys {"label": str, "to": uuid, "from": uuid}.
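To make the two shapes concrete, here is a minimal sketch of how one resource could be mapped onto them. It is not the iceberg-schema-tools implementation: the helper names (to_vertex, find_references, to_edges) are hypothetical, and it assumes that the vertex label comes from resourceType, that links are stored as FHIR-style {"reference": "Type/id"} objects, and that the edge label is the name of the field holding the reference. The ids are short placeholders rather than real uuids.

```python
import json
from typing import Any, Iterator


def to_vertex(resource: dict[str, Any]) -> dict[str, Any]:
    """Wrap one resource in the vertex shape: label, id, data."""
    return {
        "label": resource["resourceType"],  # assumption: label taken from resourceType
        "id": resource["id"],
        "data": resource,
    }


def find_references(node: Any, key: str = "") -> Iterator[tuple[str, str]]:
    """Yield (field_name, target_id) for every nested {"reference": "Type/id"}."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str) and "/" in ref:
            # assumption: the target id is whatever follows the first "/"
            yield key, ref.split("/", 1)[1]
        for name, value in node.items():
            yield from find_references(value, name)
    elif isinstance(node, list):
        # list items inherit the name of the field that holds the list
        for item in node:
            yield from find_references(item, key)


def to_edges(resource: dict[str, Any]) -> list[dict[str, str]]:
    """Build one edge per reference found, in the edge shape: label, from, to."""
    return [
        {"label": field, "from": resource["id"], "to": target}
        for field, target in find_references(resource)
    ]


# Placeholder resource for illustration only.
observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "subject": {"reference": "Patient/pat-1"},
}
print(json.dumps(to_vertex(observation)))
print(json.dumps(to_edges(observation)))
```

The recursive walk is the simplest thing that works, but it visits every nested value of every resource, which may be part of why edge generation is slow on large files.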

The edge/vertex generation command takes a directory and generates the applicable edges and vertices from the Iceberg ndjson files in it. It assumes that the objects follow the schema given by the --schema_path option; if they do not, errors will occur.
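The directory scan described above might look roughly like the following. This is only a sketch: iter_resources and REQUIRED_KEYS are hypothetical names, and the check for resourceType and id is a stand-in for real validation against the schema passed with --schema_path. It streams each file line by line and reports the file name and line number when an object cannot be used.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterator

# Assumption: the minimum a resource needs before a vertex can be built from it.
REQUIRED_KEYS = ("resourceType", "id")


def iter_resources(directory: Path) -> Iterator[tuple[Path, int, dict[str, Any]]]:
    """Stream (file, line_number, resource) from every *.ndjson file in a directory."""
    for path in sorted(directory.glob("*.ndjson")):
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    resource = json.loads(line)
                except json.JSONDecodeError as exc:
                    raise ValueError(f"{path.name}:{line_number}: invalid JSON: {exc}") from exc
                if not isinstance(resource, dict):
                    raise ValueError(f"{path.name}:{line_number}: expected a JSON object")
                missing = [key for key in REQUIRED_KEYS if key not in resource]
                if missing:
                    raise ValueError(f"{path.name}:{line_number}: missing keys {missing}")
                yield path, line_number, resource


# Usage with a throwaway directory holding one placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Patient.ndjson"
    sample.write_text('{"resourceType": "Patient", "id": "pat-1"}\n', encoding="utf-8")
    for path, line_number, resource in iter_resources(Path(tmp)):
        print(path.name, line_number, resource["id"])
```

Because the function is a generator, only one line is held in memory at a time, so file size affects run time but not memory use.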

To invoke the command, run:

iceberg data simplify tests/fixtures/simplify/synthea newobsss --schema_path https://raw.githubusercontent.com/bmeg/iceberg/main/schemas/graph/graph-fhir.json --dialect GRIP

Edge generation seems too slow to be a practical solution at scale. We need to think about how to make it faster.

Had to remove observation.ndjson because generating its edges was taking too much time, and DocumentReference.ndjson was causing schema errors.
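Before changing anything, it may help to measure how many records per second each file type manages, so the slow step can be pinned down. The helper below is a hypothetical, generic timing wrapper and not part of the tool; the work function passed in the usage line is a placeholder for whatever per-record edge generation is being measured.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def timed_count(label: str, items: Iterable[T], work: Callable[[T], object]) -> tuple[int, float]:
    """Apply work to every item, print a throughput line, return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    for item in items:
        work(item)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    print(f"{label}: {count} records in {elapsed:.3f}s ({rate:.0f} records/s)")
    return count, elapsed


# Placeholder workload: build one edge-shaped dict per record.
timed_count(
    "placeholder edges",
    range(1000),
    lambda n: {"label": "subject", "from": str(n), "to": "pat-1"},
)
```

Running the same wrapper over the Observation records and over a file that finishes quickly would show whether the cost is per record or grows with file size.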

bwalsh commented 1 year ago

Thanks. Let's talk about it on Monday.

(Sent by email in reply to matthewpeterkort's comment of Fri, Nov 10, 2023, 11:48 AM.)