arkhn / fhir-river

Live ETL pipeline to standardize Health Data into FHIR.
Apache License 2.0
42 stars 4 forks source link

disable referential integrity in hapi-loader #723

Open simonvadee opened 2 years ago

simonvadee commented 2 years ago

seet hapi setting enforce_referential_integrity_on_write https://smilecdr.com/docs/configuration_categories/fhir_configuration.html

simonvadee commented 2 years ago

@tevariou are you sure we want this in every situation ?

tevariou commented 2 years ago

@simonvadee I just think it's totally impracticable during the ETL. A resource can link another of the same profile (eg. an organization linking another). So even if we manage to build a graph of the mappings (and it will be potentially cyclic), we will still encounter this issue. And that's not counting the performance issue.

DBT can test a relationship (https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests). I agree It won't solve an error in the mapping though with Pyrog. Another way would be to add a constraint in Pyrog, inputing a column in Reference.Identifier.value that is not a foreign key would be forbidden and the column has to be the pk of the linked mapping. But Elena works on a db that has relationships but no declared foreign keys (we can add this constraint in the datalake though praying it has been enforced in some way in the source db)...

simonvadee commented 2 years ago

I just think it's totally impracticable during the ETL

your points make sense, I agree. However I'm worried that resource indexing will be broken if we disable this, cf the doc: """ This property can cause confusing results for clients of the server since searches, includes, and other FHIR features may not behave as expected when referential integrity is not preserved. In particular, resource references to target resources that do not exist at the time that the source resource is created will not be indexed, even if the target resource is created later. """ we should test this don't you think ?

DBT can test a relationship

nice, if referential integrity can be tested in the datalake, we can be confident that it will be reflected into hapi-fhir database (except if our pg functions fhir_id or fhir_ref are buggy)

Another way would be to add a constraint in Pyrog, inputing a column in Reference.Identifier.value that is not a foreign key would be forbidden and the column has to be the pk of the linked mapping

yes we could but I don't think we should add more code to river right now. I we want to be sure that referential integrity is respected in hapi-fhir db, we can leave enforce_referential_integrity_on_write to true

tevariou commented 2 years ago

I just think it's totally impracticable during the ETL

your points make sense, I agree. However I'm worried that resource indexing will be broken if we disable this, cf the doc: """ This property can cause confusing results for clients of the server since searches, includes, and other FHIR features may not behave as expected when referential integrity is not preserved. In particular, resource references to target resources that do not exist at the time that the source resource is created will not be indexed, even if the target resource is created later. """ we should test this don't you think ?

yep we should and maybe we can call $reindex https://smilecdr.com/docs/configuration_categories/fhir_configuration.html#property-reindex-enabled after an etl ?

simonvadee commented 2 years ago

mhh I don't know it's not just indexing, it's about foreign key constraints. I'm not sure hapi can add foreign key constraints a-posteriori after data exists in the db. we should definitely test (or browse issue to see if someone already encountered this problem)