WNPRC-EHR-Services / wnprc-modules

Custom LabKey modules developed by WNPRC
Apache License 2.0
1 stars 2 forks source link

ETL Questions #196

Open bbimber opened 3 years ago

bbimber commented 3 years ago

@dnicolalde Some comments:

bbimber commented 3 years ago

Also, you can omit investigatorName from assignments. As a general rule I'd prefer not to host investigator names.

bbimber commented 3 years ago

For deletes:

1) Make a hard table on resehr (you largely have this) 2) Make a query to pull from the EHR audit log and populate into this table. filter on comment = 'A record was deleted', etc. 3) this should store the schema, query, source modified, source created, and objectId. 4) For the objectId: you might want to make a ColumnTransform that parses the objectId from the LSID (key) column

This ETL should be the first step in the resehr ETL. Each subsequent step in the ETL (per dataset), should use that local table as the delete source. I think this should work robustly.

bbimber commented 3 years ago

One other comment: you have at time mentioned universal IDs and/or other identifiers. What kinds of alternate IDs exist in your DB, and can these be in the ETL?

dnicolalde commented 3 years ago

@bbimber I added the following filter for the assignments:

"id/demographics/species/scientific_name" value="Callithrix jacchus"

using the scientific name, so that it will be more generic between Wisconsin and Southwest if they decide to use the same code or should I just keep the common name.

For birth and deaths, would it make sense to add this data points to the demographics table and maintain one single table with all the information you would need. The demographic table we would share with you will only have data that does not change over time generally. For arrivals and departure we would share the those tables filter for marmoset.

I'll create a separate ETL for the supplemental pedigree and can filter that table only to share marmoset data with you. This will be a separate dataset you will have access to.

We answer your last question during the call.

dnicolalde commented 3 years ago

I remove the investigator from the ETL. I just testing to make sure it is working. Also I am going to setup the schedule to ran at 12:30 AM, so it does not interfere with other jobs that ran at night.

bbimber commented 3 years ago

OK, thanks. How are things looking for the other datasets, and tracking source deletes from the audit trail? if there are ways i can help on this let me know.