Closed Ben-Goethuys closed 2 months ago
Ben, all the tables that have event columns, are first stored in the WORK zone. After all the other tables are done, then the second stage of the ETL, will fill in the event colums (based on the _swap tables in the work zone), while copying the date to the OMOP zone. This will always be a full table scan. So adding indexes won't improve performance.
A performance improvement can be, to split up the records that have a event record filled in. So the event column is not filled in, send that record directly to the OMOP zone, other wise send it to the WORK zone. This will minimize the copy around of data, but will break the atomic nature of the ETL. So there are trade offs.
The "work"-zone tables corresponding to the main omop tables like for example [work].[vocabulary] don't have indices. Other "work"-zone tables do have indices.
Could indices on these tables help with the SQL performance?
Side note: Azure SQL could benefit from the compression of columnstore indices in batch loads