OHDSI / CohortDiagnostics

An R package for performing various cohort diagnostics.
https://ohdsi.github.io/CohortDiagnostics

Investigate using apache arrow in place of sqlite #984

Closed azimov closed 1 year ago

azimov commented 1 year ago

Using SQLite has improved our Shiny apps by allowing a standard set of queries across both Postgres and local backends. However, a recurring problem is the relatively large size of the SQLite data sets, caused by the indexes that need to be built as table references.

Andromeda is converting from SQLite to Arrow as a new backend, which has resulted in significant performance increases. In addition, the base storage can simply be a CSV model that can be queried through an SQL interface.

Investigating Arrow should be based around the following principles:

azimov commented 1 year ago

After evaluation, there are limitations with Arrow when used as an SQL backend that won't let us easily update the schema. Pursuing an alternative solution: changing database ids to integers.
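
As a rough illustration of the integer-id idea, the sketch below maps text database ids to integer keys through a lookup table in SQLite. It is not the package's implementation: it uses Python's standard `sqlite3` module rather than R, and the table and column names (`cohort_count`, `database_lookup`, `database_key`, `cohort_count_int`) and the sample rows are assumptions made up for the example.

```python
import sqlite3


def remap_database_ids(conn):
    """Replace text database ids with integer keys (hypothetical schema)."""
    cur = conn.cursor()
    # Lookup table: one row per distinct text id; SQLite assigns the integer key.
    cur.execute(
        "CREATE TABLE database_lookup ("
        "database_key INTEGER PRIMARY KEY, "
        "database_id TEXT UNIQUE NOT NULL)"
    )
    cur.execute(
        "INSERT INTO database_lookup (database_id) "
        "SELECT DISTINCT database_id FROM cohort_count ORDER BY database_id"
    )
    # Rebuild the result table with the integer key in place of the text id.
    cur.execute(
        "CREATE TABLE cohort_count_int AS "
        "SELECT l.database_key, c.cohort_id, c.cohort_subjects "
        "FROM cohort_count c JOIN database_lookup l USING (database_id)"
    )
    # The index now covers integers only, not repeated text values.
    cur.execute(
        "CREATE INDEX idx_cohort_count_int "
        "ON cohort_count_int (database_key, cohort_id)"
    )
    conn.commit()


# Illustrative data only; these ids and counts are invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_count "
    "(database_id TEXT, cohort_id INTEGER, cohort_subjects INTEGER)"
)
conn.executemany(
    "INSERT INTO cohort_count VALUES (?, ?, ?)",
    [("example_db_a", 1, 100), ("example_db_a", 2, 50), ("example_db_b", 1, 75)],
)
remap_database_ids(conn)

rows = conn.execute(
    "SELECT database_key, cohort_id, cohort_subjects "
    "FROM cohort_count_int ORDER BY database_key, cohort_id"
).fetchall()
lookup = conn.execute(
    "SELECT database_key, database_id FROM database_lookup ORDER BY database_key"
).fetchall()
```

The text ids survive only in the small lookup table, so result tables and their indexes store integers instead. Whether this reduces file size enough in practice would need to be measured on real result sets.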