OHDSI / Andromeda

AsynchroNous Disk-based Representation of MassivE DAta: An R package aimed at replacing ff for storing large data objects.
https://ohdsi.github.io/Andromeda/

Andromeda v1.0 candidate #29

Closed: ablack3 closed this 2 years ago

ablack3 commented 2 years ago

This possible Andromeda v1.0 candidate branch implements two significant changes.

First, it swaps out SQLite for the duckdb database (https://duckdb.org/). duckdb supports date types. Dates have been a persistent issue with Andromeda, one that seems difficult to fix because SQLite does not have a date type. Additionally, duckdb is a lightweight database like SQLite, but it is designed for analytics and integrates with parquet and arrow.
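To illustrate the date issue, here is a minimal sketch (not code from this branch) that writes the same data frame to an in-memory SQLite database and an in-memory duckdb database through DBI, then checks the class of the date column when it is read back. It assumes the DBI, RSQLite, and duckdb packages are installed and that RSQLite is used with its default settings; the table and column names are made up for the example.

```r
library(DBI)

# Example data with a Date column (names are illustrative only)
df <- data.frame(
  id = 1:2,
  start_date = as.Date(c("2020-01-01", "2021-06-15"))
)

# SQLite has no date type: with RSQLite defaults the Date column is
# stored as a number and comes back without the Date class
sqliteCon <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(sqliteCon, "obs", df)
sqliteResult <- dbReadTable(sqliteCon, "obs")
dbDisconnect(sqliteCon)

# duckdb has a native DATE type, so the column round-trips as Date
duckCon <- dbConnect(duckdb::duckdb())
dbWriteTable(duckCon, "obs", df)
duckResult <- dbReadTable(duckCon, "obs")
dbDisconnect(duckCon, shutdown = TRUE)

class(sqliteResult$start_date)
class(duckResult$start_date)
```

With SQLite the caller has to remember which columns were dates and convert them back; with duckdb the type survives the round trip.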

Second, this candidate release changes the Andromeda object. Currently Andromeda is an S4 object that extends SQLiteConnection. This candidate would make Andromeda extend dm instead. dm is an R package that is very similar to Andromeda in purpose and API. The major difference is that dm objects can hold references to any database tables, not just tables in a local SQLite database. In my opinion, Andromeda fits nicely as a special case of dm.

Breaking changes

I think it is possible to make these changes without breaking the Andromeda API. All current functions would still work as expected. However, the process for extending Andromeda would need to change, since the new Andromeda would use S3 instead of S4.
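As a rough sketch of what extending an S3-based Andromeda could look like, the example below uses base R only. Everything in it is hypothetical: `newAndromeda()`, `newCovariateData()`, and the `tables` field are stand-ins invented for illustration, and the real object in this branch would wrap a dm rather than a plain list. The point is only that an S3 subclass is created by prepending a class name, where S4 would use `setClass(..., contains = ...)`.

```r
# Hypothetical stand-in for the proposed S3 Andromeda object.
# In the actual proposal the object would also inherit from dm.
newAndromeda <- function(tables = list()) {
  structure(list(tables = tables), class = "Andromeda")
}

# Extending in S3: build the parent object, then prepend the subclass name
newCovariateData <- function(tables = list()) {
  obj <- newAndromeda(tables)
  class(obj) <- c("CovariateData", class(obj))
  obj
}

# Method for the parent class
print.Andromeda <- function(x, ...) {
  cat("# Andromeda object with", length(x$tables), "table(s)\n")
  invisible(x)
}

# Subclass method adds its own line, then defers to the parent method
print.CovariateData <- function(x, ...) {
  cat("# CovariateData object\n")
  NextMethod()
}

covariateData <- newCovariateData(
  list(covariates = data.frame(rowId = 1:3))
)
print(covariateData)
```

Packages that currently extend Andromeda through S4 inheritance would need a change along these lines, while functions that only call the public Andromeda API would be unaffected.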