RMM is an R package designed to handle common OHDSI results data management functions by providing a common API for data model migrations and definitions.
Currently, this package only directly supports uploads of files from a directory structure.
However, this is limiting for many projects because it may be significantly faster to asynchronously produce results and export them to simple object stores such as Amazon S3.
Furthermore, many of the tasks that execute are likely database-intensive rather than CPU-intensive. Requiring EC2 nodes or other services that write to disk is likely an expensive solution when results can easily be unloaded from databases into object stores asynchronously.
Proposals:
- Define interfaces for importing files from object stores such as Amazon S3 buckets and Google Cloud Storage.
- Support a load-table solution in which results are imported into load (staging) tables in the database in a thread-safe manner:
  - Upload CSV objects, then copy them into the main table one at a time so that race conditions do not lock up tables.
- Support creating manifests that can be transferred. For example, an analytics package generates results and writes a JSON file listing the bucket/object store, the file references and the result model specification.
- Support a simple table back end (in lieu of a message queue/broker) that stores and logs the state of each results insert.
- Make a simple Plumber API that lets you initiate an upload from a given manifest (with hashed entries to prevent multiple requests for identical uploads).
- Cleanup/garbage collection step: delete objects from object stores once their inserts succeed.
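The first proposal calls for a storage interface that hides where result files live. The sketch below is a minimal version of that idea. It is written in Python only for illustration, because RMM itself is an R package. `ResultSource`, `DirectorySource`, `InMemorySource` and `read_all` are hypothetical names. The in-memory class stands in for a bucket so the example runs offline.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class ResultSource(ABC):
    """Hypothetical interface: anything that can list and read result files."""

    @abstractmethod
    def list_files(self, prefix: str = "") -> List[str]:
        """Return the keys of all result files whose key starts with `prefix`."""

    @abstractmethod
    def read_bytes(self, key: str) -> bytes:
        """Return the raw contents of one result file."""


class DirectorySource(ResultSource):
    """Current behaviour: results live in a local directory tree."""

    def __init__(self, root: str):
        self.root = Path(root)

    def list_files(self, prefix: str = "") -> List[str]:
        # Keys are paths relative to the root, using "/" like object-store keys.
        keys = (p.relative_to(self.root).as_posix() for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))

    def read_bytes(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemorySource(ResultSource):
    """Stand-in for a bucket (S3, Google Cloud Storage) so the sketch runs offline."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = dict(objects)

    def list_files(self, prefix: str = "") -> List[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))

    def read_bytes(self, key: str) -> bytes:
        return self.objects[key]


def read_all(source: ResultSource, prefix: str = "") -> Dict[str, bytes]:
    """Upload code depends only on the interface, not on where files live."""
    return {key: source.read_bytes(key) for key in source.list_files(prefix)}
```

An S3-backed implementation could wrap boto3's `list_objects_v2` (paginated) and `get_object` behind the same two methods. That class is not shown here.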
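The load-table proposal can be sketched with SQLite as a stand-in database. The sketch makes several assumptions:

- Each CSV object first goes into its own staging table (named here `<table>_load_<load_id>`), so the slow bulk insert never touches the shared table.
- Rows are then copied with a single `INSERT ... SELECT` in one short transaction.
- A process-local lock makes those copies happen one at a time.

`load_via_staging` and the naming scheme are hypothetical, not part of RMM.

```python
import csv
import io
import sqlite3
import threading

# Serialises the copy step within one process; across processes the
# database's own transaction locking plays this role.
_copy_lock = threading.Lock()


def load_via_staging(conn: sqlite3.Connection, table: str, csv_text: str, load_id: str) -> int:
    """Load one CSV object into a private staging table, then copy it into
    `table` in one short transaction. Returns the number of rows copied.

    `table` and `load_id` are interpolated into SQL, so they must be trusted
    identifiers (e.g. taken from the result model spec), never user input.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0
    columns = list(rows[0].keys())
    col_sql = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    staging = f"{table}_load_{load_id}"  # one staging table per upload

    # Step 1: the slow bulk insert only touches the private staging table.
    conn.execute(f"CREATE TABLE {staging} ({col_sql})")
    try:
        conn.executemany(
            f"INSERT INTO {staging} ({col_sql}) VALUES ({placeholders})",
            [tuple(row[c] for c in columns) for row in rows],
        )
        conn.commit()
        # Step 2: copy into the shared table, one upload at a time; `with conn`
        # commits on success and rolls back if the INSERT fails.
        with _copy_lock, conn:
            conn.execute(f"INSERT INTO {table} ({col_sql}) SELECT {col_sql} FROM {staging}")
    finally:
        conn.execute(f"DROP TABLE {staging}")
        conn.commit()
    return len(rows)
```

The shared results table is only locked for the duration of the final copy. If that copy fails, it rolls back as a unit and the staging table is dropped either way.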
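The manifest and the "hashed entries" in the Plumber proposal fit together. If the manifest is hashed over a canonical JSON encoding, identical upload requests get identical identifiers and can be rejected as duplicates. The sketch below assumes a manifest layout (`store`, `files`, `resultModelSpec`) invented for illustration. `UploadRequests` is a hypothetical in-memory stand-in for the check an upload endpoint would perform before queuing work. The actual endpoint would be written in R with Plumber and is not shown.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def build_manifest(bucket: str, keys: List[str], result_model_spec: str) -> dict:
    """Hypothetical manifest layout; the field names are illustrative only."""
    return {
        "store": {"type": "s3", "bucket": bucket},
        "files": sorted(keys),  # sorted so file order does not change the hash
        "resultModelSpec": result_model_spec,
    }


def manifest_hash(manifest: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no whitespace), so
    identical manifests hash identically regardless of key order."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UploadRequests:
    """In-memory stand-in for the request table an upload endpoint would check."""

    def __init__(self) -> None:
        self._seen: Dict[str, dict] = {}

    def submit(self, manifest: dict) -> Tuple[str, bool]:
        """Return (hash, accepted); an identical manifest is not queued twice."""
        key = manifest_hash(manifest)
        if key in self._seen:
            return key, False
        self._seen[key] = manifest
        return key, True
```

Canonicalisation matters here: without sorted keys and fixed separators, two semantically identical manifests could serialise differently and slip past the duplicate check. List order is not canonicalised by `manifest_hash` itself, which is why `build_manifest` sorts the file keys.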
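A table back end in lieu of a message broker could be as small as one append-only event table. The current state of each file is the latest event recorded for it, and the full history doubles as the log. The sketch uses SQLite; the table name `upload_event`, its columns and the status values are assumptions for illustration.

```python
import sqlite3
from typing import Dict, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS upload_event (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    manifest_hash TEXT NOT NULL,
    file_key      TEXT NOT NULL,
    status        TEXT NOT NULL CHECK (status IN ('pending', 'running', 'done', 'failed')),
    message       TEXT,
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def init_log(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def log_event(conn: sqlite3.Connection, manifest_hash: str, file_key: str,
              status: str, message: Optional[str] = None) -> None:
    """Append a state change; rows are never updated, so the table is also an audit log."""
    with conn:  # commit on success, roll back (and re-raise) on e.g. a CHECK violation
        conn.execute(
            "INSERT INTO upload_event (manifest_hash, file_key, status, message) VALUES (?, ?, ?, ?)",
            (manifest_hash, file_key, status, message),
        )


def current_status(conn: sqlite3.Connection, manifest_hash: str) -> Dict[str, str]:
    """Latest status per file: the row with the highest id for each file_key."""
    rows = conn.execute(
        """
        SELECT e.file_key, e.status
        FROM upload_event AS e
        WHERE e.manifest_hash = ?
          AND e.id = (SELECT MAX(id) FROM upload_event
                      WHERE manifest_hash = e.manifest_hash AND file_key = e.file_key)
        """,
        (manifest_hash,),
    ).fetchall()
    return dict(rows)
```

Appending rather than updating keeps every transition (including failure messages) available for debugging, at the cost of a slightly more involved "latest state" query.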
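The cleanup step can be driven by the recorded state: delete only objects whose insert is marked as done, and keep pending, running and failed files so a failed load can be retried. In the sketch, `collect_garbage` and `InMemoryStore` are hypothetical. The `statuses` argument is assumed to be a mapping from object key to status, such as a status table could provide.

```python
from typing import Dict, List, Protocol


class DeletableStore(Protocol):
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Stand-in for an object store bucket so the sketch runs offline."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = dict(objects)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op, so re-running cleanup is harmless.
        self.objects.pop(key, None)


def collect_garbage(store: DeletableStore, statuses: Dict[str, str]) -> List[str]:
    """Delete objects whose insert succeeded; return the deleted keys in sorted order."""
    deleted = []
    for key, status in sorted(statuses.items()):
        if status == "done":
            store.delete(key)
            deleted.append(key)
    return deleted
```

With S3, `delete` could wrap boto3's `delete_object(Bucket=..., Key=...)`. Deleting per file rather than per manifest is a design choice: it frees storage early, but it means a partially failed manifest can only be retried for the files that remain.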
Potential Issues: