RENCI / ctmd

MIT License

Make incremental update so that processing time is reduced #226

Open xu-hao opened 4 years ago

xu-hao commented 4 years ago

Does the API have the capability to do a time-based query? If yes, we can just fetch what changed since the last time we queried the database and do an incremental update from there. If not, we would need to compute a diff ourselves, but that would still be faster than what we’re doing now. With a JSON diff, if a field changes, we’d need to parse out the proposal ID of the affected record.
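For the JSON diff fallback, here is a minimal sketch of how the changed proposal IDs could be pulled out of two exports. It assumes each export is a list of flat dicts with exactly one record per proposal; the field name `proposal_id` and the function name `changed_ids` are made up for illustration and are not taken from the ctmd code.

```python
def index_by_key(records, key="proposal_id"):
    """Map each record's key value to the record itself.

    Assumes one record per key; `proposal_id` is a hypothetical field name.
    """
    return {rec[key]: rec for rec in records}


def changed_ids(old_records, new_records, key="proposal_id"):
    """Return (added, removed, modified) lists of key values, each sorted."""
    old = index_by_key(old_records, key)
    new = index_by_key(new_records, key)
    # IDs present only in the new export.
    added = sorted(new.keys() - old.keys())
    # IDs present only in the previous export.
    removed = sorted(old.keys() - new.keys())
    # IDs present in both exports whose field values differ.
    modified = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, removed, modified
```

Only the records behind the returned IDs would then need reprocessing. Exports with several rows per proposal (for example repeating instruments) would need a composite key instead.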

A faster VM with more cores and more memory would also make it faster. Deliverable: after we move to cloudapps, benchmark the process that updates the database from REDCap and make it faster by using more compute.

xu-hao commented 3 years ago

Currently, every time the sync runs, the pipeline deletes the tables mapped from REDCap data and recreates them with new data. A more efficient approach is to calculate the diff between the new export and the current export and only insert, delete, or update the rows that changed in the new export. This requires the pipeline to compute a diff and algorithmically generate the corresponding insert, delete, and update statements.
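A rough sketch of what generating those statements could look like, assuming each mapped table has a single primary-key column and that both exports are already available as lists of dicts with the same columns. The helper names `diff_to_statements` and `apply_statements` are hypothetical, and `sqlite3` with `?` placeholders is used only to keep the example runnable; the real pipeline's database driver may use a different placeholder style.

```python
import sqlite3


def diff_to_statements(table, key, old_rows, new_rows):
    """Return (sql, params) pairs that turn old_rows into new_rows.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted pipeline config, never from user input.
    """
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    stmts = []
    # Rows that disappeared from the new export.
    for k in sorted(old.keys() - new.keys()):
        stmts.append((f"DELETE FROM {table} WHERE {key} = ?", (k,)))
    # Rows that only exist in the new export.
    for k in sorted(new.keys() - old.keys()):
        cols = list(new[k])
        marks = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})"
        stmts.append((sql, tuple(new[k][c] for c in cols)))
    # Rows present in both: update only the columns that changed.
    for k in sorted(new.keys() & old.keys()):
        changed = [c for c in new[k] if c != key and new[k][c] != old[k].get(c)]
        if changed:
            sets = ", ".join(f"{c} = ?" for c in changed)
            sql = f"UPDATE {table} SET {sets} WHERE {key} = ?"
            stmts.append((sql, tuple(new[k][c] for c in changed) + (k,)))
    return stmts


def apply_statements(conn, stmts):
    """Run all statements in one transaction (rolled back on error)."""
    with conn:
        for sql, params in stmts:
            conn.execute(sql, params)
```

Running everything in one transaction means a failed sync leaves the previous data intact, which the current delete-and-recreate approach would also need to guarantee.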