ODM2 / ODM2DataSharingPortal

A Python-Django web application enabling users to upload, share, and display data from their environmental monitoring sites via the app's ODM2 database. Data can either be automatically streamed from Internet of Things (IoT) devices, manually uploaded via CSV files, or manually entered into forms.
BSD 3-Clause "New" or "Revised" License

Batch upload & other performance enhancements #674

Status: Open. aufdenkampe opened this pull request 1 year ago.

aufdenkampe commented 1 year ago

We'll consider this step 2 of 2(ish) in addressing the larger issue (and opportunity) raised by @tpwrules with:

This PR pulls in the code from https://github.com/tpwrules/ODM2DataSharingPortal/tree/wip/batch, which is successfully running a private instance of Monitor My Watershed for the Univ. of Memphis.

We should probably merge this PR only after merging this smaller PR that contains the first 4 commits:

@ptomasula, @tpwrules, and @SRGDamia1, as a follow-up to our call today, I created this PR against the new tpw_batch feature branch, which I just branched from develop.

This lets @ptomasula resolve the merge conflicts, complete the merge, and then test it using CDK without interfering with our current branches.

If merging and testing on the tpw_batch branch go smoothly, we can then merge it into develop.

If there are challenges with merging or testing, then perhaps we cherry-pick some of the easier commits first.

tpwrules commented 1 year ago

Thanks again for the discussion and allocating time for this.

Please note that several commits in here are for my working process and should be dropped before merging. This branch was not intended to be merged directly. If I can get the site working locally with the Cognito integration, then I can rebase; otherwise, it is probably best to cherry-pick or have @ptomasula implement the changes in spirit rather than merge directly.

ptomasula commented 5 months ago

@aufdenkampe, @ScottEnsign, @SRGDamia1

As @tpwrules noted, the tpwrules:wip/batch branch (currently set for this PR) is a work-in-progress branch and not intended to be merged directly. I followed Thomas's recommendation and cherry-picked select commits (see table below) into a new batch_upload branch. I'll be updating this PR to use this new batch_upload branch and will work on testing and closing things out there.

| Commit Summary | Commit Id | Cherry-pick | Notes |
|---|---|---|---|
| ODM2DSP: add nix specific django settings | 919bb48 | N | These appear to be configuration edits. |
| avoid accessing ORM during import | 267c3a3 | N | Already resolved in PR #637. |
| turn sql model cache creation failure from error to warning | b16224b | N (was Y) | Already resolved in PR #637. |
| commands/update_controlled_vocabularies: fix py3 compatibility | 4b14718 | N (was Y) | Already resolved in PR #637. |
| commands/update_controlled_vocabularies: reject duplicates | 1fe646a | N (was Y) | Already resolved in PR #637. |
| dataloaderservices: batch insert data from uploaded CSVs | 0f80fe5 | Y | |
| remove google analytics | 832cc38 | Y | Should address issue #700. |
| git subrepo pull ODM2DataSharingPortal | ed59e23 | N | Should already be included in the cherry-pick branch, because this is a pull from upstream (this repo). |
| correct schema search path to use public for django stuff by default | c8300c9 | N | setting.nix; not cherry-picked. Can make this change to base.py if needed. |
| set django's timezone to UTC by default-ish so that data reception wo… | cdfeba9 | N (was Y) | Excluding from cherry-pick for now; will reconsider if necessary. |
| use upsert instead of mysterious trigger function to update latest se… | 8f950bc | Y | |
| replace thread pool with bundling everything in a single transaction | aecd2c7 | Y | |
| speed up dataloader table sync by ditching pandas | 6dc8eea | Y | |
| ditch pandas from uuid lookup | 4365f6a | Y | |
| avoid extra step to retrieve site sensor ID | 484ca0f | Y | |
| handle all data queries as batches | 434ab81 | Y | |
| slightly optimize authentication | e90b108 | Y | |
| add batch upload support | a1e3eaf | Y | |
| optimize batch insertion | 87a6c53 | Y | |
| fix file upload | 1002c8d | Y | |
| reduce memory usage and fix issues with empty data during file upload | 5a24db1 | Y | |
| double file upload speed by copying into temporary table | fca8b31 | Y | |

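
Several of the cherry-picked commits name general database techniques: batch insertion, bundling work into a single transaction instead of a thread pool, copying uploaded rows into a temporary table, and using an upsert instead of a trigger to keep a "latest value" record current. The sketch below is not the code from those commits. It is a minimal illustration of the combined pattern, assuming Python's built-in sqlite3 module (SQLite 3.24+ for upsert support) as a stand-in for the portal's PostgreSQL database. The table names (result_values, latest_values, staging), their columns, and the batch_upload function are hypothetical.

```python
import sqlite3

# Hypothetical schema: stand-ins for a result-values table and a
# per-result "latest value" table; not the actual ODM2 schema.
SCHEMA = """
CREATE TABLE result_values (
    result_id INTEGER NOT NULL,
    value_datetime TEXT NOT NULL,   -- ISO-8601 strings in one uniform format
    data_value REAL NOT NULL,
    PRIMARY KEY (result_id, value_datetime)
);
CREATE TABLE latest_values (
    result_id INTEGER PRIMARY KEY,
    value_datetime TEXT NOT NULL,
    data_value REAL NOT NULL
);
"""


def batch_upload(conn, rows):
    """Load (result_id, iso_datetime, value) rows in one transaction.

    Hypothetical helper: rows are staged in a temporary table, copied into
    result_values with a single INSERT ... SELECT, and latest_values is
    updated with an upsert rather than a trigger.
    """
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "CREATE TEMP TABLE IF NOT EXISTS staging "
            "(result_id INTEGER, value_datetime TEXT, data_value REAL)"
        )
        conn.execute("DELETE FROM staging")  # clear rows from a previous batch
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

        # One set-based insert; rows that duplicate an existing
        # (result_id, value_datetime) key are skipped.
        conn.execute(
            "INSERT OR IGNORE INTO result_values "
            "SELECT result_id, value_datetime, data_value FROM staging"
        )

        # Upsert the newest staged value per result. The WHERE on the SELECT
        # also avoids SQLite's INSERT ... SELECT ... ON CONFLICT parse ambiguity.
        # The DO UPDATE guard means older, late-arriving data never replaces
        # a newer stored value.
        conn.execute(
            """
            INSERT INTO latest_values (result_id, value_datetime, data_value)
            SELECT s.result_id, s.value_datetime, s.data_value
            FROM staging AS s
            WHERE s.value_datetime = (
                SELECT MAX(value_datetime) FROM staging
                WHERE result_id = s.result_id
            )
            ON CONFLICT (result_id) DO UPDATE SET
                value_datetime = excluded.value_datetime,
                data_value = excluded.data_value
            WHERE excluded.value_datetime > latest_values.value_datetime
            """
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    batch_upload(conn, [(1, "2023-01-01T00:00:00Z", 1.5),
                        (1, "2023-01-01T00:15:00Z", 2.0)])
    # An overlapping re-upload plus one older, late-arriving value.
    batch_upload(conn, [(1, "2023-01-01T00:15:00Z", 2.0),
                        (1, "2023-01-01T00:10:00Z", 9.9)])
    print(conn.execute("SELECT * FROM latest_values").fetchall())
```

In this sketch, re-uploading an overlapping file leaves existing rows untouched, and the late-arriving 00:10 value is stored without displacing the newer 00:15 value as the latest. On PostgreSQL, the staging step would typically use `COPY` into a temporary table, and the upsert would use `INSERT ... ON CONFLICT ... DO UPDATE`.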
tpwrules commented 5 months ago

Excited to see the progress!

Some notes, though I haven't looked at all this in a long time:

ptomasula commented 5 months ago

Thanks @tpwrules! Those are helpful insights. I have updated the cherry-pick list based on your input.