Open aufdenkampe opened 1 year ago
Thanks again for the discussion and allocating time for this.
Please note that there are several commits in here that are for my working process and should be dropped before merging. This branch was not intended to be merged directly. If I can get the site working locally with the Cognito integration then I can rebase but otherwise it is probably best to cherry-pick or have @ptomasula implement the changes in spirit rather than merge directly.
@aufdenkampe , @ScottEnsign @SRGDamia1
As @tpwrules noted, the tpwrules:wip/batch branch (currently set for this PR) is a work-in-progress branch and is not intended to be merged directly. I followed Thomas's recommendation and cherry-picked select commits (see table below) into a new batch_upload branch. I'll update this PR to use the new batch_upload branch, then do the testing and close-out work there.
Commit Summary | Commit Id | Cherry-pick | Notes |
---|---|---|---|
ODM2DSP: add nix specific django settings | 919bb48 | N | These look to be configuration edits |
avoid accessing ORM during import | 267c3a3 | N | Already resolved in PR #637 |
turn sql model cache creation failure from error to warning | b16224b | N | Already resolved in PR #637 |
commands/update_controlled_vocabularies: fix py3 compatibility | 4b14718 | N | Already resolved in PR #637 |
commands/update_controlled_vocabularies: reject duplicates | 1fe646a | N | Already resolved in PR #637 |
dataloaderservices: batch insert data from uploaded CSVs | 0f80fe5 | Y | |
remove google analytics | 832cc38 | Y | Should address issue #700 |
git subrepo pull ODM2DataSharingPortal | ed59e23 | N | Should already be included in cherrypick branch because this is a pull from upstream (this repo). |
correct schema search path to use public for django stuff by default | c8300c9 | N | Change is to setting.nix, so not cherry-picked. Can make the same change to base.py if needed. |
set django's timezone to UTC by default-ish so that data reception wo… | cdfeba9 | N | Excluded from the cherry-pick for now; will consider further whether it is necessary. |
use upsert instead of mysterious trigger function to update latest se… | 8f950bc | Y | |
replace thread pool with bundling everything in a single transaction | aecd2c7 | Y | |
speed up dataloader table sync by ditching pandas | 6dc8eea | Y | |
ditch pandas from uuid lookup | 4365f6a | Y | |
avoid extra step to retrieve site sensor ID | 484ca0f | Y | |
handle all data queries as batches | 434ab81 | Y | |
slightly optimize authentication | e90b108 | Y | |
add batch upload support | a1e3eaf | Y | |
optimize batch insertion | 87a6c53 | Y | |
fix file upload | 1002c8d | Y | |
reduce memory usage and fix issues with empty data during file upload | 5a24db1 | Y | |
double file upload speed by copying into temporary table | fca8b31 | Y |
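
For anyone reviewing the cherry-picked commits, here is a minimal sketch of the general pattern that several of the commit titles point at: load a batch into a temporary staging table, insert from there, and refresh a "latest value" row with an upsert, all inside one transaction. This is not the code from the branch. The table names (`measurement`, `latest_measurement`, `staging`) and the `batch_upload` helper are hypothetical, and it uses Python's built-in `sqlite3` (SQLite 3.24+ for `ON CONFLICT`) only so that it runs on its own. The selection of the newest row per sensor relies on SQLite's bare-column behaviour with `MAX()`, which would need a different query on another database.

```python
import sqlite3


def make_schema(conn):
    # Hypothetical tables, for illustration only (not the portal's schema).
    conn.executescript("""
        CREATE TABLE measurement (sensor_id INTEGER, ts TEXT, value REAL);
        CREATE TABLE latest_measurement (
            sensor_id INTEGER PRIMARY KEY, ts TEXT, value REAL
        );
    """)


def batch_upload(conn, rows):
    """Store (sensor_id, ts, value) rows and refresh the latest readings.

    Everything runs in a single transaction: it commits if all statements
    succeed and rolls back if any of them raises.
    """
    with conn:
        conn.execute(
            "CREATE TEMP TABLE IF NOT EXISTS staging "
            "(sensor_id INTEGER, ts TEXT, value REAL)"
        )
        conn.execute("DELETE FROM staging")
        # Bulk-load the batch into the staging table first.
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
        # Copy the whole batch into the main table with one statement.
        conn.execute(
            "INSERT INTO measurement (sensor_id, ts, value) "
            "SELECT sensor_id, ts, value FROM staging"
        )
        # Upsert the newest staged row per sensor; keep the existing row
        # if it is already newer. "WHERE 1" avoids SQLite's parsing
        # ambiguity between a SELECT and the ON CONFLICT clause.
        conn.execute("""
            INSERT INTO latest_measurement (sensor_id, ts, value)
            SELECT sensor_id, MAX(ts), value FROM staging
            WHERE 1 GROUP BY sensor_id
            ON CONFLICT (sensor_id) DO UPDATE
            SET ts = excluded.ts, value = excluded.value
            WHERE excluded.ts > latest_measurement.ts
        """)


conn = sqlite3.connect(":memory:")
make_schema(conn)
batch_upload(conn, [
    (1, "2021-01-01T00:00:00", 1.5),
    (1, "2021-01-01T00:05:00", 2.5),
    (2, "2021-01-01T00:00:00", 7.0),
])
# A late-arriving older reading is stored but does not replace the latest.
batch_upload(conn, [(1, "2020-12-31T23:55:00", 0.5)])
```

The single transaction means a failed upload leaves no partial rows behind, and the conditional upsert keeps out-of-order uploads from overwriting a newer "latest" value.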
Excited to see the progress!
Some notes, though I haven't looked at all this in a long time:
Thanks @tpwrules! Those are helpful insights. I have updated the cherry-pick list based on your input.
We'll consider this step 2 of 2(ish) in addressing the larger issue (and opportunity) raised by @tpwrules with #649.
This PR pulls the code from https://github.com/tpwrules/ODM2DataSharingPortal/tree/wip/batch that is successfully running a private instance of Monitor My Watershed for the Univ. of Memphis.
We should probably merge this PR only after merging the smaller PR #637, which contains the first 4 commits.
@ptomasula, @tpwrules, and @SRGDamia1, as a follow-up to our call today, I decided to create this PR to the new `tpw_batch` feature branch that I just created from `develop`. This gives @ptomasula the ability to resolve the merge conflicts, complete the merge, then test it using CDK, without interfering with our current branches.
If merging and testing of `tpw_batch` goes perfectly, we can then merge into `develop`. If there are challenges with merging or testing, then perhaps we cherry-pick some of the easier commits first.