joeflack4 opened 6 years ago
@richardnguyen5 We may do some pair programming on this together!
@bciar Don't worry about this until getting back from your vacation. But I have some ideas for how you can contribute here.
Edit: Removed "manual testing" feature from this issue and put it into its own GitHub issue #48.
@richardnguyen5 Updated this per our discussion.
Split some items into new issues: #48, #52, #53, #54, #55
## Description
The "admin portal" is a location where PMA2020 data managers can upload new datasets, and manage old ones. The datasets they upload will be of the kind of
api-data*.xlsx
file that are currently stored in thedata
directory.Tasks
### 1. Create route `/admin`

#### 1a. Create routes
The route `/v1/admin` should re-route to `/admin`.
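A minimal sketch of this redirect, assuming a Flask app; the view function names here are illustrative, not final:

```python
from flask import Flask, redirect, url_for

app = Flask(__name__)

@app.route('/admin')
def admin():
    # Placeholder for the admin portal page.
    return 'Admin portal'

@app.route('/v1/admin')
def v1_admin():
    # Re-route /v1/admin to /admin.
    return redirect(url_for('admin'))
```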
#### 1b. Create deployment-specific logic
We are considering adding the following logic: if the environment is production, re-route to http://api-staging.pma2020.org/admin.

However, upon further consideration, I think we will make it explicit within the UI that certain actions will affect staging and others will affect production. Given that distinction, it may not be necessary to re-route, as the admin portal feature will know the difference between the staging and production deployments.

The URLs that point to the API endpoints, as well as database information such as the DB URL, DB name, and username/password, should also be set via environment variables.
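A sketch of how these might be read; the environment variable names here are assumptions, not final:

```python
import os

# Deployment-specific settings, read from the environment.
API_URL = os.getenv('PMA_API_URL', 'http://api-staging.pma2020.org')
DB_URL = os.getenv('PMA_DB_URL')    # e.g. postgres://user:pass@host/dbname
DB_NAME = os.getenv('PMA_DB_NAME')
DB_USER = os.getenv('PMA_DB_USER')
DB_PASS = os.getenv('PMA_DB_PASS')
ENV = os.getenv('FLASK_ENV', 'development')  # 'production' on the live deployment
```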
### 2. Single-page UI scaffolding for all features #ui

This can be done before, after, or simultaneously with implementing the underlying logic. Not sure yet who will do this (one or more of Joe / Richard / Bciar).
### 3. Create S3 bucket to store datasets

Create a new bucket to store all datasets. It may not be necessary to create a separate "staging" bucket; if we do, I would consider it more of a backup location than anything else.
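A sketch of the bucket setup using boto3; the bucket name and file name are hypothetical:

```python
import boto3

s3 = boto3.client('s3')

# Create the bucket (in regions other than us-east-1, a
# CreateBucketConfiguration with a LocationConstraint is also required).
s3.create_bucket(Bucket='pma-api-datasets')

# Upload a dataset file from the local data directory.
s3.upload_file('data/api-data-2018.01.01.xlsx', 'pma-api-datasets',
               'api-data-2018.01.01.xlsx')
```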
### 4. Create new `datasets` DB table

#### Fields to create
- `url`: string / varchar ~256
- `datasetDisplayName`: string / varchar ~256
- `uploadDate`: dateTime?, string / varchar ~256?
- `versionNumber`: string / varchar ~256, semver? int?
- `datasetType`: string / varchar; factor var of ('data', 'metadata', 'dataAndMetadata')
- `datasetSubType`: string / varchar ~256?; domain of ('CCRR', 'METADATA_CLASS', 'all'/'full')
- `isActiveStaging`: boolean
- `isActiveProduction`: boolean

#### More about `datasetSubType`
This is where CCRR is for country/round, e.g. "GHR6", and METADATA_CLASS is for a single class of structural metadata, e.g. 'indicators', 'countries', etc. The type "all" is for instances where all data for all country/rounds is being uploaded, all metadata classes are being uploaded, or all of the data and metadata is being uploaded at once.
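A sketch of the proposed table as a Flask-SQLAlchemy model; column names and types follow the list above but nothing here is final:

```python
from datetime import datetime
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Dataset(db.Model):
    __tablename__ = 'datasets'
    id = db.Column(db.Integer, primary_key=True)
    url = db.Column(db.String(256), nullable=False)
    dataset_display_name = db.Column(db.String(256), nullable=False)
    upload_date = db.Column(db.DateTime, default=datetime.utcnow)
    version_number = db.Column(db.Integer, nullable=False)
    # 'data' | 'metadata' | 'dataAndMetadata'
    dataset_type = db.Column(db.String(32), nullable=False)
    # 'CCRR' | 'METADATA_CLASS' | 'all'
    dataset_sub_type = db.Column(db.String(256))
    is_active_staging = db.Column(db.Boolean, default=False)
    is_active_production = db.Column(db.Boolean, default=False)
```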
#### About non-idempotent functions on DB fields #db

Updates and deletes will not be implemented for these fields; just creates and reads. The exception is the "isActive" fields.
- `url`: Create - on upload; gets from location in S3 bucket.
- `datasetDisplayName`: Create - on upload; gets from `datasetDisplayName` of the `settings` worksheet in the dataset excel file, else uses the file name minus the extension.
- `uploadDate`: Create - on upload; probably should get from Python `datetime.datetime.now()` (easier), or the S3 API.
- `versionNumber`: Create - on upload. Checks the database for the current version of the given `datasetSubType` and increments by 1.
- `datasetType`: Create - on upload; pma-api determines this by identifying the worksheets in the uploaded excel file and making a classification based on what it finds. If there are only `data_*` worksheets, the type is "data". If there are only known metadata-named worksheets, the type is "metadata". If both, the type is "dataAndMetadata". The worksheet names "changelog", "info", and "settings" are ignored during this classification (see the sketch after this list). In the MVP version, only `dataAndMetadata` will be valid.
- `datasetSubType`: Should be implemented in a similar fashion as `datasetType`.
- `isActiveStaging`: Create - false by default. Update - activated when the "apply to staging" button is selected.
- `isActiveProduction`: Create - false by default. Update - activated when the "apply to production" button is selected.

### 5. Implement logic for upload dataset feature
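A sketch of the `datasetType` classification described above, assuming openpyxl for reading worksheet names; the set of known metadata worksheet names is a placeholder:

```python
import openpyxl

METADATA_SHEETS = {'indicators', 'countries'}  # assumed; the real set is TBD
IGNORED_SHEETS = {'changelog', 'info', 'settings'}

def classify_dataset_type(path):
    """Classify an uploaded file as 'data', 'metadata', or 'dataAndMetadata'."""
    wb = openpyxl.load_workbook(path, read_only=True)
    names = [n for n in wb.sheetnames if n not in IGNORED_SHEETS]
    has_data = any(n.startswith('data_') for n in names)
    has_metadata = any(n in METADATA_SHEETS for n in names)
    if has_data and has_metadata:
        return 'dataAndMetadata'
    if has_data:
        return 'data'
    if has_metadata:
        return 'metadata'
    raise ValueError('No recognized worksheets found in {}'.format(path))
```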
`manage.py initdb` script: it should not drop all tables when `--overwrite` is passed, or ever. Instead it should drop only named tables. For now, those named tables will be everything except for the new `datasets` table.

### 5.5. Change initdb script to drop everything but the datasets table
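A sketch of the selective drop, reusing the hypothetical `db` object from the model sketch above:

```python
# Drop every table except `datasets`, instead of db.drop_all().
tables_to_drop = [t for t in db.metadata.sorted_tables if t.name != 'datasets']
db.metadata.drop_all(bind=db.engine, tables=tables_to_drop)
```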
### 6. Implement logic for dataset list feature
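A sketch of a list endpoint, reusing the hypothetical `app` and `Dataset` model from the sketches above; the route and response fields are illustrative:

```python
from flask import jsonify

@app.route('/admin/datasets')
def list_datasets():
    # Return all uploaded datasets with their status fields.
    datasets = Dataset.query.all()
    return jsonify([{
        'name': d.dataset_display_name,
        'version': d.version_number,
        'uploadDate': d.upload_date.isoformat(),
        'activeStaging': d.is_active_staging,
        'activeProduction': d.is_active_production,
    } for d in datasets])
```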
### 7. Implement logic for apply dataset to staging feature
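A sketch of this update, reusing the hypothetical model above. It assumes only one dataset can be active on staging at a time, which the issue does not state explicitly:

```python
@app.route('/admin/datasets/<int:dataset_id>/apply-staging', methods=['POST'])
def apply_to_staging(dataset_id):
    # Deactivate whatever is currently active on staging.
    Dataset.query.filter_by(is_active_staging=True) \
        .update({'is_active_staging': False})
    # Activate the selected dataset.
    dataset = Dataset.query.get_or_404(dataset_id)
    dataset.is_active_staging = True
    db.session.commit()
    return jsonify({'activated': dataset.dataset_display_name})
```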
### 8. Implement logic for apply dataset to production feature
## Task List
- [ ] `/admin`
- [ ] `datasets` DB table #db