anvilproject / client-apis

Clients for Python, R, javascript that interact with [terra, gen3, galaxy, others]
Apache License 2.0

Dashboard #28

Closed bwalsh closed 4 years ago

bwalsh commented 4 years ago

This PR formalizes the data dashboard.

Usage:


    from anvil.terra.reconciler import Reconciler
    import json

    projects = []

    reconciler = Reconciler('CCDG', 'terra-test-bwalsh', 'anvil-datastorage', 'AnVIL_CCDG.*')
    projects.extend([v for v in reconciler.dashboard_views])

    reconciler = Reconciler('CMG', 'terra-test-bwalsh', 'anvil-datastorage', 'AnVIL_CMG.*')
    projects.extend([v for v in reconciler.dashboard_views])

    reconciler = Reconciler('GTEx (v8)', 'terra-test-bwalsh', 'anvil-datastorage', '^AnVIL_GTEx_V8_hg38$')
    projects.extend([v for v in reconciler.dashboard_views])

    reconciler = Reconciler('ThousandGenomes', 'terra-test-bwalsh', 'anvil-datastorage', '^1000G-high-coverage-2019$')
    projects.extend([v for v in reconciler.dashboard_views])

    with open('/tmp/data_dashboard.json', 'w') as outs:
        json.dump({'projects': projects}, outs)
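
The four near-identical stanzas in the usage above could also be driven from a list of argument tuples. The following is only a sketch of that idea: `collect_dashboard_views`, `make_reconciler` and `StubReconciler` are hypothetical names that are not part of pyAnVIL, and the stub stands in for `Reconciler` so the snippet runs without Terra credentials. The only thing assumed about the real class is what the usage shows: four positional arguments and an iterable `dashboard_views` attribute.

```python
def collect_dashboard_views(specs, make_reconciler):
    """Build one reconciler per argument tuple and gather all dashboard views."""
    projects = []
    for args in specs:
        projects.extend(make_reconciler(*args).dashboard_views)
    return projects


class StubReconciler:
    """Hypothetical stand-in for Reconciler; yields one fake view per instance."""

    def __init__(self, name, user_project, namespace, workspace_regex):
        self.name = name
        self.workspace_regex = workspace_regex

    @property
    def dashboard_views(self):
        return [{"source": self.name, "pattern": self.workspace_regex}]


# Same positional arguments as in the usage above.
SPECS = [
    ('CCDG', 'terra-test-bwalsh', 'anvil-datastorage', 'AnVIL_CCDG.*'),
    ('CMG', 'terra-test-bwalsh', 'anvil-datastorage', 'AnVIL_CMG.*'),
    ('GTEx (v8)', 'terra-test-bwalsh', 'anvil-datastorage', '^AnVIL_GTEx_V8_hg38$'),
    ('ThousandGenomes', 'terra-test-bwalsh', 'anvil-datastorage', '^1000G-high-coverage-2019$'),
]

views = collect_dashboard_views(SPECS, StubReconciler)
print(len(views))
```

With pyAnVIL installed and authenticated, passing `Reconciler` in place of `StubReconciler` should produce the same `projects` list as the original usage.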

Tracks ETL/normalization issues in a new `problems` node for each workspace:

    {
      "file_histogram": [
        {
          "count": 34764,
          "size": 100792349713038,
          "date": "2019-07-16"
        }
      ],
      "files": [
        {
          "count": 17382,
          "size": 100725093119470,
          "type": "Bam"
        },
        {
          "count": 17382,
          "size": 67256593568,
          "type": "Bai"
        }
      ],
      "nodes": [
        {
          "type": "Project",
          "count": 1
        },
        {
          "type": "Subject",
          "count": 979
        },
        {
          "type": "Samples",
          "count": 17382
        }
      ],
      "size": 100792349713038,
      "project_id": "AnVIL_GTEx_V8_hg38",
      "public": false,
      "createdDate": "2019-05-23T17:17:00.849Z",
      "lastModified": "2020-08-20T14:44:18.470Z",
      "data_type": null,
      "data_category": null,
      "problems": {
        "inconsistent_entityName": false,
        "inconsistent_subject": false,
        "missing_blobs": false,
        "missing_samples": true,
        "missing_project_files": false,
        "missing_subjects": false,
        "missing_schema": false,
        "missing_sequence": false
      },
      "source": "GTEx (v8)"
    },
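
A consumer of the dashboard file might use the `problems` node to list which checks failed for a workspace, and cross-check that the per-type file sizes add up to the workspace total. The sketch below assumes only the field names visible in the snapshot above. The helper names `flagged_problems` and `sizes_consistent` are illustrative, not part of pyAnVIL, and the sample is a trimmed copy of the GTEx (v8) entry.

```python
import json


def flagged_problems(view):
    """Return the sorted names of problem flags that are true for one workspace view."""
    return sorted(name for name, flagged in view.get("problems", {}).items() if flagged)


def sizes_consistent(view):
    """Check that the per-type file sizes sum to the workspace-level size."""
    return sum(f["size"] for f in view.get("files", [])) == view.get("size")


# Trimmed copy of the GTEx (v8) view from the snapshot above.
sample = json.loads("""
{
  "files": [
    {"count": 17382, "size": 100725093119470, "type": "Bam"},
    {"count": 17382, "size": 67256593568, "type": "Bai"}
  ],
  "size": 100792349713038,
  "project_id": "AnVIL_GTEx_V8_hg38",
  "problems": {
    "inconsistent_entityName": false,
    "inconsistent_subject": false,
    "missing_blobs": false,
    "missing_samples": true,
    "missing_project_files": false,
    "missing_subjects": false,
    "missing_schema": false,
    "missing_sequence": false
  },
  "source": "GTEx (v8)"
}
""")

dashboard = {"projects": [sample]}
for view in dashboard["projects"]:
    print(view["project_id"], flagged_problems(view), sizes_consistent(view))
# prints: AnVIL_GTEx_V8_hg38 ['missing_samples'] True
```

To run this against real output, replace `dashboard` with the result of `json.load` on `/tmp/data_dashboard.json`.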

TODO:

- [x] dbGAP
- [x] notebook
- [x] integration tests with service account
- [ ] eMERGE (pending workspace creation)

bwalsh commented 4 years ago

@NoopDog - still a work in progress, but hoping you might take a look at this output snapshot.

data_dashboard.json.txt

bwalsh commented 4 years ago

Read the docs

RTD updated https://pyanvil.readthedocs.io/en/latest/

bwalsh commented 4 years ago

Dashboard and associated bucket

https://anvil.terra.bio/#workspaces/terra-test-bwalsh/pyAnVIL%20Notebook/notebooks

https://console.cloud.google.com/storage/browser/fc-secure-d8ae6fb6-76be-43a4-87a5-2ab255fc8d7d

You should be able to browse / run the 0.0.2 notebook.

bwalsh commented 4 years ago

9/14 reviewed w/ arula@broadinstitute.org ; candace@broadinstitute.org ; Dave@clevercanary.com