StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

Data Analytics as a Service

Data Analytics as a Service for the Government of Canada and external collaborators.

Frequently Asked Questions (FAQ)

If your question does not appear in this document, please reach out to us on our Slack Support Channel.

Who can access the AAW?

What data formats are supported in the AAW?

The AAW includes tools that allow data science users to open almost any file. The AAW supports many commonly used file formats, including (but not limited to):

How much does the AAW cost?

CPU Only

| Use Case | CPU | RAM (GB) | GPU | Time (Hours/Week) | Weekly Cost | Monthly Cost | Annual Cost |
|---|---|---|---|---|---|---|---|
| CPU: Occasional Use | 2 | 8 | 0 | 8 | 1.1367 | 4.88781 | 59.1084 |
| CPU: During Business Hours | 2 | 8 | 0 | 40 | 5.6835 | 24.43905 | 295.542 |
| CPU: 24/7 | 2 | 8 | 0 | 168 | 23.8707 | 102.64401 | 1241.2764 |

Add a GPU

| Use Case | CPU | RAM (GB) | GPU | Time (Hours/Week) | Weekly Cost | Monthly Cost | Annual Cost |
|---|---|---|---|---|---|---|---|
| GPU: Occasional Use | 0 | 0 | 1 | 8 | 34.468 | 148.2124 | 1792.336 |
| GPU: During Business Hours | 0 | 0 | 1 | 40 | 172.34 | 741.062 | 8961.68 |
| GPU: 24/7 | 0 | 0 | 1 | 168 | 723.828 | 3112.4604 | 37639.056 |
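
The figures in the two cost tables are consistent with a flat hourly rate multiplied by the hours used per week, with the monthly cost equal to 4.3 weeks and the annual cost equal to 52 weeks. The sketch below reproduces that arithmetic. The hourly rates are back-calculated from the tables (weekly cost divided by hours per week), not taken from an official price list, and the names `estimate_cost`, `CPU_HOURLY_RATE`, and `GPU_HOURLY_RATE` are illustrative only.

```python
# Hourly rates back-calculated from the tables (weekly cost / hours per week).
# Assumption: these are derived values, not an official price list.
CPU_HOURLY_RATE = 0.1420875  # 2 CPU, 8 GB RAM, no GPU
GPU_HOURLY_RATE = 4.3085     # 1 GPU

WEEKS_PER_MONTH = 4.3  # multiplier implied by the Monthly column
WEEKS_PER_YEAR = 52    # multiplier implied by the Annually column


def estimate_cost(hourly_rate, hours_per_week):
    """Return (weekly, monthly, annual) cost for a given usage pattern."""
    weekly = hourly_rate * hours_per_week
    return weekly, weekly * WEEKS_PER_MONTH, weekly * WEEKS_PER_YEAR


if __name__ == "__main__":
    usage_patterns = [
        ("Occasional Use", 8),
        ("During Business Hours", 40),
        ("24/7", 168),
    ]
    for label, hours in usage_patterns:
        weekly, monthly, annual = estimate_cost(CPU_HOURLY_RATE, hours)
        print(f"CPU: {label}: weekly={weekly:.4f} monthly={monthly:.5f} annual={annual:.4f}")
```

Passing `GPU_HOURLY_RATE` instead of `CPU_HOURLY_RATE` gives the GPU rows. The same function can be used to estimate a usage pattern that the tables do not list, such as 20 hours per week.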

What are the steps for getting Protected B data into MinIO?

Can we use Power BI on the AAW?

Does using SAS entail different costs than the others? Are there a limited number of licenses or instances that can be run?

How do you suspend your server (to save costs)?

How do I add other people to my namespace (for collaboration)?

Are there any pre-loaded data (datasets) in AAW that we can access and use for both R and Python notebooks?

Presentations

We highly encourage you to watch our YouTube presentation given at Stratosphere:

Security

A discussion about some of the security best practices in use by this platform:

Advanced Analytics Workspace

The following is a list of the general repositories for the Advanced Analytics Workspace project.

| Repository | Description | Visibility |
|---|---|---|
| aaw-argocd-applications | ArgoCD Applications | Private |
| aaw-argocd-manifests | Manifests used for ArgoCD deployments | Public |
| aaw-argoflow-azure | Kubeflow deployment powered by ArgoCD | Public |
| aaw-kubeflow-containers | Containers to be used within Kubeflow | Public |
| aaw-contrib-containers | Containers to be used for general purpose Data Science | Public |
| aaw-contrib-jupyter-notebooks | Jupyter Notebooks to be used with the Advanced Analytics Workspace platform | Public |
| aaw-contrib-r-notebooks | R Notebooks to be used with Advanced Analytics Workspace platform | Public |
| aaw-gatekeeper-constraints | Gatekeeper constraints built specifically for AAW | Private |
| aaw-goofys-injector | Mount an S3 bucket, Data Lake, Blob Storage as a file system in a Notebook | Public |
| aaw-inferenceservices-controller | Kubernetes controller for managing inference services | Public |
| aaw-kubeflow-manifests | Kustomize installation manifests for Kubeflow | Public |
| aaw-kubeflow-controller | Kubeflow controller which sets PodDefaults + Vault policies for each Profile detected | Public |
| aaw-kubeflow-mlops | Kubeflow MLOps pipeline using GitHub Actions | Public |
| aaw-kubeflow-opa-sync | Synchronize profile editors into the Open Policy Agent for use in MinIO Access Control | Public |
| aaw-kubeflow-pipelines-secret-scanner | Scan all Kubeflow pipelines for exposed secrets | Public |
| aaw-kubeflow-profiles | Kubeflow profile manifests stored in YAML | Private |
| aaw-kubeflow-profiles-controller | Kubeflow profiles controller which allows for custom configuration for an individual profile | Public |
| aaw-minio-credential-injector | Mutating webhook which adds minio credential annotations to notebook pods | Public |
| aaw-network-policies | Kubernetes network policies for AAW | Private |
| aaw-prob-notebook-controller | Kubernetes controller for managing Authorization Policies associated to Protected-B Notebooks | Public |
| aaw-security-proposal | Proposal for the implementation of Protected B workloads in AAW | Public |
| aaw-toleration-injector | Kubernetes toleration injector with support for GPUs and Node Pools | Public |

Terraform

The following is a list of all the Terraform-related repositories for the Advanced Analytics Workspace project.

Install the AAW Platform and Infrastructure

## Installs AAW Platform and Infrastructure
##
## └─── https://github.com/statcan/terraform-advanced-analytics-workspaces-infrastructure
##      ├─── https://github.com/statcan/aaw-dev-cc-00
##      ├─── https://github.com/statcan/aaw-prod-cc-00
##      │    ├── https://github.com/statcan/terraform-azure-statcan-aaw-environment
##      │    │   ├── https://github.com/statcan/terraform-statcan-aaw-network
##      │    │   └── https://github.com/statcan/terraform-azure-statcan-cloud-native-environment-infrastructure
##      │    │       ├── https://github.com/canada-ca-terraform-modules/terraform-azurerm-kubernetes-cluster
##      │    │       └── https://github.com/canada-ca-terraform-modules/terraform-azurerm-kubernetes-cluster-nodepool
##      │    └─── https://github.com/statcan/terraform-statcan-aaw-platform (see below)
##      └─── https://github.com/statcan/terraform-azure-statcan-aaw-region-environment

| Component | Repository | Description |
|---|---|---|
| AAW | terraform-advanced-analytics-workspaces-infrastructure | Reference implementation for an Advanced Analytics Workspaces (AAW) infrastructure pipeline |
| AAW | aaw-dev-cc-00 | Reference implementation for an Advanced Analytics Workspaces (AAW) development environment |
| AAW | aaw-prod-cc-00 | Reference implementation for an Advanced Analytics Workspaces (AAW) production environment |
| AAW | terraform-azure-statcan-aaw-environment | Terraform module of Advanced Analytics Workspaces (AAW) per-environment Azure configuration |
| AAW | terraform-azure-statcan-aaw-network | Terraform module of Advanced Analytics Workspaces (AAW) networking |
| AAW | terraform-azure-statcan-cloud-native-environment-infrastructure | Terraform module for Statistics Canada's Cloud Native Environment Azure Cloud Infrastructure |
| AAW | terraform-azurerm-kubernetes-cluster | Terraform module for Azure Kubernetes Service (AKS) cluster |
| AAW | terraform-azurerm-kubernetes-cluster-nodepool | Terraform module for Azure Kubernetes Service (AKS) nodepool |
| AAW | terraform-azure-statcan-aaw-region-environment | Terraform module of Advanced Analytics Workspaces (AAW) per-region configuration of Azure |
| AAW | terraform-statcan-aaw-platform | Terraform module for the Advanced Analytics Workspaces (AAW) platform |

Install the Cloud Native Platform

## Statistics Canada's Cloud Native Platform (CNP)
##
## └─── https://github.com/statcan/terraform-statcan-aaw-platform
##      ├─── https://github.com/statcan/terraform-azure-statcan-cloud-native-platform-infrastructure
##      │    ├─── aad_pod_identity
##      │    ├─── cert_manager
##      │    ├─── vault
##      │    └─── velero
##      ├─── https://github.com/statcan/terraform-statcan-kubernetes-core-platform
##      │    ├─── aad_pod_identity
##      │    ├─── cert_manager
##      │    ├─── fluentd
##      │    ├─── gatekeeper
##      │    ├─── kubecost
##      │    ├─── prometheus
##      │    ├─── vault_agent
##      │    └─── velero
##      ├─── https://github.com/statcan/terraform-statcan-kubernetes-app-platform
##      │    ├─── istio operator
##      │    └─── istio gateway handling
##      └─── https://github.com/statcan/terraform-kubernetes-namespace
##           └─── daaas-system

| Component | Repository | Description |
|---|---|---|
| CNS | terraform-azure-statcan-cloud-native-platform-infrastructure | Terraform module for Statistics Canada Azure Cloud Native Platform Infrastructure |
| CNS | terraform-statcan-kubernetes-core-platform | Terraform module for Statistics Canada Core Kubernetes Platform |
| CNS | terraform-statcan-kubernetes-app-platform | Terraform module for Statistics Canada Kubernetes Application Platform |

Misc

| Repository | Description | Visibility |
|---|---|---|
| terraform-aaw-managed-databases | Terraform module for deployment of Azure Managed Databases | Private |
| terraform-aaw-vault | Terraform module for configuring Hashicorp Vault | Private |

Community Engagement

The following is a list of some of the collaborative work we have made available to improve upstream projects.

| Repository | Description | Visibility |
|---|---|---|
| boathouse | Manage Kubernetes storage mounts with Goofys | Public |
| jupyter-apis | Golang replacement for the Kubeflow Jupyter Web APIs | Public |
| jupyterlab-language-pack-fr_FR | JupyterLab fr-FR Language Pack | Public |
| vault-plugin-secrets-minio | Vault plugin which will provision multi-user keys for Minio | Public |

The following is a list of some of the forked projects where we have provided multilingual support and other UX-related enhancements.

| Repository | Description | Visibility |
|---|---|---|
| kubeflow | Multilingual support for Kubeflow | Public |
| kubeflow-pipelines | Multilingual support for Kubeflow Pipelines | Public |
| minio | Multilingual support for MinIO | Public |
| minio-console | Multilingual support for MinIO Console | Public |
| rstudio | Multilingual support for RStudio | Public |

Developer Notes:

Fix spelling by executing fix-spelling-en and fix-spelling-fr. Each flagged word can be added to the sensitive or insensitive category, or ignored. Ignoring only skips the error for this round; it will trigger again on the next execution.