StatCan / openmpp

Implementing the OpenM++ microsimulation framework as a Kubernetes service on the StatCan cloud.

Architecture and integration design for OpenM++ on dev cluster #1

Closed: chuckbelisle closed this issue 1 year ago

chuckbelisle commented 1 year ago

A system that provisions the OpenM++ framework as a cloud-based deployment, either on a VM or in containers.

chuckbelisle commented 1 year ago

@jacek-dudek, please update this ticket with any research or elaboration that was made.

jacek-dudek commented 1 year ago

Carried out learning activities on Terraform, Docker, Kubernetes, and the OpenM++ web service.

Created a Dockerfile for a basic containerized deployment of OpenM++ that runs its web service on startup.

Uploaded the Dockerfile to the Docker Hub registry.

Created a basic Kubernetes cluster deployment on Azure using Terraform.

Created manifest files for the OpenM++ container and a load balancer to publish the application.

Confirmed that the basic setup runs successfully.
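For illustration only (these are not the actual manifest files), a minimal Python sketch that emits the two objects described above, a Deployment running the OpenM++ container and a Service of type LoadBalancer publishing it, as one JSON document, which `kubectl apply -f` also accepts. The image name `example/openmpp:latest`, the app label `openmpp`, and container port 4040 (assumed to be the port the OpenM++ `oms` web service listens on) are hypothetical placeholders.

```python
import json

# Hypothetical placeholders, not taken from the original Dockerfile or manifests.
IMAGE = "example/openmpp:latest"
APP = "openmpp"
CONTAINER_PORT = 4040  # assumed port of the OpenM++ web service (oms)


def deployment(name: str, image: str, port: int, replicas: int = 1) -> dict:
    """Build an apps/v1 Deployment that runs the OpenM++ container."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


def load_balancer(name: str, port: int, target_port: int) -> dict:
    """Build a v1 Service of type LoadBalancer that exposes the Deployment's pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{name}-lb"},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},  # must match the pod template labels
            "ports": [{"protocol": "TCP", "port": port, "targetPort": target_port}],
        },
    }


def manifest_list(*objects: dict) -> dict:
    """Wrap several objects in a v1 List so a single file can be applied."""
    return {"apiVersion": "v1", "kind": "List", "items": list(objects)}


if __name__ == "__main__":
    doc = manifest_list(
        deployment(APP, IMAGE, CONTAINER_PORT),
        load_balancer(APP, 80, CONTAINER_PORT),
    )
    print(json.dumps(doc, indent=2))
```

Redirecting the output to a file (for example `openmpp.json`, a hypothetical name) and running `kubectl apply -f openmpp.json` would create both objects; `kubectl get service openmpp-lb` then shows the external IP once Azure has provisioned the load balancer. Keeping the label in one place avoids the common mistake of a Service selector that matches no pods.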

chuckbelisle commented 1 year ago

Next steps

jacek-dudek commented 1 year ago

Clarifying project direction and deliverables: We decided to work towards implementing a cloud offering that has feature parity with the existing microsimulation web service operated by the OpenM++ team on GCP.

Progress made: Did more background reading of the Kubernetes documentation. Identified the Kubernetes objects that will be needed in subsequent iterations of the service. Located a GitHub project that appears to be an implementation of OpenMPI on Kubernetes. Project URL: https://github.com/everpeace/kube-openmpi

Souheil-Yazji commented 1 year ago

Iteration 0 Scope Definition

This should enable us to host the OM++ web service on aaw-dev as a starting iteration.

Iteration 0.1 Scope Definition

To be further elaborated over the duration of Iteration 0

vexingly commented 1 year ago

My notes on Kubeflow's integrated MPI training operator, which I used for my POC, are here: https://github.com/StatCan/aaw-private/issues/95. The everpeace/kube-openmpi project was also evaluated, but it was created 5 years ago and has not been maintained since, whereas the Kubeflow training operators are in active development.
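As a rough illustration of what a job submitted to the MPI training operator might look like (not the manifest from the POC notes), a Python sketch that builds an MPIJob with one launcher and N workers. It assumes the `kubeflow.org/v1` MPIJob schema (`slotsPerWorker`, `runPolicy`, `mpiReplicaSpecs` with `Launcher` and `Worker`); field names should be checked against the CRD installed on the cluster. The image and the model executable path are hypothetical placeholders.

```python
import json

# Hypothetical placeholders: image and model executable are not from the source.
IMAGE = "example/openmpp-mpi:latest"
MODEL_CMD = ["./MyModel_mpi"]


def mpijob(name: str, image: str, workers: int, slots_per_worker: int,
           model_cmd: list) -> dict:
    """Build an MPIJob (assumed kubeflow.org/v1 schema) for an MPI-enabled model."""
    total_ranks = workers * slots_per_worker  # one MPI rank per slot

    def pod(container_name: str, command=None) -> dict:
        container = {"name": container_name, "image": image}
        if command is not None:
            container["command"] = command
        return {"spec": {"containers": [container]}}

    launcher_cmd = ["mpirun", "--allow-run-as-root", "-np", str(total_ranks), *model_cmd]
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            "slotsPerWorker": slots_per_worker,
            "runPolicy": {"cleanPodPolicy": "Running"},  # clean up running pods when the job ends
            "mpiReplicaSpecs": {
                # The launcher runs mpirun, which starts ranks on the workers.
                "Launcher": {"replicas": 1, "template": pod("launcher", launcher_cmd)},
                # Worker command left to the image/operator defaults.
                "Worker": {"replicas": workers, "template": pod("worker")},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(mpijob("openmpp-demo", IMAGE, 2, 4, MODEL_CMD), indent=2))
```

The design choice here is that the operator, rather than a hand-maintained project like kube-openmpi, owns the launcher/worker lifecycle, so the manifest only states replica counts and the mpirun command.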

Regarding provisioning a separate node pool, are there any project requirements that call for this yet? With the MPI training operator it would be as simple as labeling the manifest with the node type to use, but I think this should come as a special request from specific projects, and only after they have hit a limitation with our existing nodes.
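A minimal sketch of what "labeling the manifest with the node type" could mean in practice: adding a `nodeSelector` (and optionally a toleration for a tainted pool) to a pod template, such as an MPIJob worker template. The label key `agentpool` is assumed (AKS nodes carry a label naming their pool); the pool name `mpipool` and the taint key `dedicated` are hypothetical.

```python
import copy
import json
from typing import Optional

# Hypothetical values: "agentpool" is assumed to be the AKS node-pool label;
# "mpipool" is a placeholder pool name.
POOL_LABEL = "agentpool"
POOL_NAME = "mpipool"


def pin_to_node_pool(pod_template: dict, label: str, value: str,
                     taint_key: Optional[str] = None) -> dict:
    """Return a copy of a pod template restricted to nodes with label=value."""
    pinned = copy.deepcopy(pod_template)  # leave the caller's template untouched
    spec = pinned.setdefault("spec", {})
    spec.setdefault("nodeSelector", {})[label] = value
    if taint_key is not None:
        # Only needed if the dedicated pool is tainted to keep other workloads off it.
        spec.setdefault("tolerations", []).append(
            {"key": taint_key, "operator": "Equal", "value": value, "effect": "NoSchedule"}
        )
    return pinned


if __name__ == "__main__":
    worker = {"spec": {"containers": [{"name": "worker", "image": "example/openmpp-mpi:latest"}]}}
    print(json.dumps(pin_to_node_pool(worker, POOL_LABEL, POOL_NAME), indent=2))
```

Because the change is confined to the pod template, existing jobs keep running on the shared nodes and only projects that request a dedicated pool need the extra fields.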

chuckbelisle commented 1 year ago

Continuing this work in https://github.com/StatCan/openmpp/issues/3