StatCan / openmpp

Implementing the OpenM++ microsimulation framework as a Kubernetes service on the StatCan cloud.

[Epic] Implement OpenM++ MPI job controller using Go #24

Status: Open. Souheil-Yazji opened this issue 11 months ago.

Souheil-Yazji commented 11 months ago

Continuation of https://github.com/StatCan/openmpp/issues/19
Relates to https://github.com/StatCan/openmpp/issues/13

Requirements

- [ ] Route all `POST` MPIJob requests to this controller
- [ ] The controller successfully creates MPIJob manifests and submits them to k8s
- [ ] A response code is propagated up to the requester

Implementation

- [x] #45
- [x] #41
- [ ] #42
- [ ] #43
- [ ] #44

Testing

Deployment

Other Notes

chuckbelisle commented 11 months ago

This was not worked on during this sprint; moving it to the next one.

jacek-dudek commented 10 months ago

I did background reading on Go fundamentals, including core data types, control flow, the concurrency model, the organization of source files, importing packages, and building projects.

I looked into converting JSON into Go data types and back.

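The parameter-passing model was confusing, so I needed to spend additional time on it. Parameters are passed by value; passing by reference is done explicitly with pointers. BUT some core data types (slices, maps) consist of a small "wrapper" (for a slice, a header holding a pointer, a length, and a capacity) that refers to an underlying data structure storing the actual elements. When a slice or map variable is passed as an argument, the wrapper is copied by value, but the underlying data is NOT copied. So writes to a slice's elements or to a map's entries inside the function are visible to the caller. This is not full pass-by-reference, though: reassigning the parameter, or appending to a slice (which changes its length and may allocate a new array), does not change the caller's variable.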
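A small standard-library demo of these semantics; the helper functions are hypothetical and exist only to illustrate the behavior:

```go
package main

import "fmt"

// setFirst writes through the copied slice header into the shared backing array.
func setFirst(s []int) { s[0] = 100 }

// appendOne appends to its local copy of the slice header and returns the
// local length; the caller's slice length is unchanged.
func appendOne(s []int) int {
	s = append(s, 4)
	return len(s)
}

// appendViaPointer updates the caller's slice header explicitly through a pointer.
func appendViaPointer(s *[]int) { *s = append(*s, 4) }

// addKey mutates the map that the caller also refers to.
func addKey(m map[string]int) { m["b"] = 2 }

func main() {
	s := []int{1, 2, 3}
	setFirst(s)
	fmt.Println(s) // [100 2 3]

	n := appendOne(s)
	fmt.Println(n, len(s)) // 4 3

	appendViaPointer(&s)
	fmt.Println(s) // [100 2 3 4]

	m := map[string]int{"a": 1}
	addKey(m)
	fmt.Println(m) // map[a:1 b:2]
}
```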

I still need to review the material on concurrency: sharing data between goroutines via channels, and the patterns and anti-patterns recommended there.

After that I started looking into the client-go library. I learned how to authenticate a Go application from inside the cluster. The package "k8s.io/client-go/rest" provides `rest.InClusterConfig()`, which builds the configuration from the service account token and CA certificate that Kubernetes mounts into every pod by default (under /var/run/secrets/kubernetes.io/serviceaccount/).

After creating the config object, we use it to create a client set, which exposes all the built-in resource types in the cluster. We then query the client set to obtain specific resource collections, for example the pods in a namespace.

What I'm trying to figure out now is how the collection of REST endpoints served by the Kubernetes API maps to client-go code. The core package "k8s.io/client-go/kubernetes" exports a function, `NewForConfig`, that returns the client set for the cluster.
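One way to see the mapping: each typed client-go accessor corresponds to a REST path. `clientset.CoreV1().Pods(ns)` targets the core group at `/api/v1/namespaces/{ns}/pods`, while named API groups live under `/apis/{group}/{version}`, so `clientset.AppsV1().Deployments(ns)` targets `/apis/apps/v1/namespaces/{ns}/deployments`. The helper below is hypothetical and only builds those paths; the `kubeflow.org/v2beta1` group/version used for MPIJob is an assumption that depends on which operator version is installed.

```go
package main

import (
	"fmt"
	"path"
)

// resourcePath builds the REST path of a namespaced resource collection:
//
//	core group (""): /api/{version}/namespaces/{ns}/{resource}
//	named groups:    /apis/{group}/{version}/namespaces/{ns}/{resource}
func resourcePath(group, version, namespace, resource string) string {
	prefix := "/apis/" + group
	if group == "" {
		prefix = "/api"
	}
	return path.Join(prefix, version, "namespaces", namespace, resource)
}

func main() {
	// clientset.CoreV1().Pods("default")
	fmt.Println(resourcePath("", "v1", "default", "pods")) // /api/v1/namespaces/default/pods
	// clientset.AppsV1().Deployments("default")
	fmt.Println(resourcePath("apps", "v1", "default", "deployments")) // /apis/apps/v1/namespaces/default/deployments
	// MPIJob has no typed accessor on the built-in client set (group/version assumed).
	fmt.Println(resourcePath("kubeflow.org", "v2beta1", "default", "mpijobs")) // /apis/kubeflow.org/v2beta1/namespaces/default/mpijobs
}
```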

There are also packages such as "k8s.io/api/apps/v1" and "k8s.io/api/core/v1" that correspond to specific API groups and export Go structs mirroring the structure of those resources' specification manifests. We need to import these to create resource specifications programmatically.

But some resource types, in particular custom resources defined through CRDs, don't come with a corresponding Go package (one that would provide these resource-specific data structure definitions). In that case we need to use the dynamic client package "k8s.io/client-go/dynamic", which lets us define and interact with arbitrary resource types, such as an MPIJob, generically.

So one thing I need to do is find out whether the Kubeflow training / MPI operator extension provides an MPIJob-specific Go package. If it doesn't, I need to look into the dynamic resource client.

jacek-dudek commented 10 months ago

Linking to work in progress: https://github.com/StatCan/aaw-kubeflow-containers/pull/565 (pull request) and https://github.com/StatCan/openmpp/tree/openmpp-24 (branch)

chuckbelisle commented 10 months ago

Updated this issue to be an Epic and edited description in order to define tasks.