c-scale-community / workflow-coastal-hydrowaq

Porting and deploying the HiSea use case on C-SCALE

Sprint 3: 22-26 November 2021 #16

Closed · backeb closed this issue 2 years ago

backeb commented 2 years ago

Sprint 3: 22-26 November 2021

Sprint activities

For additional information see notes from Sprint 2 retro: https://github.com/c-scale-community/use-case-hisea/issues/13#issuecomment-954780477

Background, high-level objectives and status

We want to compare the performance and scalability of four different architecture options:

  1. Fully cloud based + boundary data is downloaded (this is the current workflow).
     Status: The workflow has been deployed and the steps (download data, preprocess data, launch model simulation, postprocess output) are triggered manually. To accurately test performance and deploy an operational (automated) service we need to remove the manual steps, which we will do using:
     a. a managed Kubernetes cluster
     b. Argo Workflows (https://argoproj.github.io/) installed on the cluster
     (A sketch of the current manual pipeline follows this list.)

  2. Fully cloud based + boundary data is accessed through the provider's datastore (easy access to data; avoids downloading the data).
     Status: The scripts to download the requisite data have been prepared and containerised. High-level discussions are ongoing at GRNET regarding the provisioning of an NFS server.

  3. Pre- and post-processing in the cloud + model running on HPC.
     Status: We have access to GRNET's HPC environment and will deploy the Singularity container there in Sprint 3. We need to progress on the managed Kubernetes cluster setup and the Argo Workflows installation to achieve this test.

  4. Fully HPC based + boundary data downloaded / accessed.
     Status: We have access to GRNET's HPC environment and will deploy the Singularity container there in Sprint 3. We need to adjust the download, pre-processing and post-processing scripts to work operationally / automatically in the HPC environment.
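
For reference, a minimal sketch of the manual pipeline in option 1; the image names, volumes and arguments below are placeholders rather than the actual HiSea components, and this is the sequence the Argo workflow on the managed Kubernetes cluster would eventually automate:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder image names and paths, shown only to illustrate the four manual
# steps that the Argo workflow is meant to chain automatically.
docker run --rm -v "$PWD/data:/data" hisea/download      # download boundary/forcing data
docker run --rm -v "$PWD/data:/data" hisea/preprocess    # build the model input from the downloads
docker run --rm -v "$PWD/data:/data" hisea/delft3dfm     # launch the model simulation
docker run --rm -v "$PWD/data:/data" hisea/postprocess   # post-process the model output
```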

AOB

backeb commented 2 years ago

Regarding the action:

See below feedback from Deltares:

Hi Bjorn,

Yes, this is fine by me. The kernel release is in ~2 weeks. Anna knows where to find the image 😊 We will be happy to support them get things up and running because it gets us useful feedback.

Anna van Gils I assume you are the technical point of contact for them? Feel free to get in touch with Adri when you have technical questions setting it up. If you/they get blocked, just set up a call with us (please include me as optional. If my agenda is limiting ignore it please).

As a system requirement: Can you ask them to ensure Intel MPI is installed on their cluster. We are using version 2021.2.0.216 ourselves but a version close to that should be fine.

Akhil Piplani Project Manager | Deltares

@avgils we can go ahead and share the Singularity image with GRNET

cc @yan0s @nikosT @sebastian-luna-valero @lorincmeszaros
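
Since Deltares ask for an Intel MPI version close to 2021.2.0.216 (see the quoted requirement above), a quick way to check what is available on the cluster; module names are site-specific, so this is only a sketch:

```bash
# Print the MPI implementation and version currently on the PATH
# (Intel MPI reports its library version here)
mpirun --version

# On clusters that use environment modules, list anything MPI-related;
# the exact module name for Intel MPI varies per site
module avail 2>&1 | grep -i mpi
```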

kkoumantaros commented 2 years ago

The NFS server is set up with 300 TB as a starting volume. You can mount it using `mount 192.168.0.85:/nfs-volume`, and you can have it mounted on boot by adding the following line at the end of /etc/fstab: `192.168.0.85:/nfs-volume nfs4 defaults 0 0`.
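
A sketch of both options, assuming /mnt/nfs-volume as the local mount point (the mount point is not specified above, so use whatever path suits the VMs):

```bash
# One-off mount (run as root); /mnt/nfs-volume is an assumed mount point
mkdir -p /mnt/nfs-volume
mount -t nfs4 192.168.0.85:/nfs-volume /mnt/nfs-volume

# Persistent mount: append an fstab entry so the share is mounted on boot
echo '192.168.0.85:/nfs-volume  /mnt/nfs-volume  nfs4  defaults  0 0' >> /etc/fstab
```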

backeb commented 2 years ago

Regarding the action

I was advised that we (the users, i.e. @avgils) can use the PaaS Orchestrator (https://indigo-paas.cloud.ba.infn.it/home/login) to easily deploy a Kubernetes cluster on GRNET. However, before we can use the PaaS Orchestrator, the HiSea VO needs to be configured for it, which is an action for INFN (i.e. @gdonvito and his team).

@gdonvito could you perhaps progress on configuring the HiSea VO for the PaaS Orchestrator? Note that the VO will change to hisea.c-scale.eu

cc @avgils @lorincmeszaros @sebastian-luna-valero @enolfc @sustr4 @kkoumantaros

lorincmeszaros commented 2 years ago

Delft3D FM is now running in Singularity on the GRNET HPC. Further actions are:

  1. Test another setup (MPI library installed on the HPC instead of inside the container); see the sketch below this list.
  2. Add the partitioning step to the container (at the moment partitioning has to be done on our side).
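
A sketch of the two setups to compare; the image name and model invocation are placeholders, as the actual entry point inside the Deltares image may differ:

```bash
# Setup A (current): the MPI library shipped inside the container drives the run
singularity exec delft3dfm.sif mpirun -np 4 run_dflowfm.sh model.mdu

# Setup B (to test): the host's MPI launches the ranks and each rank runs inside
# the container; this requires a compatible MPI (e.g. Intel MPI ~2021.2) on the HPC nodes
mpirun -np 4 singularity exec delft3dfm.sif run_dflowfm.sh model.mdu
```
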
backeb commented 2 years ago

Retrospective

Tops

Tips

- Deploy the Delft3D FM Singularity container on GRNET's HPC
  - @backeb check if any agreements are required to share the Singularity image with GRNET
  - @avgils share the Singularity container image with @nikosT to test optimisation
- Test the performance of the Singularity container using the MPI library inside the container vs using the library on the HPC (@avgils @lorincmeszaros)
  - See also @lorincmeszaros compare container performance between different setups (e.g. using the MPI library in the Singularity container vs using the MPI library installed on the HPC) #14
- Progress towards setting up a Kubernetes cluster to configure and automate the workflow (@sebastian-luna-valero @kkoumantaros)
  - @sebastian-luna-valero find support to help set up the Kubernetes cluster on GRNET OpenStack
  - @avgils @lorincmeszaros provide compute and storage requirements for the Kubernetes cluster to @sebastian-luna-valero #22
  - @avgils get familiar with the PaaS Orchestrator (https://indigo-paas.cloud.ba.infn.it/home/login); @gdonvito please support #21
- Progress on setting up the NFS server (network storage) and adjust the data download scripts to download to the NFS server (@yan0s @nikosT @avgils @lorincmeszaros)
- Find a secure place to store the .cdsapirc file for the C-SCALE account on the Climate Data Store (@sebastian-luna-valero); a sketch of the file format follows this list
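
For reference, the .cdsapirc file read by the CDS API client is a two-line credentials file; a sketch of how it could be created, with placeholder UID and key, and with the understanding that the real values belong in a secret store rather than in a container image or the repository:

```bash
# Write the CDS API credentials file (values are placeholders; keep the real
# file out of version control and out of container images)
cat > ~/.cdsapirc <<'EOF'
url: https://cds.climate.copernicus.eu/api/v2
key: <UID>:<API-KEY>
EOF
chmod 600 ~/.cdsapirc
```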

Objectives for next sprint

Dates for next sprint

10-14 Jan 2022

sebastian-luna-valero commented 2 years ago

Thanks for the great summary.

Please add me to the conversations about the INDIGO PaaS Orchestrator.