c-scale-community / workflow-coastal-hydrowaq

Porting and deploying the HiSea use case on C-SCALE
Apache License 2.0

Resources for training #33

Closed backeb closed 1 year ago

backeb commented 1 year ago

Hi @kkoumantaros @MZICloudferro @LukaszKubowicz cc @sebastian-luna-valero @cchatzikyriakou @lorincmeszaros

At the EGI conference we are planning a demo and training on the HiSea use case. In preparation for this demo/training we are organising an internal training at Deltares on 15 Sep from 15h00 to 19h00.

@kkoumantaros @MZICloudferro @LukaszKubowicz would it be possible to set up VMs for our participants to use during this training?

Thanks Björn

MZICloudferro commented 1 year ago

Hi @backeb ,

Roughly how many resources do you need?

Best Marcin

backeb commented 1 year ago

Hi @MZICloudferro

Thanks for your response. I think there might be 10-20 participants of the course... but not 100% sure. Am sending out an email this week to confirm.

How much lead time do you need to prepare the resources? Then I know what registration deadline to set.

Cheers Björn

sebastian-luna-valero commented 1 year ago

Hi @backeb

I think we would also need to know if each trainee will require a smaller, dedicated VM or whether they are happy to share the same/bigger ones.

Also, one aspect is providing the quotas and creating the initial VMs, but then the other important aspect is to deploy the required environments on them for the training. Will you need help with that?

Best regards, Sebastian

backeb commented 1 year ago

Hi @sebastian-luna-valero

Thanks for your response!

> I think we would also need to know if each trainee will require a smaller, dedicated VM or whether they are happy to share the same/bigger ones.

What would sharing a same/bigger VM look like? Will each user work in their own folder, or will users have to sit together in groups and let one of the users drive?

> Also, one aspect is providing the quotas and creating the initial VMs, but then the other important aspect is to deploy the required environments on them for the training. Will you need help with that?

My thinking is to just let the users clone the repo to the VM and then get them to build and run the containers following the instructions, e.g. https://github.com/c-scale-community/use-case-hisea/tree/main/scripts.

backeb commented 1 year ago

@kkoumantaros @MZICloudferro @sebastian-luna-valero is it possible to have singularity installed on the VMs we use? We prefer to run our models through singularity containers.

We have been using docker for the downloading and preprocessing. If singularity is not available for the model runs we can also use docker, but we prefer singularity.

sebastian-luna-valero commented 1 year ago

Hi @backeb

In principle, having your own VM(s) will allow you to install either docker or singularity.

On the other hand, I had a closer look at the code and here are a few suggestions:

  1. Avoid using hard-coded, absolute paths in the scripts to improve portability. For example, this will only work on CentOS but not Ubuntu.
  2. It looks like containers are just building conda environments with Python dependencies. Using conda environments directly would remove complexity in the code, and reduce time to recreate the environment for others.

And some questions:

  1. What's the amount of input/intermediary/output data to work with per workflow run?
  2. Will the workflow consume a lot of CPU and/or RAM?
  3. How many trainees are expected to join the event?

Best regards, Sebastian

backeb commented 1 year ago

Thanks @sebastian-luna-valero

As a user I don't want to install either docker or singularity - I want it to be available for me to use.

> Avoid using hard-coded, absolute paths in the scripts to improve portability. For example, this will only work on CentOS but not Ubuntu.

The hard-coded script is just an example of how to run the docker containers. I made a note that users should change the input parameters if they want to use it.

> It looks like containers are just building conda environments with Python dependencies. Using conda environments directly would remove complexity in the code, and reduce time to recreate the environment for others.

Indeed. Perhaps worthwhile to sit down and discuss how to improve the set up.

My thinking was to keep all the components separate and in separate containers, so that users can pick and choose and use only specific components of the workflow for their own applications.

In my view the workflow solutions are templates that help kick-start a use case.

> What's the amount of input/intermediary/output data to work with per workflow run?

Just the download and preprocessing for a 2 day run is 4.1G, but most of that data is the global FES2012 data, which is relatively static, i.e. only updated every couple of years or so. FES2014 is available, and FES2022 is coming out soon.

This is excluding the output from a model run and the postprocessing. Am working on running the model.

> Will the workflow consume a lot of CPU and/or RAM?

Not sure, but I think it's more CPU.

> How many trainees are expected to join the event?

I expect between 10 and 20, will confirm.

sebastian-luna-valero commented 1 year ago

If you want to test conda environments, we can continue the discussion on https://github.com/c-scale-community/use-case-hisea/pull/34

I am happy to profile the workflow once it is ready, to better decide how to arrange the computing resources. Maybe we just need a single big VM where we download the input data and provide the conda installation once, and then everybody runs their workflow in their personal area.

backeb commented 1 year ago

Ok thanks re the link to the conda discussion.

I don't want to change my workflow before the training on 15 Sept.

Each participant of the training should

  1. Clone the repo
  2. build the docker containers
  3. run each docker container of the workflow, i.e.
    • docker run download-input
    • docker run preprocessing
    • docker run model
    • docker run postprocessing
    • docker run visualise

I'd like them to do this so they can get a feeling for how the components of the workflow work. The next ambition is then to string all of these things into a workflow orchestrator e.g. snakemake.
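
The five run steps above can be sketched as a dry run in Python. This is only an illustration: the image names are taken from the list above, and the real image tags, volume mounts and input parameters in the repository may differ, so treat them as assumptions.

```python
import shlex

# Image names copied from the steps listed above. Assumption: each image has
# already been built locally under exactly this name.
WORKFLOW_STEPS = [
    "download-input",
    "preprocessing",
    "model",
    "postprocessing",
    "visualise",
]


def docker_commands(steps):
    """Return one `docker run` argument list per workflow step, in order."""
    return [["docker", "run", step] for step in steps]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in docker_commands(WORKFLOW_STEPS):
        print(shlex.join(cmd))
```

To actually execute the steps, each argument list could be passed to `subprocess.run(cmd, check=True)`, which stops at the first step that fails.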

Does that make sense? Maybe we should have a call...

sebastian-luna-valero commented 1 year ago

It makes sense, thanks.

Once the workflow is polished, please let me know the estimates for CPU and storage per trainee and I will help commission the compute resources in the cloud. Happy to be the guinea pig to profile the workshop as a trainee (and troubleshoot problems). Does that sound like a good plan? Or do you still prefer to have a meeting?

backeb commented 1 year ago

Sounds like a plan!

backeb commented 1 year ago

Next week Thursday and Friday @lorincmeszaros and I will spend the day to try and finalise the last parts, i.e.

  1. run the model
  2. postprocess
  3. visualise

It should then be ready for you to test.

The download and preprocess components already work.

backeb commented 1 year ago

> @kkoumantaros @MZICloudferro @sebastian-luna-valero is it possible to have singularity installed on the VMs we use? We prefer to run our models through singularity containers.
>
> We have been using docker for the downloading and preprocessing. If singularity is not available for the model runs we can also use docker, but we prefer singularity.

@kkoumantaros @MZICloudferro please let me know if singularity is available on your cloud compute.

backeb commented 1 year ago

> How many trainees are expected to join the event?
>
> I expect between 10 and 20, will confirm.

Registration deadline for our internal training is 2 September, I will confirm numbers then.

LukaszKubowicz commented 1 year ago

Hi, I came back from vacation, and I am ready to take over this topic on the CloudFerro side. Bjorn, please let me know exactly what kind of resources you need, and I will talk to CF Cloud team.

Łukasz Kubowicz

backeb commented 1 year ago

Hi @LukaszKubowicz

Thanks for your response.

The VM I'm working on to develop this at GRNET has the following specs:

VCPUs  Disk  RAM
16     40GB  32GB

And the VM has docker installed.

The rest can be done in the user space.

With @sebastian-luna-valero we are thinking about one big VM with many users vs many smaller VMs, with one VM per user. I want each participant of the training to do the following:

  1. Clone the repo
  2. build the docker containers
  3. run each docker container of the workflow, i.e.
    • docker run download-input
    • docker run preprocessing
    • docker run model
    • docker run postprocessing
    • docker run visualise

What do you think, 1 big VM with many users? Or one smaller VM per user?

I will confirm exact number of participants after 2 Sept.

LukaszKubowicz commented 1 year ago

Hi @backeb, I will speak to our cloud team and will let You know.

LukaszKubowicz commented 1 year ago

@backeb @sebastian-luna-valero, I'm waiting for the cloud team to respond. But for now, as long as You have access/credentials and a Project, can You create these VMs on Your own and manage them? It might be the most suitable option. Let me know.

Please check properties of these: eo2.3xlarge eo2a.3xlarge

backeb commented 1 year ago

Hi @kkoumantaros and @LukaszKubowicz cc @sebastian-luna-valero, @cchatzikyriakou

After a quick discussion with @sebastian-luna-valero last week, we decided to first talk with @sustr4 about running this training course on CESNET resources. There is already a VO we can use for this purpose.

We want to do this because

  1. we want the learners to register with EGI Check-in (and CloudFerro is not yet integrated with Check-in)
  2. we want the learners to access OpenStack (via Check-in) and deploy a VM

The learners will then

  1. Clone the HiSea use case repo
  2. Build the docker containers needed for the workflow (need cloud container compute)
  3. Run each docker container (each docker container is one step in the workflow)

I will keep this ticket open, but will include @sustr4 in this conversation going forward.

Cheers Björn

sustr4 commented 1 year ago

Hi, yes, we are certainly willing to host the training. I did not yet properly read through the discussion above, and it would help me greatly if I could get a summary of the required resources.

Point 2 in Bjorn's comment above is a little confusing to me, but if it means that users are supposed to instantiate VMs themselves, then I propose to use the VO eval.c-scale.eu to register users. I would make sure adequate resources are available to its members in our cloud.

Keep me in the loop. Zdeněk

backeb commented 1 year ago

Hi @sustr4 and @sebastian-luna-valero cc @cchatzikyriakou, @lorincmeszaros

As mentioned, on 📅 15 September 2022 from 15h00-19h00 CET we are running an in-house training course at Deltares where we want to teach the learners to do the following:

  1. Log on to the CESNET OpenStack environment via EGI Check-in
  2. Deploy a VM (I'm thinking 8 vCPU, 16 GB RAM and 20GB Disk for each VM)
  3. Clone https://github.com/c-scale-community/use-case-hisea
  4. Build the docker containers needed for the workflow
  5. Run each docker container, to get a feeling for what the workflow does

We have a total of 13 learners:

No.  Name                 Email
1    Anastasia Zubova     anastasia.zubova@deltares.nl
2    Albrecht Weerts      albrecht.weerts@deltares.nl
3    Qinghua Ye           Qinghua.ye@deltares.nl
4    Sanne Muis           sanne.muis@deltares.nl
5    Kun Yan              kun.yan@deltares.nl
6    Ruben Dahm           ruben.dahm@deltares.nl
7    Bart Grasmeijer      bart.grasmeijer@deltares.nl
8    Maarten van Ormondt  maarten.vanormondt@deltares.nl
9    Ira Wadani           ira.wardani@deltares.nl
10   Marieke Eleveld      marieke.eleveld@deltares.nl
11   Rizka Akmalia        rizka.akmalia@deltares.nl
12   Bert Jagers          bert.jagers@deltares.nl

Please let me know next steps.

sustr4 commented 1 year ago

As for next steps, all the people should enrol in the C-SCALE Eval VO: https://operations-portal.egi.eu/vo/view/voname/eval.c-scale.eu Enrolment link directly available at: https://perun.egi.eu/egi/registrar/?vo=eval.c-scale.eu

A few additional questions:

  1. How do you plan to access the machines? IPv6, jump host? I can inquire about IPv4 addresses; enough may be available for a limited time.

  2. Will you be needing any specific images, or will you select from vanilla?

backeb commented 1 year ago

> As for next steps, all the people should enrol in the C-SCALE Eval VO: https://operations-portal.egi.eu/vo/view/voname/eval.c-scale.eu Enrolment link directly available at: https://perun.egi.eu/egi/registrar/?vo=eval.c-scale.eu

Ok, thanks will share that info with the participants.

> A few additional questions:
>
> 1. How do you plan to access the machines? IPv6, jump host? I can inquire about IPv4 addresses; enough may be available for a limited time.

I was thinking of accessing the VMs via SSH using a key-pair, which is what we've been doing on GRNET and CloudFerro so far. So for access, let's choose whatever the C-SCALE federation is promoting as best practice.

> 2. Will you be needing any specific images, or will you select from vanilla?

Vanilla? I don't think we need specific VM flavours, we can use whatever is closest to 8 vCPU, 16 GB RAM and 20GB disk.

sebastian-luna-valero commented 1 year ago

Before enrolling in the VO, please ask trainees to create an EGI account if they haven't done so.

Could you please point them to our docs: https://wiki.c-scale.eu/C-SCALE/c-scale-users/getting-started#create-a-user-account

sustr4 commented 1 year ago

> VMs via SSH using a key-pair

Sure, but that's independent of IP version.

sebastian-luna-valero commented 1 year ago

Early warning: trainees should check the output of https://test-ipv6.com/ in order to go with IPv6. For example, my ISP doesn't offer IPv6 connectivity to me (maybe it's only me).
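
As a complement to the browser test, a trainee could also probe IPv6 reachability from the command line. The sketch below is an assumption-laden illustration, not part of the training material: it uses Google Public DNS (`2001:4860:4860::8888`, TCP port 53) purely as a well-known IPv6 endpoint, and a firewall that blocks that port would make it report a false negative.

```python
import socket

# Assumption: Google Public DNS over IPv6 serves only as a well-known IPv6
# endpoint to probe; any reachable IPv6 host you trust would do.
PROBE_HOST = "2001:4860:4860::8888"
PROBE_PORT = 53


def has_ipv6_connectivity(host=PROBE_HOST, port=PROBE_PORT, timeout=3.0):
    """Return True if a TCP connection over IPv6 to (host, port) succeeds."""
    if not socket.has_ipv6:
        # This Python build has no IPv6 support at all.
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, missing routes and refused connections.
        return False


if __name__ == "__main__":
    if has_ipv6_connectivity():
        print("IPv6 endpoint reachable")
    else:
        print("no IPv6 connectivity detected")
```

Running the script once with and once without the VPN would show whether the VPN changes the result.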

sustr4 commented 1 year ago

> Before enrolling in the VO, please ask trainees to create an EGI account if they haven't done so.

Hm, Deltares is listed among Check-in IdPs, but does not actually work. This would have been a nice opportunity to have it fixed.

sustr4 commented 1 year ago

> So for access, let's choose whatever the C-SCALE federation is promoting as best practice.

I'd say the whole Internet community is trying to promote IPv6 as the best practice, but are your attendees capable of using it?

Exactly as @sebastian-luna-valero says. Not everyone can make it work. But if Deltares has IPv6 on premise, that's already a step forward. For all I know, remote workers may be VPNing in, and that would fix all.

backeb commented 1 year ago

I'm currently working from home using a VPN. When I follow @sebastian-luna-valero 's link in an incognito window I get

[image: screenshot of the test-ipv6.com result]

When I disconnect from the VPN and try again I get

[image: screenshot of the test-ipv6.com result]

But a while ago I wasn't getting 10/10... so I'm not sure.

Think it might be safest to use IPv4, because that's what we've been using so far...

backeb commented 1 year ago

0/10 at the office. No IPv6 detected at the Deltares office.

sustr4 commented 1 year ago

OK. I'll see if we still have enough v4 addresses to lend.

backeb commented 1 year ago

Hi @sustr4 and @sebastian-luna-valero

Tomorrow @lorincmeszaros and I will be doing some prep work for our training next week.

Could you make sure that a network has been set up in your openstack so that we can do a test deployment of a VM?

The last time I did this, I had some trouble setting up the network (https://github.com/c-scale-community/use-case-hisea/issues/8#issuecomment-897622846). @sebastian-luna-valero indicated that setting up a network would be an action for the provider.

Thanks

sebastian-luna-valero commented 1 year ago

Hi,

@sustr4 should I open a GGUS ticket to enable the eval.c-scale.eu VO at CESNET?

Best regards, Sebastian

sustr4 commented 1 year ago

> @sustr4 should I open a GGUS ticket to enable the eval.c-scale.eu VO at CESNET?

Hi, I'll do it locally. I'm on it right now.

backeb commented 1 year ago

Hi again @sustr4 and @sebastian-luna-valero

I've had a look at data resources now that we have most of our workflow figured out.

The storage requirements are roughly:

docker images

hydro-nb                 latest    6199a55e5a06   19 hours ago         5.02GB
postprocess              latest    c81fc3f8a5ce   About a minute ago   1.81GB
download-input           latest    0febcb1a315c   7 days ago           2GB
getera                   latest    239a17fd81cc   3 weeks ago          1.38GB
preprocessing            latest    a5b1e8fde095   3 weeks ago          1.57GB
continuumio/miniconda3   latest    ce7d119281a1   2 months ago         403MB
deltares/delft3dfm       latest    80b4d6c3dba6   6 months ago         2.86GB

Total ~15GB

data

6.5G    download
5.9M    preprocout
9.0G    modeloutput

Total ~16GB

Grand total ~31GB

So we need VMs with that storage capacity x12 (1 per user)
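
The arithmetic behind these totals can be reproduced with a short Python sketch. The sizes are copied from the listings above; mixing the `docker images` figures (GB) with the `du`-style figures (G) is an approximation, and the thread's ~16GB and ~31GB are rounded up slightly compared with the exact sums.

```python
# Sizes in GB, copied from the `docker images` listing above.
IMAGE_SIZES_GB = {
    "hydro-nb": 5.02,
    "postprocess": 1.81,
    "download-input": 2.0,
    "getera": 1.38,
    "preprocessing": 1.57,
    "continuumio/miniconda3": 0.403,
    "deltares/delft3dfm": 2.86,
}

# Data directory sizes from the listing above (5.9M taken as 0.0059 GB).
DATA_SIZES_GB = {
    "download": 6.5,
    "preprocout": 0.0059,
    "modeloutput": 9.0,
}

N_VMS = 12  # one VM per user, as stated above


def total_gb(sizes):
    """Sum a mapping of name -> size in GB."""
    return sum(sizes.values())


images = total_gb(IMAGE_SIZES_GB)
data = total_gb(DATA_SIZES_GB)
per_vm = images + data

if __name__ == "__main__":
    print(f"images per VM: {images:.1f} GB")
    print(f"data per VM:   {data:.1f} GB")
    print(f"total per VM:  {per_vm:.1f} GB")
    print(f"all {N_VMS} VMs:    {per_vm * N_VMS:.1f} GB")
```

The per-VM figure is the one to compare against the disk size of the chosen VM flavour; the last line only matters if the provider accounts for storage quota across the whole project.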

sustr4 commented 1 year ago

The standard for all VMs at CESNET is 80 GB. You can set up additional disks or stretch the default ones, but based on the numbers I see, 80 GB should be adequate.

backeb commented 1 year ago

Brilliant! 80GB is plenty.

backeb commented 1 year ago

@sustr4 do you have a VM flavour in openstack where docker comes preinstalled? or should we install it?

sebastian-luna-valero commented 1 year ago

@backeb there is one in AppDB that's already added to the eval.c-scale.eu VO, so you should have docker pre-installed for you.

backeb commented 1 year ago

Does that mean I can just select that flavour from openstack when I instantiate a VM?

sebastian-luna-valero commented 1 year ago

Yes, that's the goal.

sustr4 commented 1 year ago

> Does that mean I can just select that flavour from openstack when I instantiate a VM?

Let me know how that goes :-)