c-scale-community / workflow-coastal-hydrowaq

Porting and deploying the HiSea use case on C-SCALE

Use Case Requirements #2

Closed nikosT closed 2 years ago

nikosT commented 3 years ago

Concerning the requirements as described in the document https://docs.google.com/document/d/1hjyUsA2g7SYV_XUGiP6vCeIFu79WWrC6xHHWc9NcoQo/edit, the following issues and clarifications need to be resolved:

On Data requirements:

  1. Which of these data refer to the Cloud and which of these to the HPC infrastructure?
  2. Is there a mechanism that ingests the data (for instance in the Real-Time case)?
  3. What is the estimated data size for both Real-Time and Historical data?

On Code base:

  1. Are these utilities for Cloud and/or HPC? Could you specify? For instance, we do not support PostgreSQL on our HPC.
  2. Could you also post informative links to the lesser-known tools?

On Resource requirements:

  1. Does the server and/or the viewer refer to Cloud and/or HPC? Could you specify?
  2. Where are the data hosted?
  3. Do the production and development systems refer to CPU, and/or Storage, and/or HPC resources?
  4. For which infrastructure is the fast storage mentioned? For example, fast storage is not feasible in the Cloud.

On Estimated capacity requirements:

  1. Are you referring to static storage of 10 TB? Is it monthly aggregated (i.e. first month: 10 TB, second month: 20 TB etc.)? Aggregated is not a feasible option for us.
  2. On HPC, usable RAM per node is 56 GB (not 64).
lorincmeszaros commented 3 years ago

@nikosT In general the HiSea use case workflow consists of the following docker images: pre-processing = creating model boundary conditions [in the Cloud], running the model [in the Cloud; could also be on HPC at a later stage], and post-processing = transforming model outputs into another format [in the Cloud]. A sketch of this three-stage chain is given below. The use case requirements document for the HiSea use case seems to be outdated; e.g. we will not have a viewer (front-end) component, only the back-end, and we will not need satellite data.
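A rough, purely illustrative sketch of that three-stage chain is below. The image names and the shared volume are hypothetical placeholders, not the actual HiSea images, which are defined in the use-case repositories:

```python
import subprocess

# Hypothetical image names; each stage reads the previous stage's
# output from a shared Docker volume.
STAGES = [
    "hisea/pre-processing",   # build model boundary conditions (Cloud)
    "hisea/delft3dfm",        # run the model (Cloud, later possibly HPC)
    "hisea/post-processing",  # transform model outputs into another format
]

for image in STAGES:
    subprocess.run(
        ["docker", "run", "--rm", "-v", "hisea-data:/data", image],
        check=True,
    )
```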

Requirements will be listed here: https://github.com/c-scale-community/use-case-hisea/issues/3

Keeping the above in mind, please find below some remarks on your questions:

On Data requirements:

1. Which of these data refer to the Cloud and which of these to the HPC infrastructure? All listed data refer to the Cloud and will be used for pre-processing. The data needed to run the model (in the Cloud or on the HPC infrastructure) will be the output of the pre-processing step; the pre-processing outputs therefore need to be accessible from the model-running infrastructure. We will only need the GLOBAL_ANALYSISFORECAST* and ECMWF ERA5 hindcast products. None of the satellite products are needed.

2. Is there a mechanism that ingests the data (for instance in the Real-Time case)? The pre-processing docker image is responsible for downloading the data, using the MOTU client (https://github.com/clstoulouse/motu-client-python) for CMEMS data and cdsapi (https://cds.climate.copernicus.eu/api-how-to) for ECMWF ERA5 data; a minimal download sketch is given after this list. Please note that FES2012 tidal component data is also used (https://www.aviso.altimetry.fr/es/data/products/auxiliary-products/global-tide-fes/description-fes2012.html). This data is not downloaded operationally; instead it is stored as static data. After all data is downloaded, it is converted into the model boundary format.

3. What is the estimated data size for both Real-Time and Historical data? Data types are hindcast = historical (~5 days) and forecast (~2 days). The CMEMS and ECMWF ERA5 downloads are small, a few megabytes. The FES2012 data is ~4 GB, but it is not downloaded operationally.
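To illustrate the ingestion mechanism from point 2, here is a minimal download sketch. The MOTU service/product IDs, bounding box, dates, variables, credentials, and file names are example placeholders; the actual pre-processing image encodes its own configuration.

```python
import subprocess
import sys

import cdsapi

# --- CMEMS download via the MOTU client (placeholder service/product
# IDs, extent, and credentials; see motu-client-python for details) ---
subprocess.run(
    [
        sys.executable, "-m", "motuclient",
        "--motu", "https://nrt.cmems-du.eu/motu-web/Motu",
        "--service-id", "GLOBAL_ANALYSISFORECAST_PHY_001_024-TDS",
        "--product-id", "global-analysis-forecast-phy-001-024",
        "--longitude-min", "22", "--longitude-max", "26",
        "--latitude-min", "39", "--latitude-max", "41",
        "--date-min", "2021-06-01", "--date-max", "2021-06-07",
        "--variable", "thetao", "--variable", "so",
        "--out-dir", ".", "--out-name", "cmems_phys.nc",
        "--user", "CMEMS_USER", "--pwd", "CMEMS_PASSWORD",
    ],
    check=True,
)

# --- ERA5 download via cdsapi (needs ~/.cdsapirc with CDS credentials) ---
c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": [
            "10m_u_component_of_wind",
            "10m_v_component_of_wind",
            "mean_sea_level_pressure",
        ],
        "year": "2021",
        "month": "06",
        "day": [f"{d:02d}" for d in range(1, 8)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "format": "netcdf",
        "area": [41, 22, 39, 26],  # North, West, South, East
    },
    "era5_meteo.nc",
)
```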

On Code base:

1. Are these utilities for Cloud and/or HPC? Could you specify? For instance, we do not support PostgreSQL on our HPC. At this stage we prefer to deploy everything on Cloud infrastructure; later tests could include the use of HPC.

2. Could you also post informative links to the lesser-known tools? The requirements need to be updated. Links to some of the tools:

- Pre-processing: https://github.com/openearth/coast-serv-back-end
- Model: https://www.deltares.nl/en/software/delft3d-flexible-mesh-suite/
- Post-processing: https://github.com/openearth/dfm_tools

On Resource requirements:

**1. Does the server and/or the viewer refer to Cloud and/or HPC? Could you specify?**
**2. Where are the data hosted?**
**3. Do the production and development systems refer to CPU, and/or Storage, and/or HPC resources?**
**4. For which infrastructure is the fast storage mentioned? For example, fast storage is not feasible in the Cloud.**

The requirements need to be updated.

On Estimated capacity requirements:

**1. Are you referring to static storage of 10 TB? Is it monthly aggregated (i.e. first month: 10 TB, second month: 20 TB, etc.)? Aggregated is not a feasible option for us.**
**2. On HPC, usable RAM per node is 56 GB (not 64).**

The requirements need to be updated.

backeb commented 2 years ago

We need to test HPC as part of this project. Singularity is available on the GRNET HPC. @lorincmeszaros @avgils, keep this in mind. We need to figure out how to compile Delft3D FM on the GRNET HPC; Singularity is preferred, and GRNET can help with compiling to optimize for HPC. Idea: deploy on the Cloud, deploy as a Singularity container on HPC, and compile directly on the GRNET HPC, then compare performance between the three. Pre-processing (run on the Cloud) creates the input files for the model (run on HPC). A sketch of the container route is given below.
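A minimal sketch of the Singularity route, assuming a Delft3D FM Docker image is available to convert. The image reference, bind path, and entry-point script are hypothetical placeholders; the real ones depend on what Deltares publishes or what is built locally.

```python
import subprocess

# Placeholder image reference and paths; adjust to the actual Delft3D FM
# image and the directory holding the pre-processing outputs.
DOCKER_IMAGE = "docker://deltares/delft3dfm:latest"  # hypothetical tag
SIF = "delft3dfm.sif"
WORKDIR = "/scratch/hisea"  # model input files from the pre-processing step

# One-off: convert the Docker image into a Singularity image file.
subprocess.run(["singularity", "pull", SIF, DOCKER_IMAGE], check=True)

# Run the model inside the container, binding the work directory so the
# solver sees the input files produced by the Cloud pre-processing step.
subprocess.run(
    [
        "singularity", "exec",
        "--bind", f"{WORKDIR}:/data",
        SIF,
        "run_dflowfm.sh", "/data/model.mdu",  # hypothetical entry point
    ],
    check=True,
)
```

On the GRNET HPC this would normally be wrapped in a batch job script (e.g. SLURM) rather than launched directly.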