Closed: @nikosT closed this issue 2 years ago.
@nikosT In general the HiSea use case workflow consists of the following Docker images: pre-processing = creating model boundary conditions [in the Cloud], running the model [in the Cloud; could also run on HPC at a later stage], and post-processing = transforming model outputs into another format [in the Cloud]. The use case requirements document for the HiSea use case seems to be outdated; e.g. we will not have a viewer (front-end) component, only the back-end, and we will not need satellite data.
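The three-stage workflow above could be orchestrated roughly as follows. This is only a sketch: the image names and the shared-volume layout are assumptions, not the actual images from the use case.

```python
import subprocess

# Hypothetical image names: the actual image names are not given in this thread.
STAGES = [
    ("pre-processing", "hisea/pre-processing"),    # create model boundary conditions
    ("model", "hisea/delft3d-fm"),                 # run the Delft3D FM model
    ("post-processing", "hisea/post-processing"),  # transform model outputs
]

def docker_cmd(image, data_dir):
    """Build the `docker run` command for one stage, mounting a shared data
    directory so each stage can read the previous stage's output."""
    return ["docker", "run", "--rm", "-v", f"{data_dir}:/data", image]

def run_pipeline(data_dir):
    """Run the three stages in order, failing fast if any stage fails."""
    for name, image in STAGES:
        print(f"running {name} ...")
        subprocess.run(docker_cmd(image, data_dir), check=True)

# run_pipeline("/srv/hisea/data")  # requires Docker and the images above
```

The shared volume is the simplest way to hand pre-processing outputs to the model stage; on HPC the same hand-off would happen via a shared filesystem instead.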
Requirements will be listed here: https://github.com/c-scale-community/use-case-hisea/issues/3
Keeping the above in mind, please find below some remarks on your questions:
On Data requirements:
1. Which of these data refer to the Cloud and which of these to the HPC infrastructure? All listed data refer to the Cloud and will be used for pre-processing. The necessary input for the model run (in the Cloud or on the HPC infrastructure) will be the output of the pre-processing step; the pre-processing outputs therefore need to be accessible by the model-running infrastructure. We will only need the GLOBAL_ANALYSISFORECAST* and ECMWF ERA5 hindcast products. None of the satellite products are needed.
2. Is there a mechanism that ingests the data (for instance in the Real-Time case)? The pre-processing Docker image is responsible for downloading the data, using the MOTU client (https://github.com/clstoulouse/motu-client-python) for CMEMS data and cdsapi (https://cds.climate.copernicus.eu/api-how-to) for ECMWF ERA5 data. Please note that the FES2012 tidal component data is also used (https://www.aviso.altimetry.fr/es/data/products/auxiliary-products/global-tide-fes/description-fes2012.html). This data is not downloaded operationally; instead it is stored as static data. After all data is downloaded, it is converted into the model boundary format.
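A minimal sketch of the two download mechanisms, assuming the standard cdsapi and motuclient interfaces. The ERA5 variable list, the area, and the CMEMS service/product IDs are illustrative placeholders, not the use case's actual selection.

```python
import subprocess

def era5_request(day, area):
    """cdsapi request body for ERA5 surface forcing; the variable list is an
    illustrative subset, not the use case's actual selection."""
    return {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": ["10m_u_component_of_wind",
                     "10m_v_component_of_wind",
                     "mean_sea_level_pressure"],
        "date": day,
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": area,  # [North, West, South, East]
    }

def motu_cmd(service_id, product_id, date_min, date_max, out_name, user, pwd):
    """motuclient command line for a CMEMS download; pass the service and
    product IDs of the GLOBAL_ANALYSISFORECAST* product actually required."""
    return ["python", "-m", "motuclient",
            "--motu", "https://nrt.cmems-du.eu/motu-web/Motu",
            "--service-id", service_id,
            "--product-id", product_id,
            "--date-min", date_min, "--date-max", date_max,
            "--out-dir", ".", "--out-name", out_name,
            "--user", user, "--pwd", pwd]

# Actual downloads (require CDS and CMEMS credentials):
# import cdsapi
# cdsapi.Client().retrieve("reanalysis-era5-single-levels",
#                          era5_request("2021-01-01", [42, 22, 40, 26]), "era5.nc")
# subprocess.run(motu_cmd("SERVICE-TDS", "PRODUCT-ID", "2021-01-01",
#                         "2021-01-02", "cmems.nc", "USER", "PWD"), check=True)
```

The static FES2012 data would simply be baked into (or mounted alongside) the pre-processing image rather than fetched per run.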
3. Which is the estimated data size for both Real-Time and Historical data? The data types are hindcast = historical (~5 days) and forecast (~2 days). The CMEMS and ECMWF ERA5 data are small, a few megabytes. The FES2012 data is ~4 GB, but it is not downloaded operationally.
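As a small illustration of the hindcast/forecast split, the operational download window could be derived like this (window lengths taken from the answer above; the function itself is just a sketch):

```python
from datetime import date, timedelta

HINDCAST_DAYS = 5  # historical window length from the thread (~5 days)
FORECAST_DAYS = 2  # forecast window length from the thread (~2 days)

def download_window(today):
    """Return (date_min, date_max) covering the hindcast plus forecast
    period for an operational run started on `today`."""
    return (today - timedelta(days=HINDCAST_DAYS),
            today + timedelta(days=FORECAST_DAYS))
```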
On Code base:
1. Are these utilities for Cloud and/or HPC? Could you specify? For instance, we do not support PostgreSQL on our HPC. At this stage we prefer to deploy everything on Cloud infrastructure. Later tests could include the use of HPC but not at this stage.
2. Could you also post informative links to the not well-known tools? The requirements need to be updated. Link to some tools: Pre-processing: https://github.com/openearth/coast-serv-back-end Model: https://www.deltares.nl/en/software/delft3d-flexible-mesh-suite/ Post-processing: https://github.com/openearth/dfm_tools
On Resource requirements:
1. Does the server and/or the viewer refer to Cloud and/or HPC? Could you specify?
The requirements need to be updated.
On Estimated capacity requirements:
1. Are you referring to static storage of 10 TB? Is it monthly aggregated (i.e. first month: 10 TB, second month: 20 TB, etc.)? Aggregated is not a feasible option for us.
The requirements need to be updated.
We need to test HPC as part of this project. Singularity is available on the GRNET HPC. @lorincmeszaros @avgils keep this in mind. We need to figure out how to compile Delft3D FM on the GRNET HPC; Singularity is preferred, and GRNET can help with compiling to optimize for HPC. Idea: deploy on the Cloud, deploy as a Singularity container on HPC, and also compile directly on the GRNET HPC, then compare performance between the three. Pre-processing (run on the Cloud) creates input files for the model (run on HPC).
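The Singularity route above could look roughly like this, assuming the cloud Docker image is published to a registry reachable from the HPC login node. The image name and the `run_dflowfm.sh` wrapper script are hypothetical, not a known Delft3D FM entry point.

```python
import subprocess

def build_cmd(sif, docker_image):
    """Convert the cloud Docker image into a Singularity image file."""
    return ["singularity", "build", sif, f"docker://{docker_image}"]

def exec_cmd(sif, workdir, run_script="run_dflowfm.sh"):
    """Run the model inside the container, bind-mounting the directory that
    holds the input files produced by the cloud pre-processing step.
    `run_dflowfm.sh` is a hypothetical wrapper around the model executable."""
    return ["singularity", "exec", "--bind", f"{workdir}:/data", sif,
            "bash", f"/data/{run_script}"]

# On the GRNET HPC (requires Singularity):
# subprocess.run(build_cmd("delft3dfm.sif", "hisea/delft3d-fm"), check=True)
# subprocess.run(exec_cmd("delft3dfm.sif", "/scratch/hisea"), check=True)
```

The natively compiled variant would skip the container entirely and run the same wrapper against the locally built executable, which is what makes the three-way performance comparison possible.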
Concerning the requirements as described in the document https://docs.google.com/document/d/1hjyUsA2g7SYV_XUGiP6vCeIFu79WWrC6xHHWc9NcoQo/edit, the following issues and clarifications need to be resolved:
On Data requirements:
On Code base:
On Resource requirements:
On Estimated capacity requirements: