dgasull opened 4 years ago
Exporting our Docker containers to Singularity containers has some limitations:
Our objective is to define a container that is flexible and maintainable for both Docker and Singularity deployments. After analyzing it, the following changes will be made in the Dockerfiles:
Once the Dockerfiles are "singularity-friendly", we will deploy them on MN, create an example job to start them (`singularity run ...`), and see how to provide the dataClay jar to COMPSs (another point in favor of the uberjar in the Docker images)...
I agree with everything @dgasull , I just have one comment on this:
The default user in Docker is root, which means that all copied files go to "/root/dataclay...". However, in Singularity this is mounted in the host system, and /root/ is usually not writable.
Our aim is not to make the dataclay folder writable, as this would mean making the Singularity image file writable, with two bad consequences: (1) it is slow, because we would be writing into a squashfs, and (2) it is uncomfortable, because the Singularity files would have to be regenerated/copied at every execution.
Create (or check whether they already exist) configuration variables for the following file paths: cacheMD, infoDS, infoLM, status, execClasses, SQLite, and any other path that must have write permissions.
This point actually solves the problem above, since we do not want to write anything inside the dataclay folder.
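The idea can be sketched as follows. This is a hypothetical illustration, not dataClay's real configuration: every path that needs write access is redirected to a per-job scratch directory outside the read-only squashfs image, so nothing is ever written inside the container.

```shell
# Hypothetical sketch (variable names are illustrative, not dataClay's real
# config keys): put every write-needing path in a per-job scratch directory
# outside the read-only singularity image.
JOB_SCRATCH="${TMPDIR:-/tmp}/dataclay_job_$$"
mkdir -p "$JOB_SCRATCH"/cacheMD "$JOB_SCRATCH"/infoDS "$JOB_SCRATCH"/infoLM \
         "$JOB_SCRATCH"/status "$JOB_SCRATCH"/execClasses "$JOB_SCRATCH"/sqlite

# Sanity check: every path must be writable by the current (non-root) user.
for d in "$JOB_SCRATCH"/*; do
  [ -w "$d" ] || { echo "not writable: $d" >&2; exit 1; }
done
echo "writable scratch ready at $JOB_SCRATCH"
```

The scratch directory would then be bind-mounted into the containers, so the image itself never has to be writable.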
The structure of singularity deployment in supercomputers will be the following:
.
├── dataclay_scripts
│   ├── start_dataclay.sh : start dataclay
│   ├── stop_dataclay.sh : gracefully stop dataclay
│   ├── clean.sh : clean dataclay logs/files...
│   └── prepare_env.sh : prepare environment scripts for singularity-compose
├── singularity-compose : singularity compose
└── dataclay_images : singularity images
    ├── dsjava
    ├── dspython
    └── logicmodule
        └── Singularity : needed for singularity deployment
In MN, this will be located in APPS.
The prepare_env.sh script will create the following files:
env.sh : used in singularity-compose; it contains all the needed environment variables, like PYTHONPATH, JAVA_HOME, DATACLAYGLOBALCONFIG, LOGICMODULE_HOST, LOGICMODULE_PORT_TCP, DATACLAY_ADMIN_USER, DATACLAY_ADMIN_PASSWORD
cfgfiles : configuration files, like host and port
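A minimal sketch of what prepare_env.sh could look like. All concrete values below (paths, port, credentials) are placeholders, not the real MN configuration, and the client.properties keys are an assumption for illustration:

```shell
#!/bin/bash
# Hypothetical sketch of prepare_env.sh. Every value here is a placeholder;
# the real script would derive them from the job parameters.
set -eu

JOB_DIR="${1:-$PWD}"
mkdir -p "$JOB_DIR/cfgfiles"

# env.sh: sourced by singularity-compose
cat > "$JOB_DIR/env.sh" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/default-java
export PYTHONPATH=/opt/dataclay/python
export DATACLAYGLOBALCONFIG=/opt/dataclay/cfgfiles/global.properties
export LOGICMODULE_HOST=localhost
export LOGICMODULE_PORT_TCP=11034
export DATACLAY_ADMIN_USER=admin
export DATACLAY_ADMIN_PASSWORD=admin
EOF

# cfgfiles: host and port of the LogicModule (illustrative key names)
printf 'HOST=%s\nTCPPORT=%s\n' localhost 11034 > "$JOB_DIR/cfgfiles/client.properties"
echo "environment written to $JOB_DIR"
```

Keeping all generated state in the job directory means nothing job-specific ever lands in the shared APPS installation.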
When a job is launched, on each node we will have:
.
├── dataclay_scripts <link>
├── singularity-compose <copied>
├── dataclay_images <link>
├── cfgfiles <generated>
├── env.sh <generated>
└── ...
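The per-node layout above could be assembled like this. APPS_DATACLAY is an assumed install prefix (standing in for the APPS location on MN), and the fallback mkdir is only so the sketch runs outside MN:

```shell
# Hypothetical sketch: assemble the per-node job directory described above.
# Shared read-only pieces are linked; per-job pieces are copied/generated.
APPS_DATACLAY="${APPS_DATACLAY:-/apps/DATACLAY}"
JOB_DIR="${TMPDIR:-/tmp}/dataclay_node_$$"
mkdir -p "$JOB_DIR"

ln -s "$APPS_DATACLAY/dataclay_scripts" "$JOB_DIR/dataclay_scripts"   # <link>
ln -s "$APPS_DATACLAY/dataclay_images"  "$JOB_DIR/dataclay_images"    # <link>
cp -r "$APPS_DATACLAY/singularity-compose" "$JOB_DIR/" 2>/dev/null \
  || mkdir -p "$JOB_DIR/singularity-compose"                          # <copied>
mkdir -p "$JOB_DIR/cfgfiles"                                          # <generated>
: > "$JOB_DIR/env.sh"                                                 # <generated>
ls "$JOB_DIR"
```

Linking the scripts and images keeps the (large) .sif files in one shared place, while each job only copies/generates the small mutable pieces.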
And on the client side, a similar structure for a Singularity demo (which will be created in the dataclay-demos repository):
.
├── apps
├── model
├── singularity-job
└── ...
@alexbarcelo agree?
I think that the general pattern regarding the server/services stuff is adequate (scripts, orchestration stuff).
For the general "supercomputer Singularity deployment", that's it, we're done.
For Mare Nostrum, and for COMPSs integration, we also need to provide the client-side pieces somehow, because PyCOMPSs will need to use the dataClay Python bindings (and, I assume, the same goes for Java).
For future-proofing the system, we need to consider how to use extra libraries / user-defined PYTHONPATH / additional requirements on the application side. I believe that @pierlauro was convinced of having a technical solution for that, but I haven't seen it in this thread (am I wrong?).
> For future-proofing the system, we need to consider how to use extra libraries / user-defined PYTHONPATH / additional requirements on the application side. I believe that @pierlauro was convinced of having a technical solution for that, but I haven't seen it in this thread (am I wrong?).
Simply prepending additional libs to containers' PYTHONPATH (pointing to the internal venv) should work. We still need to try that though.
> Simply prepending additional libs to containers' PYTHONPATH (pointing to the internal venv) should work. We still need to try that though.
Regarding PYTHONPATH, it works as expected.
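The prepend that was tested can be sketched as follows; the example paths are illustrative:

```shell
# Sketch: user/application libraries go in front of the container's
# PYTHONPATH so they take precedence over the image's internal venv.
# The paths are illustrative.
USER_LIBS="$HOME/my_app/model:$HOME/my_app/extra_libs"
export PYTHONPATH="$USER_LIBS${PYTHONPATH:+:$PYTHONPATH}"
echo "$PYTHONPATH"
```

The `${PYTHONPATH:+:...}` expansion avoids a trailing colon when PYTHONPATH was previously empty.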
To make the scripts "COMPSs-friendly" in MN, we defined the following script structure:
start_dataclay.sh : called from COMPSs storage_init.sh and from a job. It calls prepare_environment.sh and singularity-compose up. @alexbarcelo, if you need any extra environment variable, we need to add it here.
- start_dataclay.sh <jobId> <lm_node> <ds_nodes> <num_ees_per_node> <storage_path> <debug things> <tracing> ...
stop_dataclay.sh : called from COMPSs storage_stop.sh and from jobs. It calls singularity-compose down in a graceful way.
- stop_dataclay.sh <jobId> ...
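The entry point described above could look like the following sketch. The defaults exist only so the sketch runs standalone; a real invocation would pass all positional arguments, and the two commented-out calls at the end are where the actual work would happen:

```shell
#!/bin/bash
# Hypothetical sketch of the start_dataclay.sh interface:
#   start_dataclay.sh <jobId> <lm_node> <ds_nodes> <num_ees_per_node> <storage_path> ...
# Defaults are placeholders so the sketch is runnable on its own.
JOB_ID="${1:-job0}"
LM_NODE="${2:-node1}"
DS_NODES="${3:-node2,node3}"
NUM_EES_PER_NODE="${4:-2}"
STORAGE_PATH="${5:-/tmp/dataclay}"

echo "Starting dataClay for job $JOB_ID: LM on $LM_NODE, DS on $DS_NODES"
echo "$NUM_EES_PER_NODE EEs per node, storage path $STORAGE_PATH"
# The real script would now run (not executed in this sketch):
#   ./prepare_environment.sh "$JOB_ID" "$LM_NODE" "$DS_NODES" ...
#   singularity-compose up
```

Because COMPSs storage_init.sh only forwards its own arguments, keeping the interface positional like this makes the bridge between the two scripts trivial.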
storage.properties, session.properties, global.properties, log4j2.xml ... and other configuration (like tracing) affecting the client application (matmul, wordcount...) will be part of the job or of enqueue_compss.
@alexbarcelo, once those scripts are done, it would be nice if you could change storage_init.sh and storage_stop.sh to make them call start_dataclay.sh and stop_dataclay.sh, and check whether you need anything else.
LGTM
@pierlauro and @dgasull need to modify the following in singularity scripts:
for ssh...
Deployment of dataClay using singularity is working, we need to modify the following:
We should also append the correct PYTHONPATH, LD_LIBRARY_PATH and PATH in one of the env files mounted from the Python EEs.
@alexbarcelo and @dgasull check together how to use storage_init.sh with new deploy_dataclay.sh
@alexbarcelo will create a demo app using enqueue_compss and place it in MN /apps/DATACLAY/2.1 (please include your storage props file with EE_per_NODE and so on). Also, DATACLAY_JAR and PYTHONPATH are the env variables needed, right?
The rest that is missing (@pierlauro):
configure depends_on in the singularity-compose files for stopping them
In case of multiple python data services, shall they depend on the first java data service of the same node?
> In case of multiple python data services, shall they depend on the first java data service of the same node?
Yes, multiple Python Execution Environments will depend on a single Storage Location, which is the java data service.
When you say first java data service, are you implying that there can be more than one? If that's the scenario, it is not defined at all; and I would argue that if there is more than one data service (i.e., storage location) then things should be round-robined, but that is a nonexistent use case for the moment, isn't it?
That's exactly what I was assuming, perfect!
The script that generates the singularity-compose file is generic and takes the number of Java data services and the number of Python ones. In reality, when there are multiple Python ones, the use case just requires one Java one.
For now, let's keep one Java DS per node (also in nodes with Python) until there is a use case in Java that needs multiple DSs and threading is not enough. So yes, @pierlauro, they depend on the first (and only) Java DS.
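The agreed dependency might be expressed in the generated singularity-compose file roughly as follows. Instance names and image paths are illustrative, and the exact schema depends on the singularity-compose version in use:

```yaml
# Hypothetical singularity-compose fragment: every Python EE on a node
# depends on that node's single Java DS, so `down` stops them first.
version: "2.0"
instances:
  dsjava1:
    image: dataclay_images/dsjava/dsjava.sif
  dspython1:
    image: dataclay_images/dspython/dspython.sif
    depends_on:
      - dsjava1
  dspython2:
    image: dataclay_images/dspython/dspython.sif
    depends_on:
      - dsjava1
```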
Implement deployment in supercomputers using Singularity containers