bsc-dom / dataclay-packaging

BSD 3-Clause "New" or "Revised" License

Singularity deployment #2

Open dgasull opened 4 years ago

dgasull commented 4 years ago

Implement deployment in supercomputers using Singularity containers

dgasull commented 4 years ago

Exporting our docker containers to singularity containers has some limitations:

Our objective is to define a container that is flexible and maintainable for both Docker and Singularity deployments. After analyzing it, the following changes will be made in the Dockerfiles:

Once the Dockerfiles are "singularity-friendly", we will deploy them on MN and create an example job that starts them (singularity run ...), and we will see how to provide the dataClay jar to COMPSs (one more point in favour of the uberjar in the Docker images)...
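A minimal sketch of what such an MN job could look like, assuming a SLURM batch script; the module name, image names and paths below are illustrative assumptions, not the layout agreed in this issue:

```bash
#!/bin/bash
#SBATCH --job-name=dataclay-singularity
#SBATCH --nodes=2
#SBATCH --time=00:30:00
# Illustrative only: module name, image names and paths are assumptions.

module load singularity

IMAGES=/apps/DATACLAY/dataclay_images   # assumed install location under APPS

# Start the logic module as a named background instance
singularity instance start "$IMAGES/logicmodule.sif" logicmodule

# Run a Java data service container (foreground, for illustration)
singularity run "$IMAGES/dsjava.sif"
```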

pierlauro commented 4 years ago

I agree with everything @dgasull , I just have one comment on this:

The default user in Docker is root, which means that all copied files go to "/root/dataclay...". However, in Singularity this is mounted in the host system and /root/ is usually not writable.

Our aim is not to make the dataClay folder writable, as this would mean making the Singularity file writable, with two bad consequences: (1) it is slow, because we would be writing into a squashfs, and (2) it is inconvenient, because the Singularity files would have to be regenerated/copied at every execution.

Create (or check if it already exists) configuration variables for the following file paths: cacheMD, infoDS, infoLM, status, execClasses, SQLite and any other path that must have write permissions

This point actually solves the problem above, as we don't want to write anything inside the dataClay folder.
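A minimal sketch of that idea, assuming hypothetical configuration variable names (the real dataClay property names may differ): every path that needs write access is redirected outside the read-only image, e.g. to a bind-mounted scratch directory on the host.

```bash
# Hypothetical variable names; only the pattern matters: all writable paths
# point outside the read-only Singularity image.
DC_WRITABLE_ROOT=${DC_WRITABLE_ROOT:-/scratch/$USER/dataclay}
mkdir -p "$DC_WRITABLE_ROOT"

export DATACLAY_METADATA_CACHE="$DC_WRITABLE_ROOT/cacheMD"
export DATACLAY_DS_INFO="$DC_WRITABLE_ROOT/infoDS"
export DATACLAY_LM_INFO="$DC_WRITABLE_ROOT/infoLM"
export DATACLAY_STATUS_FILE="$DC_WRITABLE_ROOT/status"
export DATACLAY_EXEC_CLASSES_DIR="$DC_WRITABLE_ROOT/execClasses"
export DATACLAY_SQLITE_DIR="$DC_WRITABLE_ROOT/sqlite"
```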

dgasull commented 4 years ago

The structure of singularity deployment in supercomputers will be the following:

.
├── dataclay_scripts 
│   ├── start_dataclay.sh : start dataclay 
│   ├── stop_dataclay.sh : gracefully stop dataclay 
│   ├── clean.sh : clean dataclay logs/files...
│   └── prepare_env.sh : prepare environment scripts for singularity-compose
├── singularity-compose: singularity compose
└── dataclay_images: singularity images
    ├── dsjava
    ├── dspython 
    ├── logicmodule
    └── Singularity: needed for singularity deployment

In MN this will be located in APPS

The prepare_env.sh script will create the following files:

When a job is launched, on each node we will have:

.
├── dataclay_scripts <link>
├── singularity-compose <copied>
├── dataclay_images <link>
├── cfgfiles <generated>
├── env.sh <generated>
└── ...
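As an illustration of what the <generated> entries could contain (all paths and variable names here are hypothetical, not decided in this thread), prepare_env.sh might do something along these lines on each node:

```bash
# Hypothetical sketch: link the shared pieces, copy the compose files and
# generate the per-job env.sh. Paths and variable names are assumptions.
JOB_DIR=/scratch/$USER/$SLURM_JOB_ID/dataclay
mkdir -p "$JOB_DIR/cfgfiles"

ln -sfn /apps/DATACLAY/dataclay_scripts "$JOB_DIR/dataclay_scripts"
ln -sfn /apps/DATACLAY/dataclay_images  "$JOB_DIR/dataclay_images"
cp -r   /apps/DATACLAY/singularity-compose "$JOB_DIR/"

cat > "$JOB_DIR/env.sh" <<EOF
export DATACLAY_JAR=/apps/DATACLAY/dataclay_images/dataclay.jar
export LOGICMODULE_HOST=$LM_NODE
EOF
```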

And on the client-side, a similar structure for a singularity demo (which will be created in dataclay-demos repository):

.
├── apps
├── model
├── singularity-job
└── ...

@alexbarcelo agree?

alexbarcelo commented 4 years ago

I think that the general pattern regarding the server/services stuff is adequate (scripts, orchestration stuff).

For the general "supercomputer Singularity deployment", that's it, we're done.

For Mare Nostrum, and for COMPSs integration, we also need to provide the client-side pieces somehow, because PyCOMPSs will need to use the dataClay Python bindings (and, I assume, the same goes for Java).

For future-proofing the system, we need to consider how to use extra libraries / user-defined PYTHONPATH / additional requirements on the application side. I believe that @pierlauro was convinced of having a technical solution for that, but I haven't seen it in this thread (am I wrong?).

pierlauro commented 4 years ago

For future-proofing the system, we need to consider how to use extra libraries / user-defined PYTHONPATH / additional requirements on the application side. I believe that @pierlauro was convinced of having a technical solution for that, but I haven't seen it in this thread (am I wrong?).

Simply prepending additional libs to containers' PYTHONPATH (pointing to the internal venv) should work. We still need to try that though.

pierlauro commented 4 years ago

Simply prepending additional libs to containers' PYTHONPATH (pointing to the internal venv) should work. We still need to try that though

Regarding PYTHONPATH, it works as expected.
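For reference, a hedged example of how the prepending could be done from the host side; the library and venv paths are made up, and SINGULARITYENV_* is the standard Singularity mechanism for injecting environment variables into the container:

```bash
# Extra user libraries live on the host; bind-mount them and set the
# PYTHONPATH seen inside the container so the user libs come first and the
# container's internal venv (assumed path) comes after.
export SINGULARITYENV_PYTHONPATH="/gpfs/projects/myapp/libs:/usr/src/app/venv/lib/python3.6/site-packages"

singularity exec --bind /gpfs/projects/myapp/libs \
    dspython.sif python -c 'import sys; print(sys.path)'
```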

dgasull commented 4 years ago

To make the scripts "COMPSs-friendly" in MN, we defined the following script structure:

  1. Script start_dataclay.sh: it is called from COMPSs storage_init.sh and from a job. It calls prepare_environment.sh and singularity-compose up. @alexbarcelo if you need any extra environment variable, we need to add it here.
     - start_dataclay.sh <jobId> <lm_node> <ds_nodes> <num_ees_per_node> <storage_path> <debug things> <tracing> ...
  2. Script stop_dataclay.sh: it is called from COMPSs storage_stop.sh and from jobs. It calls singularity-compose down in a graceful way.
     - stop_dataclay.sh <jobId> ...

storage.properties, session.properties, global.properties, log4j2.xml... and other configuration (like tracing) affecting the client application (matmul, wordcount...) will be part of the job or of enqueue_compss.

@alexbarcelo once those scripts are done, it would be nice if you could change storage_init.sh and storage_stop.sh to make them call start_dataclay.sh and stop_dataclay.sh, and check if you need anything else.
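A rough sketch of that glue; the COMPSs argument order for storage_init.sh is not shown in this thread, so the positional parameters and the node-role choice below are assumptions:

```bash
#!/bin/bash
# Hypothetical storage_init.sh: forward COMPSs information to start_dataclay.sh.
# Argument order and node-role assignment are assumptions for illustration.
JOBID=$1
MASTER_NODE=$2
WORKER_NODES=$3
STORAGE_PROPS=$4

# Illustrative role assignment: first worker hosts the LM, the rest host DSs
LM_NODE=$(echo "$WORKER_NODES" | awk '{print $1}')
DS_NODES=$(echo "$WORKER_NODES" | cut -d' ' -s -f2-)

/apps/DATACLAY/dataclay_scripts/start_dataclay.sh \
    "$JOBID" "$LM_NODE" "$DS_NODES" 1 "/scratch/$USER/dataclay"
```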

alexbarcelo commented 4 years ago

LGTM

dgasull commented 4 years ago

@pierlauro and @dgasull need to modify the following in singularity scripts:

dgasull commented 4 years ago

Deployment of dataClay using Singularity is working; we still need to modify the following:

pierlauro commented 4 years ago

We should also append the correct PYTHONPATH, LD_LIBRARY_PATH and PATH in one of the env files mounted into the Python EEs.
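Something along these lines in the mounted env file; all concrete paths are assumptions about the image layout:

```bash
# Hypothetical env file for the Python EE containers: prepend dataClay's
# Python bindings, native libraries and tools to the relevant search paths.
export PYTHONPATH="/usr/src/dataclay/pyclay:${PYTHONPATH}"
export LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
export PATH="/usr/src/dataclay/bin:${PATH}"
```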

dgasull commented 4 years ago

@alexbarcelo and @dgasull will check together how to use storage_init.sh with the new deploy_dataclay.sh.

dgasull commented 4 years ago

@alexbarcelo will create a demo app using enqueue_compss and place it in MN under /apps/DATACLAY/2.1 (please include your storage props file with EE_per_NODE and so on). Also, DATACLAY_JAR and PYTHONPATH are the env variables needed, right?

As for the rest, what is missing (@pierlauro):

pierlauro commented 4 years ago

configure depends_on on singularity compose files for stopping them

In case of multiple python data services, shall they depend on the first java data service of the same node?

alexbarcelo commented 4 years ago

In case of multiple python data services, shall they depend on the first java data service of the same node?

Yes, multiple Python Execution Environments will depend on a single Storage Location, which is the java data service.

When you say first java data service, are you implying that there can be more than one? If that's the scenario, it is not defined at all -- and I would argue that if there is more than one data service (== storage location) then things should be round-robined, but that is a nonexistent use case for the moment, isn't it?

pierlauro commented 4 years ago

That's exactly what I was assuming, perfect!

The script that generates the singularity-compose file is generic and takes the number of Java data services and the number of Python ones. In reality, when there are multiple Python ones, the use case only requires one Java data service.

dgasull commented 4 years ago

For now, let's keep one Java DS per node (also on nodes with Python) until there's a use case in Java that needs multiple DSs and threading is not enough. So yes, @pierlauro, they depend on the first (and only) Java DS.
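To make the dependency explicit, the generator could emit one Java DS per node and make every Python EE on that node depend on it. The fragment below is only schematic: the exact singularity-compose schema is not reproduced here, and the instance names are made up.

```bash
# Schematic generator fragment: every Python EE on a node depends on the
# single Java DS of that node. Emitted keys and names are illustrative only.
NUM_PYTHON_EES=${1:-2}

{
  echo "  dsjava1:"
  echo "    image: dataclay_images/dsjava.sif"
  for i in $(seq 1 "$NUM_PYTHON_EES"); do
    echo "  dspython$i:"
    echo "    image: dataclay_images/dspython.sif"
    echo "    depends_on:"
    echo "      - dsjava1"
  done
} >> singularity-compose.yml
```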