dgasull opened 4 years ago
Exporting our Docker containers to Singularity containers has some limitations:
Our objective is to define a container that is flexible and maintainable for both Docker and Singularity deployments. After analyzing it, the following changes will be made in the Dockerfiles:
Once the Dockerfiles are "singularity-friendly", we will deploy them on MN, create an example job to start them (`singularity run ...`), and see how to provide the dataClay jar to COMPSs (another point in favor of the uberjar in the Docker images)...
I agree with everything @dgasull , I just have one comment on this:
The default user in Docker is root, which means that all copied files go to "/root/dataclay...". However, in Singularity this is mounted in the host system, and /root/ is usually not writable.
Our aim is not to make the dataclay folder writable, as this would mean making the Singularity image file writable, with two bad consequences: (1) it is slow, because we would be writing into a squashfs, and (2) it is uncomfortable, because the Singularity files would have to be regenerated/copied at every execution.
Create (or check whether they already exist) configuration variables for the following file paths: cacheMD, infoDS, infoLM, status, execClasses, SQLite, and any other path that must have write permissions.
This point actually solves the problem above, since we do not want to write anything inside the dataclay folder.
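The idea can be sketched as follows. This is a hypothetical illustration, not dataClay's real configuration: every path that needs write access is redirected to a per-job scratch directory outside the read-only squashfs image, so nothing is ever written inside the container.

```shell
# Hypothetical sketch (variable names are illustrative, not dataClay's real
# config keys): put every write-needing path in a per-job scratch directory
# outside the read-only singularity image.
JOB_SCRATCH="${TMPDIR:-/tmp}/dataclay_job_$$"
mkdir -p "$JOB_SCRATCH"/cacheMD "$JOB_SCRATCH"/infoDS "$JOB_SCRATCH"/infoLM \
         "$JOB_SCRATCH"/status "$JOB_SCRATCH"/execClasses "$JOB_SCRATCH"/sqlite

# Sanity check: every path must be writable by the current (non-root) user.
for d in "$JOB_SCRATCH"/*; do
  [ -w "$d" ] || { echo "not writable: $d" >&2; exit 1; }
done
echo "writable scratch ready at $JOB_SCRATCH"
```

The scratch directory would then be bind-mounted into the containers, so the image itself never has to be writable.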
The structure of singularity deployment in supercomputers will be the following:
.
├── dataclay_scripts
│   ├── start_dataclay.sh : start dataclay
│   ├── stop_dataclay.sh : gracefully stop dataclay
│   ├── clean.sh : clean dataclay logs/files...
│   └── prepare_env.sh : prepare environment scripts for singularity-compose
├── singularity-compose : singularity compose
└── dataclay_images : singularity images
    ├── dsjava
    ├── dspython
    └── logicmodule
        └── Singularity : needed for singularity deployment
In MN, this will be located in APPS.
The prepare_env.sh script will create the following files:
env.sh : used in singularity-compose; it contains all the needed environment variables, like PYTHONPATH, JAVA_HOME, DATACLAYGLOBALCONFIG, LOGICMODULE_HOST, LOGICMODULE_PORT_TCP, DATACLAY_ADMIN_USER, DATACLAY_ADMIN_PASSWORD
cfgfiles : configuration files, like host and port
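A minimal sketch of what prepare_env.sh could look like. All concrete values below (paths, port, credentials) are placeholders, not the real MN configuration, and the client.properties keys are an assumption for illustration:

```shell
#!/bin/bash
# Hypothetical sketch of prepare_env.sh. Every value here is a placeholder;
# the real script would derive them from the job parameters.
set -eu

JOB_DIR="${1:-$PWD}"
mkdir -p "$JOB_DIR/cfgfiles"

# env.sh: sourced by singularity-compose
cat > "$JOB_DIR/env.sh" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/default-java
export PYTHONPATH=/opt/dataclay/python
export DATACLAYGLOBALCONFIG=/opt/dataclay/cfgfiles/global.properties
export LOGICMODULE_HOST=localhost
export LOGICMODULE_PORT_TCP=11034
export DATACLAY_ADMIN_USER=admin
export DATACLAY_ADMIN_PASSWORD=admin
EOF

# cfgfiles: host and port of the LogicModule (illustrative key names)
printf 'HOST=%s\nTCPPORT=%s\n' localhost 11034 > "$JOB_DIR/cfgfiles/client.properties"
echo "environment written to $JOB_DIR"
```

Keeping all generated state in the job directory means nothing job-specific ever lands in the shared APPS installation.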
When a job is launched, on each node we will have:
.
├── dataclay_scripts <link>
├── singularity-compose <copied>
├── dataclay_images <link>
├── cfgfiles <generated>
├── env.sh <generated>
└── ...
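The per-node layout above could be assembled like this. APPS_DATACLAY is an assumed install prefix (standing in for the APPS location on MN), and the fallback mkdir is only so the sketch runs outside MN:

```shell
# Hypothetical sketch: assemble the per-node job directory described above.
# Shared read-only pieces are linked; per-job pieces are copied/generated.
APPS_DATACLAY="${APPS_DATACLAY:-/apps/DATACLAY}"
JOB_DIR="${TMPDIR:-/tmp}/dataclay_node_$$"
mkdir -p "$JOB_DIR"

ln -s "$APPS_DATACLAY/dataclay_scripts" "$JOB_DIR/dataclay_scripts"   # <link>
ln -s "$APPS_DATACLAY/dataclay_images"  "$JOB_DIR/dataclay_images"    # <link>
cp -r "$APPS_DATACLAY/singularity-compose" "$JOB_DIR/" 2>/dev/null \
  || mkdir -p "$JOB_DIR/singularity-compose"                          # <copied>
mkdir -p "$JOB_DIR/cfgfiles"                                          # <generated>
: > "$JOB_DIR/env.sh"                                                 # <generated>
ls "$JOB_DIR"
```

Linking the scripts and images keeps the (large) .sif files in one shared place, while each job only copies/generates the small mutable pieces.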
And on the client side, a similar structure for a Singularity demo (which will be created in the dataclay-demos repository):
.
├── apps
├── model
├── singularity-job
└── ...
@alexbarcelo agree?
I think that the general pattern regarding the server/services stuff is adequate (scripts, orchestration stuff).
For the general "supercomputer Singularity deployment", that's it, we're done.
For Mare Nostrum, and for COMPSs integration, we also need to provide the client-side pieces somehow, because PyCOMPSs will need to use the dataClay Python bindings (and, I assume, the same goes for Java).
For future-proofing the system, we need to consider how to use extra libraries / user-defined PYTHONPATH / additional requirements on the application side. I believe that @pierlauro was convinced of having a technical solution for that, but I haven't seen it in this thread (am I wrong?).
> For future-proofing the system, we need to consider how to use extra libraries / user-defined PYTHONPATH / additional requirements on the application side. I believe that @pierlauro was convinced of having a technical solution for that, but I haven't seen it in this thread (am I wrong?).
Simply prepending additional libs to containers' PYTHONPATH (pointing to the internal venv) should work. We still need to try that though.
> Simply prepending additional libs to containers' PYTHONPATH (pointing to the internal venv) should work. We still need to try that though.
Regarding PYTHONPATH, it works as expected.
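The prepend that was tested can be sketched as follows; the example paths are illustrative:

```shell
# Sketch: user/application libraries go in front of the container's
# PYTHONPATH so they take precedence over the image's internal venv.
# The paths are illustrative.
USER_LIBS="$HOME/my_app/model:$HOME/my_app/extra_libs"
export PYTHONPATH="$USER_LIBS${PYTHONPATH:+:$PYTHONPATH}"
echo "$PYTHONPATH"
```

The `${PYTHONPATH:+:...}` expansion avoids a trailing colon when PYTHONPATH was previously empty.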
To make the scripts "COMPSs-friendly" in MN, we defined the following script structure:
start_dataclay.sh : called from COMPSs storage_init.sh and from a job. It calls prepare_environment.sh and singularity-compose up. @alexbarcelo, if you need any extra environment variable, we need to add it here.
- start_dataclay.sh <jobId> <lm_node> <ds_nodes> <num_ees_per_node> <storage_path> <debug things> <tracing> ...
stop_dataclay.sh : called from COMPSs storage_stop.sh and from jobs. It calls singularity-compose down in a graceful way.
- stop_dataclay.sh <jobId> ...
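The entry point described above could look like the following sketch. The defaults exist only so the sketch runs standalone; a real invocation would pass all positional arguments, and the two commented-out calls at the end are where the actual work would happen:

```shell
#!/bin/bash
# Hypothetical sketch of the start_dataclay.sh interface:
#   start_dataclay.sh <jobId> <lm_node> <ds_nodes> <num_ees_per_node> <storage_path> ...
# Defaults are placeholders so the sketch is runnable on its own.
JOB_ID="${1:-job0}"
LM_NODE="${2:-node1}"
DS_NODES="${3:-node2,node3}"
NUM_EES_PER_NODE="${4:-2}"
STORAGE_PATH="${5:-/tmp/dataclay}"

echo "Starting dataClay for job $JOB_ID: LM on $LM_NODE, DS on $DS_NODES"
echo "$NUM_EES_PER_NODE EEs per node, storage path $STORAGE_PATH"
# The real script would now run (not executed in this sketch):
#   ./prepare_environment.sh "$JOB_ID" "$LM_NODE" "$DS_NODES" ...
#   singularity-compose up
```

Because COMPSs storage_init.sh only forwards its own arguments, keeping the interface positional like this makes the bridge between the two scripts trivial.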
storage.properties, session.properties, global.properties, log4j2.xml ... and other configuration (like tracing) affecting the client application (matmul, wordcount...) will be part of the job or of enqueue_compss.
@alexbarcelo, once those scripts are done, it would be nice if you could change storage_init.sh and storage_stop.sh to make them call start_dataclay.sh and stop_dataclay.sh, and check whether you need anything else.
LGTM
@pierlauro and @dgasull need to modify the following in singularity scripts:
for ssh...
Deployment of dataClay using singularity is working, we need to modify the following:
We should also append the correct PYTHONPATH, LD_LIBRARY_PATH and PATH in one of the env files mounted from the Python EEs.
@alexbarcelo and @dgasull check together how to use storage_init.sh with new deploy_dataclay.sh
@alexbarcelo will create a demo app using enqueue_compss and place it in MN /apps/DATACLAY/2.1 (please include your storage props file with EE_per_NODE and so on). Also, DATACLAY_JAR and PYTHONPATH are the env variables needed, right?
The rest that is missing (@pierlauro):
configure depends_on in the singularity-compose files for stopping them
In case of multiple python data services, shall they depend on the first java data service of the same node?
> In case of multiple python data services, shall they depend on the first java data service of the same node?
Yes, multiple Python Execution Environments will depend on a single Storage Location, which is the java data service.
When you say first java data service, are you implying that there can be more than one? If that's the scenario, it is not defined at all; and I would argue that if there is more than one data service (i.e., storage location) then things should be round-robined, but that is a nonexistent use case for the moment, isn't it?
That's exactly what I was assuming, perfect!
The script that generates the singularity-compose file is generic and takes the number of Java data services and the number of Python ones. In reality, when there are multiple Python ones, the use case just requires one Java one.
For now, let's keep one Java DS per node (also in nodes with Python) until there is a use case in Java that needs multiple DSs and threading is not enough. So yes, @pierlauro, they depend on the first (and only) Java DS.
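The agreed dependency might be expressed in the generated singularity-compose file roughly as follows. Instance names and image paths are illustrative, and the exact schema depends on the singularity-compose version in use:

```yaml
# Hypothetical singularity-compose fragment: every Python EE on a node
# depends on that node's single Java DS, so `down` stops them first.
version: "2.0"
instances:
  dsjava1:
    image: dataclay_images/dsjava/dsjava.sif
  dspython1:
    image: dataclay_images/dspython/dspython.sif
    depends_on:
      - dsjava1
  dspython2:
    image: dataclay_images/dspython/dspython.sif
    depends_on:
      - dsjava1
```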
Implement deployment in supercomputers using Singularity containers