giovtorres / slurm-docker-cluster

A Slurm cluster using docker-compose
MIT License

Loading archive data to slurmdb in slurmdbd container #31

Closed: mdefende closed this issue 2 months ago

mdefende commented 1 year ago

Hi,

Firstly, thanks for creating the Docker images for Slurm; it was much easier to set up this way. The main reason I wanted to set up Slurm is to read in some job archive data from my university's cluster to get some metrics. I went through your instructions and everything ran well: I was able to copy the archive data to the slurmdbd container. I then accessed the container using docker exec -it slurmdbd bash. From there, I tried to load the archive data using the command sacctmgr archive load file=/data/slurm_archive. However, this gave me the following error:

error: slurmdbd: Error with request.  
Problem loading archive file: Permission denied

I'm running as root, so I didn't expect any permission errors. I'm just wondering whether there is some setup step I'm missing, if you know of anything offhand. Thanks!

giovtorres commented 1 year ago

Are you by chance running the container on a host with SELinux enabled? If so, you will want to mount the volume containing the archive data with the :z flag.
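To check whether SELinux is actually enforcing on the Docker host before using the :z flag, you can run something like the following POSIX shell sketch on the host, not inside the container. It uses getenforce when the SELinux userland tools are installed and otherwise falls back to the kernel's selinuxfs interface. The function name selinux_status is illustrative only.

```shell
# selinux_status (illustrative helper): print Enforcing, Permissive or
# Disabled for the current host.
selinux_status() {
  if command -v getenforce >/dev/null 2>&1; then
    # getenforce ships with the SELinux userland tools.
    getenforce
  elif [ -r /sys/fs/selinux/enforce ]; then
    # selinuxfs is mounted: "1" means enforcing, "0" means permissive.
    if [ "$(cat /sys/fs/selinux/enforce)" = "1" ]; then
      echo Enforcing
    else
      echo Permissive
    fi
  else
    # Neither the tools nor selinuxfs are present on this host.
    echo Disabled
  fi
}

selinux_status
```

If this reports Enforcing, the :z suggestion applies. If it reports Disabled, SELinux labeling is unlikely to be the cause.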

mdefende commented 1 year ago

I do not have any SELinux packages installed on the host, so I don't think that is the problem. To give a bit more detail, these are the steps I ran:

  1. Cloned the repo and built the image using docker build --build-arg SLURM_TAG="slurm-18-08-6-1" -t slurm-docker-cluster:18.08.6 .
  2. Ran IMAGE_TAG=18.08.6 docker-compose up -d
  3. Ran ./register_cluster.sh
  4. Copied the archive file to the container using docker cp slurm_archive slurmdbd:/data/slurm_archive
  5. Accessed the slurmdbd container using docker exec -it slurmdbd bash. I can see that the archive file was successfully transferred to the /data directory in the container.
  6. Tried to load the archive file using sacctmgr archive load file=/data/slurm_archive, which gave the error above.
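The error is prefixed with "slurmdbd:". One plausible but unverified reading is that the slurmdbd daemon opens the file, not the sacctmgr client running as root. Docker's documentation states that files copied into a container with docker cp are owned by root, with permission bits preserved where possible. If the daemon runs as an unprivileged account and the dump is not world-readable, the daemon could therefore be denied access. The hypothetical POSIX shell helper can_read below approximates the classic owner/group/other read check for a given user. It is a quick diagnostic, not a full access check. The account name slurm in the usage line is an assumption, so confirm the daemon's real user first.

```shell
# can_read USER FILE  (hypothetical helper, not part of Slurm or Docker)
# Approximates the classic Unix read-permission check for FILE as USER:
# root is always allowed; otherwise the owner bits apply if USER owns the
# file, else the group bits if USER is in the file's group, else the
# "other" bits. ACLs, parent-directory search permission, capabilities and
# SELinux/AppArmor are deliberately ignored.
# Exit status: 0 = readable, 1 = not readable, 2 = unknown user or file.
can_read() {
  user=$1
  file=$2
  uid=$(id -u "$user" 2>/dev/null) || return 2

  # GNU/BusyBox stat: numeric owner, numeric group, octal permission bits.
  f_uid=$(stat -c '%u' "$file" 2>/dev/null) || return 2
  f_gid=$(stat -c '%g' "$file")
  mode=$(( 0$(stat -c '%a' "$file") ))   # e.g. "640" -> 0640 -> 416

  [ "$uid" -eq 0 ] && return 0            # root bypasses DAC read checks

  if [ "$uid" -eq "$f_uid" ]; then
    bit=0400                              # owner read bit
  elif id -G "$user" | tr ' ' '\n' | grep -qx "$f_gid"; then
    bit=0040                              # group read bit
  else
    bit=0004                              # "other" read bit
  fi
  [ $(( $mode & $bit )) -ne 0 ]
}

# Usage inside the slurmdbd container. "slurm" is an assumed account name;
# check the daemon's actual user with: ps -o user= -C slurmdbd
#   can_read slurm /data/slurm_archive && echo readable || echo "not readable"
```

If the check fails for the daemon's user, changing the file's owner or mode inside the container (for example with chown or chmod) would be the natural next experiment.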

The host VM is running Ubuntu 22.04, if that helps.

giovtorres commented 2 months ago

Sorry for the delayed response. I'm not familiar with loading archives with sacctmgr. You could check the logs and see if anything pops up. You could also try running the sacctmgr command prefixed with strace -ff -e open to see which file it is trying to open and can't. Then you could look at the permissions on that file.
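Two details may matter when following the strace suggestion. First, on current glibc versions open() is usually issued as the openat system call, so -e trace=open,openat is safer than -e open. Second, if the daemon does the file access, tracing only the sacctmgr client may not show the failing call. Attaching to the running daemon with strace -f -p may be needed, and that in turn may require the container to have the SYS_PTRACE capability. Both points are assumptions to verify. The hypothetical helper failed_opens below only filters strace output for open/openat calls that failed with EACCES or EPERM.

```shell
# failed_opens (hypothetical helper): read strace output on stdin and print
# "PATH (ERRNO)" for each open()/openat() call that failed with EACCES or
# EPERM. Calls that strace -f splits into "<unfinished ...>" and
# "<... resumed>" lines are not matched.
failed_opens() {
  sed -n -E 's/.*open(at)?\([^"]*"([^"]*)".*= -1 (EACCES|EPERM).*/\2 (\3)/p'
}

# Example uses inside the container (strace must be installed there):
#   strace -f -e trace=open,openat \
#       sacctmgr archive load file=/data/slurm_archive 2>&1 | failed_opens
# or attach to the daemon from a second shell while retrying the load:
#   strace -f -p "$(pgrep -x slurmdbd)" -e trace=open,openat 2>&1 | failed_opens
```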

If you have solved this, please feel free to post the answer here for future reference. Thanks.

mdefende commented 2 months ago

Thanks for following up. I wasn't able to load the slurmdb dumps using the Docker containers, but I was eventually able to load them on a VM-based cluster that had Slurm installed. I may come back to trying it with your Docker containers at some point, since that would be more straightforward, but not for a while.