ChiaraVanni closed this issue 1 year ago.
Hi,
Good catch! Thanks to your issue, I noticed that we haven't documented ephemeral disks yet. I will catch up on that soon.
As a first short answer: maybe your ephemeral disks were detected correctly, but you are looking in the wrong place? On your master, the ephemeral disk is mounted to /vol/ and on your workers to /vol/scratch. We agreed to copy the behavior of the old BiBiGrid here. However, I am not quite sure why the old BiBiGrid used two different paths and need to check with Jan.
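The mount-point convention described above (ephemeral disk on /vol/ on the master, on /vol/scratch on the workers) can be summarized as a small lookup. This is a minimal Python sketch for illustration only, not BiBiGrid code; the role names "master"/"worker", the function name expected_ephemeral_mount, and the rule that an ephemeral size of 0 means nothing gets mounted are assumptions.

```python
from typing import Optional

# Hypothetical mapping of the convention described in this thread:
# the master mounts its ephemeral disk on /vol/, workers on /vol/scratch.
EPHEMERAL_MOUNTS = {
    "master": "/vol/",
    "worker": "/vol/scratch",
}


def expected_ephemeral_mount(role: str, ephemeral_size: int) -> Optional[str]:
    """Return where the ephemeral disk is expected for a role.

    Assumption: a flavor with ephemeral size 0 has no ephemeral disk,
    so None is returned instead of a path.
    """
    if ephemeral_size <= 0:
        return None
    if role not in EPHEMERAL_MOUNTS:
        raise ValueError(f"unknown role: {role!r}")
    return EPHEMERAL_MOUNTS[role]


print(expected_ephemeral_mount("master", 2000))  # /vol/
print(expected_ephemeral_mount("worker", 2000))  # /vol/scratch
print(expected_ephemeral_mount("worker", 0))     # None
```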
If that's not it, we need to dig a little deeper. Check what's stored in your bibigrid/resources/group_vars files under
flavor:
  ephemeral: ?
If this is 0, then something is wrong with the assignment. In that case, a manual fix would be to store the available ephemeral size there (should be 1950000). Otherwise, something is wrong with the Ansible execution.
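To check that value quickly, the following Python sketch reads flavor: ephemeral: from the text of a group_vars file using only the standard library. It is a hypothetical helper (flavor_ephemeral is not part of BiBiGrid) and assumes the simple layout shown in this thread, i.e. a top-level flavor: key with indented children; a real YAML parser would be more robust.

```python
from typing import Optional


def flavor_ephemeral(text: str) -> Optional[int]:
    """Return flavor.ephemeral from group_vars text, or None if absent.

    Assumes a top-level "flavor:" line followed by indented "key: value" lines.
    """
    in_flavor = False
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] not in " \t":
            # Any top-level key starts a new block; only "flavor:" is of interest.
            in_flavor = line.strip() == "flavor:"
            continue
        if in_flavor:
            key, _, value = line.strip().partition(":")
            if key == "ephemeral":
                return int(value.strip())
    return None


sample = """cloud_identifier: openstack
flavor:
  disk: 50
  ephemeral: 2000
  name: de.NBI highmem xlarge
gateway_ip: 192.168.0.235
"""
size = flavor_ephemeral(sample)
print(size)  # 2000
if size == 0:
    print("ephemeral is 0: the flavor assignment looks wrong")
```

You could call it on the contents of each file in bibigrid/resources/group_vars; a result of 0 points to the assignment problem described above.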
Maybe TASK [bibigrid : Create /vol bind mount from /mnt ephemeral] has been skipped. You can check this in the logfile or, in any case, run Ansible again on your remote machine: when logged into your remote machine, execute bibiplay -l master.
Xaver
Hi,
in the bibigrid/resources/group_vars files, I have the ephemerals:
cloud_identifier: openstack
flavor:
  disk: 50
  ephemeral: 2000
  name: de.NBI highmem xlarge
  ram: 491520
  vcpus: 28
gateway_ip: 192.168.0.235
image: Ubuntu 22.04 LTS (2023-02-14)
name: bibigrid-worker0-cv6j7rvcuvf1ifr-[0-4]
network: 74074d13-4d59-469a-8d3c-978097938952
regexp: bibigrid-worker0-cv6j7rvcuvf1ifr-\d+
When running bibiplay -l master, I get this:
TASK [bibigrid : Create /vol bind mount from /mnt ephemeral] *********************************************************************************************************************************************************
ok: [localhost]
TASK [bibigrid : Mount disks] ****************************************************************************************************************************************************************************************
skipping: [localhost]
TASK [bibigrid : Mount ephemeral] ************************************************************************************************************************************************************************************
skipping: [localhost]
TASK [bibigrid : Set 0777 rights for ephemeral mount] ****************************************************************************************************************************************************************
skipping: [localhost]
My ansible_hosts file looks like this:
vpn:
  children:
    master:
      hosts:
        localhost:
          ansible_connection: ssh
          ansible_python_interpreter: /usr/bin/python3
          ansible_user: ubuntu
          ip: localhost
    vpngtw:
      hosts: {}
  hosts: {}
workers:
  children:
    bibigrid_worker0_cv6j7rvcuvf1ifr_0_4:
      hosts:
        bibigrid-worker0-cv6j7rvcuvf1ifr-[0:4]:
          ansible_connection: ssh
          ansible_python_interpreter: /usr/bin/python3
          ansible_user: ubuntu
  hosts: {}
Thanks!
Chiara
And this is the common_configuration.yml file, in case it helps:
cluster_cidrs:
- cloud_identifier: openstack
  provider_cidrs: 192.168.0.0/24
cluster_id: cv6j7rvcuvf1ifr
default_user: 6736068f914d51cb40197bfee384425a070b3067@elixir-europe.org
dns_server_list:
- 8.8.8.8
enable_ide: true
enable_nfs: true
enable_slurm: false
enable_zabbix: true
ext_nfs_mounts: []
ide_conf:
  build: false
  ide: false
  port_end: 8383
  port_start: 8181
  workspace: ${HOME}
local_dns_lookup: false
local_fs: false
nfs_mounts:
- dst: //vol/spool
  src: //vol/spool
slurm: true
slurm_conf:
  db: slurm
  db_password: changeme
  db_user: slurm
  elastic_scheduling:
    ResumeTimeout: 900
    SuspendTime: 3600
    TreeWidth: 128
  munge_key: JMqd42ofbcoi9o2G8fDZfNT7F9TcN6sn
ssh_user: ubuntu
use_master_as_compute: true
wait_for_services:
- de.NBI_Bielefeld_environment.service
zabbix_conf:
  admin_password: bibigrid
  db: zabbix
  db_password: zabbix
  db_user: zabbix
  server_name: bibigrid
  timezone: Europe/Berlin
This sounds like everything is working as expected, even though I was at first a bit surprised by ephemeral: 2000, which looked too low. EDIT: It's correct; I just mixed up the flavors.
The skipped steps are steps that are only executed on workers (we just ran Ansible for the master).
So, the ephemeral disk should be mounted to /vol/ on the master and to /vol/scratch on the workers.
I can't test (& confirm) this at the moment, but I will as soon as possible.
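To see at a glance which tasks were skipped on a given host, one could parse bibiplay output like the one posted earlier in this thread. The sketch below is hypothetical (task_statuses is not part of BiBiGrid or Ansible) and assumes Ansible's default output format: a TASK [role : name] header followed by status lines such as ok: [localhost] or skipping: [localhost].

```python
import re
from typing import Dict, Optional

# "TASK [bibigrid : Mount ephemeral] ****" -> "bibigrid : Mount ephemeral"
TASK_RE = re.compile(r"^TASK \[(?P<name>[^\]]+)\]")
# "ok: [localhost]", "skipping: [localhost]", "fatal: [host]: FAILED! ..."
STATUS_RE = re.compile(
    r"^(?P<status>ok|changed|skipping|failed|fatal): \[(?P<host>[^\]]+)\]"
)


def task_statuses(output: str) -> Dict[str, Dict[str, str]]:
    """Map each task name to {host: status} found in Ansible output."""
    statuses: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        task = TASK_RE.match(line)
        if task:
            current = task.group("name")
            statuses.setdefault(current, {})
            continue
        status = STATUS_RE.match(line)
        if status and current is not None:
            statuses[current][status.group("host")] = status.group("status")
    return statuses


output = """\
TASK [bibigrid : Create /vol bind mount from /mnt ephemeral] ****
ok: [localhost]
TASK [bibigrid : Mount ephemeral] ****
skipping: [localhost]
"""
skipped = [name for name, hosts in task_statuses(output).items()
           if hosts.get("localhost") == "skipping"]
print(skipped)  # ['bibigrid : Mount ephemeral']
```

On the master, skipped entries like these are expected for the worker-only steps described above.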
Alright, I confirmed what I stated above. Most likely it is mounted to /vol/ (on the master). You can check this by executing: mount | grep "/vol"
You should see something like: /dev/vdb on /vol type ext4 (rw,relatime). If the output is empty, something went wrong. On the workers, it should return something like:
/dev/vdb on /vol/scratch type ext4 (rw,relatime) # mounted ephemeral
192.168.200.98:/vol/spool on /vol/spool type nfs4 (rw,relatime,[...]) # NFS share from the master
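The check above can also be scripted. This Python sketch parses lines in the Linux mount output format "<device> on <mountpoint> type <fstype> (<options>)" and reports whether a local block device is mounted at a given path. The helper names are hypothetical, and treating any /dev/... device as the local ephemeral disk (as opposed to an NFS share written as host:/path) is a simplifying assumption.

```python
import re
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: List[str]


# "<device> on <mountpoint> type <fstype> (<options>)"
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)")


def parse_mounts(output: str) -> List[Mount]:
    """Parse `mount` output lines; lines in any other format are ignored."""
    mounts: List[Mount] = []
    for line in output.splitlines():
        match = MOUNT_RE.match(line.strip())
        if match:
            device, mountpoint, fstype, options = match.groups()
            mounts.append(Mount(device, mountpoint, fstype, options.split(",")))
    return mounts


def local_disk_mounted_at(output: str, path: str) -> bool:
    """True if a /dev/... device is mounted at path (trailing slashes ignored)."""
    wanted = path.rstrip("/") or "/"
    return any(
        m.device.startswith("/dev/") and (m.mountpoint.rstrip("/") or "/") == wanted
        for m in parse_mounts(output)
    )


worker_output = """\
/dev/vdb on /vol/scratch type ext4 (rw,relatime)
192.168.200.98:/vol/spool on /vol/spool type nfs4 (rw,relatime,[...])
"""
print(local_disk_mounted_at(worker_output, "/vol/scratch"))  # True
print(local_disk_mounted_at(worker_output, "/vol/spool"))    # False: NFS, not a local disk
print(local_disk_mounted_at("", "/vol/"))                    # False: empty output
```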
Hi,
Sorry for the late reply. Yes, it is mounted (on /vol on the master and on /vol/scratch on the workers). I was expecting it on /vol/scratch on the master as well, because that is what I have on another cloud.
Thank you for the fast help!!
Chiara
Perfect! If no other issue occurred - and I didn't miss anything - all issues are resolved, right? In that case I am going to close this issue.
Hi, yes, all issues are solved. Thanks!
Hi,
I recently created a cluster with BiBiGrid and everything ran smoothly, except that I don't see any "/vol/scratch" or ephemeral disk. I didn't explicitly specify this in the bibigrid.yml file (attached). Is there a way to set up the local ephemeral disk while the cluster is up and running? If not, where in the bibigrid.yml config file should I specify that I want an ephemeral disk? Many thanks for the help!
Chiara
bibigrid.txt