BiBiServ / bibigrid

BiBiGrid is a tool for an easy cluster setup inside a cloud environment.
Apache License 2.0

Ephemeral disk not found #416

Closed. ChiaraVanni closed this issue 1 year ago.

ChiaraVanni commented 1 year ago

Hi,

I recently created a cluster with BiBiGrid and everything ran smoothly, but I don't see any "/vol/scratch" or ephemeral disk. I didn't explicitly specify this in the bibigrid.yml file (attached here). Is there a way to set up the local ephemeral disk while the cluster is already up and running? If not, where should I specify that I want an ephemeral disk in the bibigrid.yml config file? Many thanks for the help!

Chiara

Attachment: bibigrid.txt

XaverStiensmeier commented 1 year ago

Hi,

good catch! Thanks to your issue I noticed that we haven't documented ephemeral disks yet. I will catch up on that soon.

As a first short answer: maybe your ephemerals were detected correctly, but you are looking in the wrong spot? On your master the ephemeral is mounted to /vol/ and on your workers to /vol/scratch. We agreed to copy the behavior of the old BiBiGrid here. However, I am not quite sure why the old BiBiGrid used two different paths and need to check with Jan.
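If you just want a quick look at whether an ephemeral was picked up at all, listing the block devices and their mount points on the respective machine is enough (a generic check, nothing BiBiGrid-specific):

# list block devices with their sizes and mount points
lsblk -o NAME,SIZE,MOUNTPOINT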

If that's not it, we need to dig a little deeper:

  1. In the bibigrid/resources/group_vars files, what is stored under

     flavor:
       ephemeral: ?

If this is 0, then something is wrong with the assignment. In that case a manual fix would be to store the available ephemeral size there (it should be 1950000). Otherwise, something is wrong with the Ansible execution.

  2. If this wasn't 0, then please check whether TASK [bibigrid : Create /vol bind mount from /mnt ephemeral] has been skipped. You can check this in the logfile or - in any case - by running Ansible again on your remote machine: when logged into your remote machine, execute bibiplay -l master. See the sketch right after this list for both checks.
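For reference, here is a rough sketch of both checks, run on the master (just a sketch; the exact group_vars path may differ depending on where the playbook ended up on your machine):

# 1. inspect the detected ephemeral size in the group_vars files
grep -r -A 5 "flavor:" bibigrid/resources/group_vars/

# 2. re-run the Ansible roles on the master and watch the ephemeral-related tasks
bibiplay -l master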

Xaver

ChiaraVanni commented 1 year ago

Hi, in the bibigrid/resources/group_vars files I do have the ephemeral set:

cloud_identifier: openstack
flavor:
  disk: 50
  ephemeral: 2000
  name: de.NBI highmem xlarge
  ram: 491520
  vcpus: 28
gateway_ip: 192.168.0.235
image: Ubuntu 22.04 LTS (2023-02-14)
name: bibigrid-worker0-cv6j7rvcuvf1ifr-[0-4]
network: 74074d13-4d59-469a-8d3c-978097938952
regexp: bibigrid-worker0-cv6j7rvcuvf1ifr-\d+

When running bibiplay -l master I get this:

TASK [bibigrid : Create /vol bind mount from /mnt ephemeral] *********************************************************************************************************************************************************
ok: [localhost]

TASK [bibigrid : Mount disks] ****************************************************************************************************************************************************************************************
skipping: [localhost]

TASK [bibigrid : Mount ephemeral] ************************************************************************************************************************************************************************************
skipping: [localhost]

TASK [bibigrid : Set 0777 rights for ephemeral mount] ****************************************************************************************************************************************************************
skipping: [localhost]

My ansible_hosts file looks like this:

vpn:
  children:
    master:
      hosts:
        localhost:
          ansible_connection: ssh
          ansible_python_interpreter: /usr/bin/python3
          ansible_user: ubuntu
          ip: localhost
    vpngtw:
      hosts: {}
  hosts: {}
workers:
  children:
    bibigrid_worker0_cv6j7rvcuvf1ifr_0_4:
      hosts:
        bibigrid-worker0-cv6j7rvcuvf1ifr-[0:4]:
          ansible_connection: ssh
          ansible_python_interpreter: /usr/bin/python3
          ansible_user: ubuntu
  hosts: {}

Thanks!

Chiara

ChiaraVanni commented 1 year ago

And this is the common_configuration.yml file, in case it helps:

cluster_cidrs:
- cloud_identifier: openstack
  provider_cidrs: 192.168.0.0/24
cluster_id: cv6j7rvcuvf1ifr
default_user: 6736068f914d51cb40197bfee384425a070b3067@elixir-europe.org
dns_server_list:
- 8.8.8.8
enable_ide: true
enable_nfs: true
enable_slurm: false
enable_zabbix: true
ext_nfs_mounts: []
ide_conf:
  build: false
  ide: false
  port_end: 8383
  port_start: 8181
  workspace: ${HOME}
local_dns_lookup: false
local_fs: false
nfs_mounts:
- dst: //vol/spool
  src: //vol/spool
slurm: true
slurm_conf:
  db: slurm
  db_password: changeme
  db_user: slurm
  elastic_scheduling:
    ResumeTimeout: 900
    SuspendTime: 3600
    TreeWidth: 128
  munge_key: JMqd42ofbcoi9o2G8fDZfNT7F9TcN6sn
ssh_user: ubuntu
use_master_as_compute: true
wait_for_services:
- de.NBI_Bielefeld_environment.service
zabbix_conf:
  admin_password: bibigrid
  db: zabbix
  db_password: zabbix
  db_user: zabbix
  server_name: bibigrid
  timezone: Europe/Berlin

XaverStiensmeier commented 1 year ago

This sounds like everything is working as expected - even though I was a bit surprised about the ephemeral: 2000, which seemed too low. EDIT: It's correct; I just mixed up the flavors. The skipped steps are only executed on workers (we just ran Ansible for the master). So the ephemeral should be mounted to /vol/ on the master and /vol/scratch on the workers.

I can't test (& confirm) this at the moment, but I will as soon as possible.

XaverStiensmeier commented 1 year ago

Alright, I confirmed what I stated above. Most likely it is mounted to /vol/ (on the master). You can check this by executing mount | grep "/vol"; you should see something like /dev/vdb on /vol type ext4 (rw,relatime). If you get an empty output, something went wrong. For the workers it should return something like:

/dev/vdb on /vol/scratch type ext4 (rw,relatime) # mounted ephemeral
192.168.200.98:/vol/spool on /vol/spool type nfs4 (rw,relatime,[...]) # nfsshare from master
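If you want to check a worker without logging into it interactively, something along these lines should work from the master (just a sketch: the worker hostname is taken from your ansible_hosts above, and it assumes the worker is up and reachable under that name):

# on the master: ephemeral mounted to /vol
mount | grep "/vol"
# on one worker: ephemeral on /vol/scratch plus the NFS share from the master
ssh ubuntu@bibigrid-worker0-cv6j7rvcuvf1ifr-0 'mount | grep "/vol"'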

ChiaraVanni commented 1 year ago

Hi, sorry for the late reply. Yes, it is mounted (on /vol on the master and /vol/scratch on the workers). I was expecting it on /vol/scratch on the master as well, because that is what I have in another cloud. Thank you for the fast help!!

Chiara

XaverStiensmeier commented 1 year ago

Perfect! If no other issue occurred - and I didn't miss anything - all issues are resolved, right? In that case I am going to close this issue.

ChiaraVanni commented 1 year ago

Hi, yes, all issues are solved, thanks!