alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Update mount points #2092

Closed: JimMadge closed this 1 week ago

JimMadge commented 1 month ago

:white_check_mark: Checklist

:vertical_traffic_light: Depends on

:arrow_heading_up: Summary

:closed_umbrella: Related issues

Closes #2027

:microscope: Tests

github-actions[bot] commented 1 month ago

Coverage report

This PR does not seem to contain any modification to coverable code.

craddm commented 1 month ago

This is what I'm seeing in the serial console after trying to deploy a new VM in an existing SRE:

[  165.116160] cloud-init[1938]: >=== Mounting all external volumes... ===<
[  165.121676] cloud-init[1938]:   /etc/fstab  # CLOUD_IMG: This file was created/modified by the Cloud Image build process
[  165.127771] cloud-init[1938]:   /etc/fstab  UUID=03496b49-bced-4cc6-8502-efd2ae48dbb9        /        ext4   discard,errors=remount-ro       0 1
[  165.133341] cloud-init[1938]:   /etc/fstab  UUID=F229-A1AA   /boot/efi       vfat    umask=0077      0 1
[  165.139420] cloud-init[1938]:   /etc/fstab  shgresremoodesiredstatec.blob.core.windows.net:/shgresremoodesiredstatec/desiredstate    /var/local/ansible      nfs     ro,_netdev,sec=sys,vers=3,nolock,proto=tcp,comment=cloudconfig  0       2
[  165.145367] cloud-init[1938]:   /etc/fstab  shgresremoosensitivedata.blob.core.windows.net:/shgresremoosensitivedata/ingress /mnt/input      nfs     ro,_netdev,sec=sys,vers=3,nolock,proto=tcp,comment=cloudconfig  0       2
[  165.151276] cloud-init[1938]:   /etc/fstab  shgresremoosensitivedata.blob.core.windows.net:/shgresremoosensitivedata/egress  /mnt/output     nfs     rw,_netdev,sec=sys,vers=3,nolock,proto=tcp,comment=cloudconfig  0       2
[  165.156950] cloud-init[1938]:   /etc/fstab  shmgreensremoocouserdata.file.core.windows.net:/shmgreensremoocouserdata/shared  /mnt/shared     nfs     _netdev,sec=sys,nconnect=4,comment=cloudconfig  0       2
[  165.163467] cloud-init[1938]:   /etc/fstab  shmgreensremoocouserdata.file.core.windows.net:/shmgreensremoocouserdata/home    /home   nfs     _netdev,sec=sys,nconnect=4,comment=cloudconfig  0       2
[  165.169982] cloud-init[1938]:   /etc/fstab  /dev/disk/cloud/azure_resource-part1     /mnt    auto    defaults,nofail,x-systemd.requires=cloud-init.service,_netdev,comment=cloudconfig       0       2
[  165.216682] cloud-init[1938]: mount.nfs: trying 10.0.1.28 prog 100003 vers 3 prot TCP port 2048
[  165.255024] cloud-init[1938]: mount.nfs: trying 10.0.1.28 prog 100005 vers 3 prot TCP port 2048
[  165.269955] cloud-init[1938]: mount.nfs: timeout set for Tue Aug  6 10:15:40 2024
[  165.278388] cloud-init[1938]: mount.nfs: trying text-based options 'sec=sys,vers=3,nolock,proto=tcp,addr=10.0.1.28'
[  165.283072] cloud-init[1938]: mount.nfs: prog 100003, trying vers=3, prot=6
[  165.289101] cloud-init[1938]: mount.nfs: prog 100005, trying vers=3, prot=6
[  165.298038] cloud-init[1938]: mount.nfs: mount point /mnt/input does not exist
[  165.306742] cloud-init[1938]: mount.nfs: mount point /mnt/output does not exist
[  165.318891] cloud-init[1938]: mount.nfs: mount point /mnt/shared does not exist
[  165.352792] cloud-init[1938]: mount.nfs: timeout set for Tue Aug  6 10:15:41 2024
[  165.362954] cloud-init[1938]: mount.nfs: trying text-based options 'sec=sys,nconnect=4,vers=4.2,addr=10.0.1.36,clientaddr=10.0.2.4'
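
The three `mount point ... does not exist` errors above indicate that the directories named in `/etc/fstab` were missing when the mounts were attempted. A minimal sketch of a pre-flight check follows, assuming a plain fstab file (without the `cloud-init` log prefix shown in the console output). The function name `missing_nfs_mount_points` and its optional root-prefix argument are hypothetical and not part of this repository.

```shell
# Print every NFS mount point listed in an fstab-style file whose
# directory does not exist yet.
# $1: path to the fstab file
# $2: optional prefix prepended to each mount point (hypothetical,
#     only there so the check can be tried outside a real system root)
missing_nfs_mount_points() {
    fstab="$1"
    root="${2:-}"
    # Skip comment lines; keep entries whose filesystem type (field 3) is nfs
    awk '$1 !~ /^#/ && $3 == "nfs" { print $2 }' "$fstab" |
    while read -r mount_point; do
        [ -d "${root}${mount_point}" ] || printf '%s\n' "$mount_point"
    done
}
```

Each path it prints could then be created with `mkdir -p` before the mounts are attempted.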

jemrobinson commented 1 month ago

The symlinks aren't being uploaded as they're actually local symlinks on the deployment system:

LOCAL

$ ls -alh /Users/jrobinson/Developer/data-safe-haven/code/dsh-upstream/data_safe_haven/resources/workspace/ansible/files/etc/skel/input
lrwxr-xr-x  1 jrobinson  staff    10B  6 Aug 13:40 /Users/jrobinson/Developer/data-safe-haven/code/dsh-upstream/data_safe_haven/resources/workspace/ansible/files/etc/skel/input -> /mnt/input

WORKSPACE

$ ls -alh /var/local/ansible/files/etc/skel
total 2.0K
dr-xr-xr-x 2 root root    0 Aug  6 10:50 .
dr-xr-xr-x 2 root root    0 Aug  6 10:51 ..
-r-xr-xr-x 1 root root 1.3K Aug  6 10:50 bashrc
-r-xr-xr-x 1 root root   14 Aug  6 10:50 xsession
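
The comparison above shows a symlink in the local tree that is absent from the uploaded copy. As a rough sketch, assuming the upload step skips symlinks, the entries that would go missing can be listed before deploying. The function name `list_symlinks` is hypothetical.

```shell
# List every symlink below a directory as "relative/path -> target".
# $1: directory to scan (for example the local ansible/files tree)
list_symlinks() {
    (
        cd "$1" &&
        find . -type l | sort | while read -r link; do
            # Strip the leading "./" that find adds to each path
            printf '%s -> %s\n' "${link#./}" "$(readlink "$link")"
        done
    )
}
```

An empty result would mean nothing in the tree depends on symlinks surviving the upload.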

JimMadge commented 1 month ago

OK, we can create them another way.

JimMadge commented 1 month ago

@jemrobinson Changed in db95c0f

jemrobinson commented 1 month ago

~This also fails although I'm not sure why:~ EDIT: I think we need /etc instead of etc in the symlinks.

TASK [Create skeleton symlinks] ************************************************
failed: [localhost] (item={'path': 'etc/skel/input', 'src': '/mnt/input'}) => {"ansible_loop_var": "item", "changed": false, "item": {"path": "etc/skel/input", "src": "/mnt/input"}, "msg": "Error while linking: [Errno 2] No such file or directory: b'/mnt/input' -> b'etc/skel/input'", "path": "etc/skel/input"}
failed: [localhost] (item={'path': 'etc/skel/output', 'src': '/mnt/output'}) => {"ansible_loop_var": "item", "changed": false, "item": {"path": "etc/skel/output", "src": "/mnt/output"}, "msg": "Error while linking: [Errno 2] No such file or directory: b'/mnt/output' -> b'etc/skel/output'", "path": "etc/skel/output"}
failed: [localhost] (item={'path': 'etc/skel/shared', 'src': '/mnt/shared'}) => {"ansible_loop_var": "item", "changed": false, "item": {"path": "etc/skel/shared", "src": "/mnt/shared"}, "msg": "Error while linking: [Errno 2] No such file or directory: b'/mnt/shared' -> b'etc/skel/shared'", "path": "etc/skel/shared"}

since the targets do exist

$ ls -alh /mnt/
total 8.5K
drwxr-xr-x  5 root root 4.0K Aug  6 12:45 .
drwxr-xr-x 19 root root 4.0K Aug  6 12:56 ..
drwxr-xr-x  2 root root    0 Aug  6 10:58 input
drwxrwxrwx  2 root root    0 Aug  6 10:58 output
drwxrwxrwx  2 root root   64 Aug  6 10:58 shared
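
The failing items use the relative path `etc/skel/...`, which is resolved against the current working directory rather than `/etc`. A minimal sketch of the same operation in shell, rejecting relative link paths up front, is given below. The function name `make_absolute_link` is hypothetical, and this is not how the playbook itself creates the links.

```shell
# Create (or replace) a symlink, refusing link paths that are not absolute.
# $1: link target, e.g. /mnt/input
# $2: absolute path of the link to create, e.g. /etc/skel/input
make_absolute_link() {
    src="$1"
    dest="$2"
    case "$dest" in
        /*) ;;  # absolute path: fine
        *)
            echo "refusing relative link path: $dest" >&2
            return 1
            ;;
    esac
    # Make sure the parent directory exists, then create the link
    mkdir -p "$(dirname "$dest")"
    ln -sfn "$src" "$dest"
}
```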

JimMadge commented 1 month ago

@jemrobinson Paths should be correct in bb47298

JimMadge commented 1 month ago

@jemrobinson Interesting, that looks very similar to the problem I had earlier?

I'm not sure why that would be, do you have any ideas?

JimMadge commented 1 month ago

@jemrobinson @craddm Let's bump this one until after the pen test?

jemrobinson commented 1 month ago

It's a missing setting in /etc/nsswitch.conf. Any idea why this might have stopped being set?

WORKING

$ cat /etc/nsswitch.conf
passwd:         files systemd ldap
group:          files systemd ldap
shadow:         files ldap
gshadow:        files

hosts:          files mdns4_minimal [NOTFOUND=return] dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

NON-WORKING

$ cat /etc/nsswitch.conf
passwd:         files systemd
group:          files systemd
shadow:         files
gshadow:        files

hosts:          files mdns4_minimal [NOTFOUND=return] dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

i.e. it's the missing ldap setting on passwd and group
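
A quick way to test for this on a workspace is sketched below, assuming the standard `database: source ...` layout shown in the two listings. The function name `nss_has_source` is hypothetical.

```shell
# Succeed if the given database line in an nsswitch.conf-style file
# lists the given source.
# $1: path to the file, $2: database (e.g. passwd), $3: source (e.g. ldap)
nss_has_source() {
    awk -v db="$2:" -v src="$3" '
        $1 == db { for (i = 2; i <= NF; i++) if ($i == src) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example usage:
#   nss_has_source /etc/nsswitch.conf passwd ldap || echo "passwd: ldap missing"
#   nss_has_source /etc/nsswitch.conf group ldap  || echo "group: ldap missing"
```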

JimMadge commented 1 month ago

That is the next task

https://github.com/alan-turing-institute/data-safe-haven/blob/bb4729831d238a3ca88346afdca46c685aab6172/data_safe_haven/resources/workspace/ansible/desired_state.yaml#L81-L85

Did that task run?

jemrobinson commented 1 month ago

> Did that task run?

No, because the symlink task failed with "An exception occurred during task execution".

JimMadge commented 1 month ago

In that case, it might just require running the playbook again (assuming the fatal error is fixed).

jemrobinson commented 1 month ago

/mnt/shared still failing:

ok: [localhost] => (item={'path': '/etc/skel/input', 'src': '/mnt/input'})
ok: [localhost] => (item={'path': '/etc/skel/output', 'src': '/mnt/output'})
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: PermissionError: [Errno 1] Operation not permitted: b'/mnt/shared'
failed: [localhost] (item={'path': '/etc/skel/shared', 'src': '/mnt/shared'}) => {"ansible_loop_var": "item", "changed": false, "item": {"path": "/etc/skel/shared", "src": "/mnt/shared"}, "module_stderr": "Traceback (most recent call last):\n  File \"/root/.ansible/tmp/ansible-tmp-1722952769.3476353-37900-21432236129960/AnsiballZ_file.py\", line 102, in <module>\n    _ansiballz_main()\n  File \"/root/.ansible/tmp/ansible-tmp-1722952769.3476353-37900-21432236129960/AnsiballZ_file.py\", line 94, in _ansiballz_main\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n  File \"/root/.ansible/tmp/ansible-tmp-1722952769.3476353-37900-21432236129960/AnsiballZ_file.py\", line 40, in invoke_module\n    runpy.run_module(mod_name='ansible.modules.file', init_globals=None, run_name='__main__', alter_sys=True)\n  File \"/usr/lib/python3.10/runpy.py\", line 224, in run_module\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib/python3.10/runpy.py\", line 96, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_ansible.builtin.file_payload_apoyxk31/ansible_ansible.builtin.file_payload.zip/ansible/modules/file.py\", line 928, in <module>\n  File \"/tmp/ansible_ansible.builtin.file_payload_apoyxk31/ansible_ansible.builtin.file_payload.zip/ansible/modules/file.py\", line 916, in main\n  File \"/tmp/ansible_ansible.builtin.file_payload_apoyxk31/ansible_ansible.builtin.file_payload.zip/ansible/modules/file.py\", line 771, in ensure_symlink\n  File \"/tmp/ansible_ansible.builtin.file_payload_apoyxk31/ansible_ansible.builtin.file_payload.zip/ansible/module_utils/basic.py\", line 1422, in set_fs_attributes_if_different\n  File \"/tmp/ansible_ansible.builtin.file_payload_apoyxk31/ansible_ansible.builtin.file_payload.zip/ansible/module_utils/basic.py\", line 1186, in set_mode_if_different\nPermissionError: [Errno 1] Operation not permitted: b'/mnt/shared'\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

Can we do this with a script in /etc/profile instead?
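
A minimal sketch of what such a login script could do, assuming it is sourced from `/etc/profile` and that links named after the mount points belong directly in the user's home directory. The function name `ensure_home_links` is hypothetical, and nothing like it exists in this PR.

```shell
# Create a symlink in a home directory for each target, skipping any
# that already exist so that repeated logins do no extra work.
# $1: home directory; remaining arguments: link targets
ensure_home_links() {
    home_dir="$1"
    shift
    for target in "$@"; do
        link="${home_dir}/$(basename "$target")"
        # -e is false for a dangling link, so also test -L
        [ -e "$link" ] || [ -L "$link" ] || ln -s "$target" "$link"
    done
}

# Example usage from a login script:
#   ensure_home_links "$HOME" /mnt/input /mnt/output /mnt/shared
```

Because the link is created with plain `ln -s` and no mode change is attempted on the target, this sketch does not follow the `set_mode_if_different` code path where the traceback above fails. Whether it succeeds for ordinary users on the real mounts is untested.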

JimMadge commented 1 month ago

Strange :eyes:.

The downside of /etc/profile is that it would run for every login shell (is that the right one?), although it shouldn't take too long.

That said, if root can't create a symlink, I would guess the same would happen for normal users. Would be worth seeing if we can understand why that failed.

JimMadge commented 1 month ago

Would be good to know.

jemrobinson commented 1 month ago

BUG:

Output folder is not writeable

$ touch /mnt/output/test.txt
touch: cannot touch 'test.txt': Permission denied
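
The same check can be scripted so that it leaves no test file behind. This is a small sketch; the function name `dir_is_writeable` and the `.write-test` file prefix are hypothetical.

```shell
# Succeed if a temporary file can be created in (and removed from) the
# given directory; fail otherwise.
# $1: directory to probe, e.g. /mnt/output
dir_is_writeable() {
    probe="$(mktemp "$1/.write-test.XXXXXX" 2>/dev/null)" || return 1
    rm -f "$probe"
}

# Example usage:
#   dir_is_writeable /mnt/output || echo "/mnt/output is not writeable"
```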

jemrobinson commented 1 month ago

If we merge #2103 then we probably don't need the symlinks. If we can drop the symlinks entirely and fix the output folder issue, then we could consider merging this.

jemrobinson commented 3 weeks ago

@JimMadge worth coming back to this one after RSECon?

JimMadge commented 1 week ago

All mount points are as in https://github.com/alan-turing-institute/data-safe-haven/issues/2027#issuecomment-2269291595

Deployment runs without error and the system is functional.

JimMadge commented 1 week ago

> @JimMadge this looks fine, but can you confirm that it works (and all folders mount correctly) in a from-scratch deployment? I'm not sure why it wasn't working before, so I'd like to be sure.

Yes, all was working from a fresh deployment.