I saw the same problem when we first tested #2092. @JimMadge: can you take a look?
Same missing setting you report there: `ldap` missing from `group` and `passwd`.
Looks like the problem was that you already have a disk mounted at `/mnt`. What size are you using?
@JimMadge: Is something automatically mounted at `/mnt`? Possibly the (badly documented) temp disk that lives on the same physical machine as the VM?
This might also explain why mounting at `/shared` was fine but `/mnt/shared` causes a problem.
`Standard_D2s_v3`
BTW, I literally just got it running by creating the `/mnt` subdirectories from the console. Maybe adding the `-m` flag to `mount -fav`, so that it creates the directories if they don't exist, would fix it?
Hmm, I'm not sure. That size shouldn't have a temporary disk.
You can see in the fstab output, though, that there is a device `/dev/disk/cloud/azure_resource-part1` at `/mnt`. That doesn't look right to me.
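For anyone checking a workspace, two standard commands will show what is actually mounted at `/mnt` and which device the Azure symlink points to (plain util-linux/coreutils, nothing specific to our setup):

```sh
# What filesystem is mounted at /mnt, and from which source device?
findmnt /mnt

# Where do the Azure-provided device symlinks point?
ls -l /dev/disk/cloud/
```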
I haven't seen this problem in any deployments I've done recently.
Is it worth abandoning `/mnt` and putting our drives somewhere else?
Or alternatively, explicitly adding something like the following to `/etc/fstab`:

```
/dev/disk/cloud/azure_resource-part1 /mnt/tmp auto defaults,nofail,_netdev 0 2
```
I'd rather understand what is happening here. It isn't a mount that we have defined and I'm not sure what it is.
If it is something Azure is adding, we could make sure to remove entries like that from `fstab` (if that is the problem here).
The end of this section says "For Linux VMs, the temporary disk is `/dev/sdb1` and is mounted at `/mnt/resource` or `/mnt`.", so I'm not sure we can disable this.
According to this outdated answer, it might be possible to change this in `waagent.conf`. However, I'm not confident that we could use `cloud-init` to change that file, since waagent is used to run cloud-init.
Otherwise, we should see whether adding an explicit mount point for `/dev/disk/cloud/azure_resource-part1` will fix it (as above).
Oh wait, the Dsv3 series does have temporary disks.
Is there a reason to use such an old offering?
I think we should be robust against the use of VM SKUs that have temporary disks (which is most of them), regardless of whether Dsv3 is a sensible SKU to use.
It's just the one that was our default in the old codebase, so I still use it as a default out of habit, but as @jemrobinson says, we should be robust to stuff like this IMO. (I think the GPU VMs we currently recommend, e.g. `Standard_NC6s_v3`, also have local temp disks?)
> (which is most of them)
I'm not sure that is true; it tends to be the older offerings. It looks like the high-performance sizes and those with accelerators do have local disks though, sometimes NVMe.
From here:

> Most VMs contain a temporary disk, which is not a managed disk.
I'm sure it has been true, but the trend in the current and new general-purpose sizes is to not include local storage and to provide 'd' series variants for those that want it. Note that that line has been in the docs for quite a few years. That said, I don't think I want to enumerate all the available sizes, or sizes × availability 😅.
It does seem common on sizes with accelerators though. That makes sense, as those users would likely want fast storage and the physical nodes in the data centre are more likely to have onboard storage.
We'll need to fix this to enable GPU/FPGA sizes.
Adding `-o X-mount.mkdir` fixed this on the `Standard_D2s_v3` at least, although I'm still unable to log in. E.g.

```sh
while (! mountpoint -q /mnt/input); do sleep 5; mount -o X-mount.mkdir /mnt/input; done
```
What does that option do?
I think I'd rather just not mount local storage at `/mnt`, so that we are consistent between machines with and without local storage.
It creates the directory if it doesn't exist.
The alias `--mkdir` might be clearer?
> The alias `--mkdir` might be clearer?
it would be, but for some reason that's not a supported option on the machine I tested
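For reference, a quick sketch of the two equivalent invocations (using `/mnt/input` from the loop above; the alias needs a recent util-linux):

```sh
# Option form: supported for longer
mount -o X-mount.mkdir /mnt/input

# Alias form: added in util-linux 2.38, so absent on older images
mount --mkdir /mnt/input
```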
This doesn't really deal with the question of what happens in these scenarios:

- temp disk mounted at `/mnt`
- data mounted at `/mnt/ingress`
- temp disk unmounted

OR

- data mounted at `/mnt/ingress`
- temp disk mounted at `/mnt`

In either case, we'd lose access to `/mnt/ingress`. I think it's safer/better to overwrite where the temp disk mounts or to move our mounts outside `/mnt`.
I think the best solution would be that we standardise where our mounts and temp disk(s) are in all cases.
I like having our mounts at `/mnt`; it feels idiomatic. I would put local disks at something like `/scratch`, `/var/scratch` or `/mnt/scratch`.
We might need some logic like "if `/dev/disk/cloud/...` then ..." (see the sketch below).
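A minimal sketch of what that logic could look like, run before our NFS mounts are processed (the device path is the one observed above, `/mnt/scratch` is one of the suggested targets, and this is untested):

```sh
#!/bin/sh
# If Azure has mounted its temporary/resource disk at /mnt, move it to
# /mnt/scratch so it cannot shadow our data mounts under /mnt.
if [ -b /dev/disk/cloud/azure_resource-part1 ]; then
    umount /mnt 2>/dev/null || true   # waagent mounts the temp disk here
    mkdir -p /mnt/scratch
    mount /dev/disk/cloud/azure_resource-part1 /mnt/scratch
fi
```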
Agreed that `/var/scratch` or `/mnt/scratch` makes sense for the temp disk. Have you tried the `fstab` line I put above?
> This doesn't really deal with the question of what happens in these scenarios:
>
> - temp disk mounted at `/mnt`
> - data mounted at `/mnt/ingress`
> - temp disk unmounted
>
> OR
>
> - data mounted at `/mnt/ingress`
> - temp disk mounted at `/mnt`
>
> In either case, we'd lose access to `/mnt/ingress`. I think it's safer/better to overwrite where the temp disk mounts or to move our mounts outside `/mnt`.
Looks like it's possible to change the location of the temp disk in `/etc/waagent.conf`: there's a field `ResourceDisk.MountPoint`, which is set to `/mnt` on our VMs.
https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/agent-linux
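For illustration, the relevant settings in `/etc/waagent.conf` would look something like this (a sketch; `/mnt/scratch` is the relocation target suggested above, not the default value):

```
# /etc/waagent.conf (excerpt)
ResourceDisk.Format=y
ResourceDisk.MountPoint=/mnt/scratch
```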
We might be going in circles here, but since `waagent` is used to run `cloud-init`, we probably can't use a `cloud-init` command to change the configuration.
Can we give arguments to `waagent` when deploying the machine?
So, we can add `ephemeral0` to the `mounts:` section of our cloud-init (see here).

- `[ephemeral0, null]` effectively doesn't mount it
- `[ephemeral0, /mnt/resource]` mounts it to `/mnt/resource`

So we could mount it to `/mnt/scratch` or whatever you'd prefer. This works whether the VM really has an ephemeral disk or not: when there is no real extra disk, it's just an empty folder.
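A minimal sketch of that cloud-init fragment, assuming `/mnt/scratch` as the target (the options string here is an assumption, not taken from the cloud-init docs):

```yaml
#cloud-config
# Relocate the Azure ephemeral/temp disk away from /mnt so it cannot
# shadow the NFS mounts we define there. On sizes without a temp disk
# this is harmless: /mnt/scratch is just an empty directory.
mounts:
  - [ephemeral0, /mnt/scratch, auto, "defaults,nofail"]
```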
At this point the desktop icons for `input`, `output` and `shared` are not working.
:white_check_mark: Checklist
:computer: System information
:package: Packages
List of packages
```none
acme==2.10.0
annotated-types==0.7.0
appdirs==1.4.4
Arpeggio==2.0.2
attrs==24.2.0
azure-common==1.1.28
azure-core==1.31.0
azure-identity==1.18.0
azure-keyvault-certificates==4.8.0
azure-keyvault-keys==4.9.0
azure-keyvault-secrets==4.8.0
azure-mgmt-compute==33.0.0
azure-mgmt-containerinstance==10.1.0
azure-mgmt-core==1.4.0
azure-mgmt-dns==8.1.0
azure-mgmt-keyvault==10.3.1
azure-mgmt-msi==7.0.0
azure-mgmt-rdbms==10.1.0
azure-mgmt-resource==23.1.1
azure-mgmt-storage==21.2.1
azure-storage-blob==12.23.1
azure-storage-file-datalake==12.17.0
azure-storage-file-share==12.18.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.3.2
chevron==0.14.0
click==8.1.7
cryptography==43.0.1
-e git+https://github.com/craddm/data-safe-haven.git@a6a6993ad7bbcc02e8f05629fe0fd5ab7154a900#egg=data_safe_haven
debugpy==1.8.6
dill==0.3.9
dnspython==2.6.1
fqdn==1.5.1
grpcio==1.60.2
idna==3.10
isodate==0.6.1
josepy==1.14.0
markdown-it-py==3.0.0
mdurl==0.1.2
msal==1.31.0
msal-extensions==1.2.0
msrest==0.7.1
oauthlib==3.2.2
parver==0.5
portalocker==2.10.1
protobuf==4.25.5
psycopg==3.2.3
pulumi==3.134.1
pulumi_azure_native==2.63.0
pulumi_random==4.16.6
pycparser==2.22
pydantic==2.9.2
pydantic_core==2.23.4
Pygments==2.18.0
PyJWT==2.9.0
pyOpenSSL==24.2.1
pyRFC3339==1.1
pytz==2024.2
PyYAML==6.0.2
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.8.1
semver==2.13.0
setuptools==75.1.0
shellingham==1.5.4
simple_acme_dns==3.1.0
six==1.16.0
typer==0.12.5
typing_extensions==4.12.2
urllib3==2.2.3
validators==0.28.3
websocket-client==1.8.0
```

:no_entry_sign: Describe the problem
Virtual machine sizes with local storage (mounted at `/mnt`) clash with NFS mounts defined in `/etc/fstab`.

:deciduous_tree: Log messages
Relevant log messages
![image](https://github.com/user-attachments/assets/6a47c817-c91b-467a-959d-619a9806b662)

:recycle: To reproduce
Deploy a workspace using a VM size with local storage.