Azure / azhpc-images

Azure HPC/AI VM Images
MIT License
Ubuntu-hpc 18.04 latest image slowness/lag reported on NFS File share #217

Closed by prasannanayak21 9 months ago

prasannanayak21 commented 1 year ago

Hello,

There seems to be some lag/slowness in accessing the NFS file system on the latest Ubuntu-hpc 18.04 image which was released to the Azure marketplace on 15/03/2023.

Image URN: microsoft-dsvm:ubuntu-hpc:1804:18.04.2023031501

Kernel version:

azureuser@ip-0A0A000D:~$ uname -r
5.4.0-1104-azure

NFS File system Details:

azureuser@ip-0A0A000B:~$ df -h | grep /shared    # Azure File Share
premstortest23.file.core.windows.net:/premstortest23/test  1.0T  1.5G 1023G   1% /shared

azureuser@ip-0A0A000B:~$ df -h | grep /data      # ANF (Azure NetApp Files)
10.10.2.4:/testvol                                         100G  256K  100G   1% /data

File Access:

azureuser@ip-0A0A000D:~$ time ls /shared/home/azureuser/
a  foo  new.txt  sample.txt

real    0m0.635s
user    0m0.004s
sys     0m0.001s

azureuser@ip-0A0A000D:~$ time ls /data/
a.txt  b.txt

real    0m0.483s
user    0m0.000s
sys     0m0.003s

azureuser@ip-0A0A000D:~$ time touch /data/c.txt

real    0m0.186s
user    0m0.002s
sys     0m0.000s

azureuser@ip-0A0A000D:~$ time mkdir /shared/home/azureuser/test

real    0m0.614s
user    0m0.001s
sys     0m0.003s

On older Ubuntu-DSVM marketplace images, no such lag is observed. The lag on the new image is slowing down the customer's application.

Old Ubuntu-hpc Image and Kernel version:

microsoft-dsvm:ubuntu-hpc:1804:18.04.2022121201
azureuser@ip-0A0A000B:~$ uname -r
5.4.0-1098-azure
azureuser@ip-0A0A000B:~$ time ls /shared/home/azureuser/
a  foo  new.txt  sample.txt  test

real    0m0.008s
user    0m0.001s
sys     0m0.000s
azureuser@ip-0A0A000B:~$ time ls /data/
a.txt  b.txt  c.txt

real    0m0.003s
user    0m0.001s
sys     0m0.000s

Please look into this and fix the kernel issue.
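
Single `time` runs like those above can be skewed by client-side caching or a one-off network delay. As a rough sketch (not part of the original report), the helper below averages the wall-clock time of a command over several runs. The name `avg_ms` is hypothetical, and the script assumes GNU `date` with `%N` nanosecond support, which Ubuntu provides.

```shell
# Hypothetical helper: average wall-clock time (whole milliseconds) of a
# command over N runs. Assumes GNU date, which supports %N (nanoseconds).
# The body runs in a subshell so its variables do not leak into the caller.
avg_ms() (
    runs=$1; shift
    total=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1        # discard the command's own output
        end=$(date +%s%N)
        total=$((total + end - start))
        i=$((i + 1))
    done
    # Convert accumulated nanoseconds to milliseconds per run
    echo $((total / runs / 1000000))
)
```

For example, running `avg_ms 20 ls /shared/home/azureuser/` on both VMs would give per-call averages that can be compared directly. Repeated runs may be served from the NFS client's attribute cache, so the average can understate first-access latency.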

abhamidipati-msft commented 1 year ago

@prasannanayak21 Can you please compare the NFS read-ahead limits between the two images? See https://learn.microsoft.com/en-us/azure/azure-netapp-files/performance-linux-nfs-read-ahead

prasannanayak21 commented 1 year ago

@abhamidipati0614

Image URN: microsoft-dsvm:ubuntu-hpc:1804:18.04.2023031501 (image with the issue)

azureuser@ip-0A0A000C:~$ uname -r
5.4.0-1104-azure
azureuser@ip-0A0A000C:~$ ./readahead.sh show /data
/data 0:55 /sys/class/bdi/0:55/read_ahead_kb = 128
azureuser@ip-0A0A000C:~$ ./readahead.sh show /shared
/shared 0:57 /sys/class/bdi/0:57/read_ahead_kb = 128

Image URN: microsoft-dsvm:ubuntu-hpc:1804:18.04.2022121201 (working image)

azureuser@ip-0A0A000B:~$ uname -r
5.4.0-1098-azure
azureuser@ip-0A0A000B:~$ ./readahead.sh show /shared
/shared 0:56 /sys/class/bdi/0:56/read_ahead_kb = 128
azureuser@ip-0A0A000B:~$ ./readahead.sh show /data
/data 0:55 /sys/class/bdi/0:55/read_ahead_kb = 128

Let me know if I need to share any other details.
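
The `readahead.sh` script itself is not shown in this thread. A minimal sketch that prints output in the same shape, assuming `mountpoint` from util-linux is available, looks up the mount's device id and reads the matching `read_ahead_kb` file under `/sys/class/bdi`. The function names are hypothetical.

```shell
# Hypothetical helper: build the sysfs path that holds the read-ahead
# setting for a backing-device id such as "0:55".
bdi_readahead_path() {
    printf '/sys/class/bdi/%s/read_ahead_kb\n' "$1"
}

# Hypothetical helper: print "<mount> <major:minor> <path> = <value>".
# Runs in a subshell so "exit" only leaves the function.
show_readahead() (
    mnt=$1
    # mountpoint -d prints the major:minor device number of the mounted filesystem
    dev=$(mountpoint -d "$mnt") || exit 1
    path=$(bdi_readahead_path "$dev")
    printf '%s %s %s = %s\n' "$mnt" "$dev" "$path" "$(cat "$path")"
)
```

Calling `show_readahead /data` and `show_readahead /shared` on each VM would allow the same side-by-side comparison as above.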

blepore commented 10 months ago

Just wanted to raise this here as well. We've found that the readahead setting degrades performance for Lustre clients. https://github.com/Azure/azhpc-images/issues/293

abhamidipati-msft commented 9 months ago

PR https://github.com/Azure/azhpc-images/pull/294 should resolve this. The fix is available in the following images:

microsoft-dsvm:ubuntu-hpc:2004:20.04.2023111801
microsoft-dsvm:ubuntu-hpc:2204:22.04.2023111801