gmarchand closed this issue 1 month ago
From the system logs, it appears that you are using a 2.10 Lustre filesystem:
[ 22.223813] LustreError: 16a-d: Server MGS version (2.10.5.0) refused connection from this client with an incompatible version (2.15.3_114_gb61b66c_dirty). Client must be recompiled
The client included in the Amazon Linux 2023 kernel is version 2.15. The client included in AL2 is version 2.12. By default, the 2.15 client will not connect to 2.10 filesystems. We recommend either using a 2.12 or newer filesystem, or using a client older than 2.15 with your existing filesystem.
This doc explains more about Lustre client/server compatibility: https://docs.aws.amazon.com/fsx/latest/LustreGuide/lustre-client-matrix.html
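To spot this kind of mismatch quickly in `dmesg` output, the two version strings can be pulled out of the error line. The following is a minimal sketch, not an official tool: the function names `parse_version_mismatch` and `major_minor` are hypothetical, and the pattern is modeled only on the single log line quoted above, so other Lustre releases may word the message differently.

```python
import re

# Pattern modeled on the one log line quoted in this thread (assumption:
# other Lustre releases may phrase this message differently).
_MISMATCH = re.compile(
    r"Server MGS version \((?P<server>[^)]+)\) refused connection from this "
    r"client with an incompatible version \((?P<client>[^)]+)\)"
)


def parse_version_mismatch(line):
    """Return (server_version, client_version), or None if the line does not match."""
    match = _MISMATCH.search(line)
    if match is None:
        return None
    return match.group("server"), match.group("client")


def major_minor(version):
    """Reduce a version string such as '2.15.3_114_gb61b66c_dirty' to (2, 15)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__":
    log_line = (
        "[ 22.223813] LustreError: 16a-d: Server MGS version (2.10.5.0) refused "
        "connection from this client with an incompatible version "
        "(2.15.3_114_gb61b66c_dirty). Client must be recompiled"
    )
    server, client = parse_version_mismatch(log_line)
    print("server:", major_minor(server), "client:", major_minor(client))
```

The sketch only extracts the versions. Whether a given pair is supported should still be checked against the compatibility matrix linked above.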
@tim-day-387 even with a 2.15 filesystem, the AL2023 client won't connect:
[Fri May 31 00:54:28 2024] LustreError: 26110:0:(mgc_request.c:252:do_config_log_add()) MGC172.31.84.120@tcp: failed processing log, type 1: rc = -5
[Fri May 31 00:54:38 2024] LustreError: 26155:0:(mgc_request.c:612:do_requeue()) failed processing log: -5
[Fri May 31 00:55:00 2024] LustreError: 15c-8: MGC172.31.84.120@tcp: Confguration from log jxlwlxxv-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
[Fri May 31 00:55:00 2024] Lustre: Unmounted jxlwlxxv-client
[Fri May 31 00:55:11 2024] LustreError: 26110:0:(super25.c:187:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -5
Client module info:
[ec2-user]~$ modinfo lustre
filename: /lib/modules/6.1.91-99.172.amzn2023.x86_64/kernel/drivers/staging/lustrefsx/lustre/llite/lustre.ko
license: GPL
version: 2.15.3_114_gb61b66c_dirty
description: Lustre Client File System
author: OpenSFS, Inc. <http://www.lustre.org/>
alias: fs-lustre
srcversion: EAFEFA74278150D832AF4C5
depends: obdclass,ptlrpc,libcfs,lnet,lov,mdc,lmv
staging: Y
retpoline: Y
intree: Y
name: lustre
vermagic: 6.1.91-99.172.amzn2023.x86_64 SMP preempt mod_unload modversions
sig_id: PKCS#7
signer: Amazon Linux Kernel Signing Key
The FSx summary reports Lustre version 2.15, so it matches the client version.
After launching another test instance that includes the default security group, mounting the FSx share works:
[Fri May 31 01:48:00 2024] libcfs: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:00 2024] LNet: HW NUMA nodes: 1, HW CPU cores: 1, npartitions: 1
[Fri May 31 01:48:00 2024] alg: No test for adler32 (adler32-zlib)
[Fri May 31 01:48:01 2024] Key type ._llcrypt registered
[Fri May 31 01:48:01 2024] Key type .llcrypt registered
[Fri May 31 01:48:01 2024] lnet: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] obdclass: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] Lustre: Lustre: Build Version: 2.15.3_114_gb61b66c_dirty
[Fri May 31 01:48:01 2024] ptlrpc: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] ksocklnd: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] LNet: Added LNI 172.91.17.8@tcp [8/256/0/180]
[Fri May 31 01:48:01 2024] LNet: Accept secure, port 988
[Fri May 31 01:48:01 2024] osc: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] fld: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] lov: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] fid: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] mdc: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] lmv: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] lustre: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] mgc: module is from the staging directory, the quality is unknown, you have been warned.
[Fri May 31 01:48:01 2024] Lustre: jxlwlxxv: nosquash_nids is cleared
[Fri May 31 01:48:01 2024] Lustre: jxlwlxxv: root_squash is set to 0:0
[Fri May 31 01:48:02 2024] Lustre: Mounted jxlwlxxv-client
Lustre module version 2.15.3_114_gb61b66c_dirty was used for testing.
@gmarchand it's a filesystem access issue: the mount succeeds once the instance is in the default security group.
Looks like we can resolve this - feel free to reopen/comment if I'm wrong.
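Because the mount succeeded once the instance was placed in the default security group, a plain TCP reachability check from the client can help separate a network access problem from a version mismatch. The following is a minimal sketch under stated assumptions: the helper name `can_reach` is hypothetical, port 988 is taken from the "LNet: Accept secure, port 988" log line above, and the address is the MGS address that appears in the earlier error log. A successful connection shows only that the port is reachable, not that the mount will succeed.

```python
import socket


def can_reach(host, port=988, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection raises OSError (including timeouts and refusals)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address copied from the MGC error lines in this thread; replace it
    # with the address of your own filesystem.
    mgs_host = "172.31.84.120"
    status = "reachable" if can_reach(mgs_host) else "NOT reachable"
    print(f"{mgs_host}:988 is {status}")
```

If the check fails from the affected instance but passes from one in the default security group, the security group rules are the first thing to review.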
Describe the bug
I followed the documentation to install the Lustre client on AL2023: https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html
My instance configuration is:
Here is my user data:
Here is the system log with the error
Moved to AL2
When I change only the AMI from AL2023 to AL2, it works.