bb-qq / aqc111

DSM driver for Aquantia AQC111U(5Gbps) based USB Ethernet adapters

DS918+ with Sabrent NT-SS5G - Random Crash, DSM unresponsive, Unsafe Shutdown #96

Open dedura opened 1 year ago

dedura commented 1 year ago

Description of the problem

Hi, since mid-December I have been experiencing random crashes and disconnects on my DS918+ with my Sabrent NT-SS5G adapter, using the latest driver (v1.3.3.0-10). DSM becomes totally unresponsive and won't let me stop or restart the driver in Package Center; after 1-2 minutes the whole NAS crashes and restarts. The NAS then reports that the system was shut down unsafely and starts Data Scrubbing once booted. This happens every 3-4 days. I tried both the rear and front USB ports of the NAS, but the issue remained.

Description of your products

* NAS: Synology DS918+
* DSM: 7.1.1-42962 Update 3
* Adapter: SABRENT NT-SS5G
* Driver: 1.3.3.0-10 DSM-7.x (reuploaded)
* RAM: 16GB
* Other USB Port used for: (UPS) CP1500EPFCLCD - Cyber Power System, Inc.

Description of your environment

* Connection: From "DS918+" to PC's NIC "Marvell® AQtion AQC107 10Gb Ethernet"
* PC Motherboard: ASUS ROG MAXIMUS XII FORMULA Z490
* PC OS: Windows 11 Pro 22H2
* Ethernet Driver version: 3.1.7.0
* Cable: VENTION 1m CAT 8 Ethernet Cable
* Connection used for: SMB, WinNUT-2.0 (UPS)

The adapter was working fine before December without any issues. Could this have been caused by the latest DSM update (Update 3)? I hope you can help fix this. Thank you!

bb-qq commented 1 year ago

Do you have any other USB devices connected, and what are the results of lsusb -a?

dedura commented 1 year ago

Hi, I am now using my previous 2.5G CLUB 3D CAC-1420 Adapter with the driver "r8152, 2.16.3-3 DSM7.x (reuploaded)", which works fine without any issues.

Only the Ethernet Adapter and the UPS are connected, nothing else. Please see below the output of lsusb:

|__usb1 1d6b:0002:0404 09 2.00 480MBit/s 0mA 1IF (Linux 4.4.180+ xhci-hcd xHCI Host Controller 0000:00:15.0) hub
  |__1-3 0764:0501:0001 00 2.00 12MBit/s 2mA 1IF (CPS CP1500EPFCLCD CRXLW2000395)
  |__1-4 f400:f400:0100 00 2.00 480MBit/s 200mA 1IF (Synology DiskStation 7F008AFA20E41640)
|__usb2 1d6b:0003:0404 09 3.00 5000MBit/s 0mA 1IF (Linux 4.4.180+ xhci-hcd xHCI Host Controller 0000:00:15.0) hub
  |__2-2 0bda:8156:3000 00 3.20 5000MBit/s 512mA 1IF (Realtek USB 10/100/1G/2.5G LAN 000000001)
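A listing like this also shows the link speed each device negotiated, which is a quick way to confirm that an adapter sits on the USB 3 bus (5000MBit/s) rather than falling back to USB 2 (480MBit/s). The following is only a minimal sketch: the helper name `usb_speed_of` is made up for illustration, and it assumes the listing format shown in this thread, with the speed as a whitespace-separated field ending in `MBit/s`.

```sh
#!/bin/sh
# Hypothetical helper: read an lsusb-style listing on stdin and print the
# negotiated speed of the first line that contains the given vendor:product ID.
usb_speed_of() {
    awk -v id="$1" 'index($0, id) {
        for (i = 1; i <= NF; i++)
            if ($i ~ /MBit\/s$/) { print $i; exit }
    }'
}

# Example with the adapter line posted in this thread.
printf '%s\n' '|__2-2 0bda:8156:3000 00 3.20 5000MBit/s 512mA 1IF (Realtek USB 10/100/1G/2.5G LAN 000000001)' |
    usb_speed_of 0bda:8156    # prints 5000MBit/s
```

On the NAS the same function could be fed the live listing by piping the `lsusb` output into it.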

bb-qq commented 1 year ago

Hmmm, from the symptoms it looks like a problem with the NT-SS5G itself. You might want to connect it to your PC to see if there are any stability issues there.

Or you could try the QNA-UC5G1T if you can return the NT-SS5G. I am also using a DS918+, and this device runs stably on it.

dedura commented 1 year ago

Thank you, I followed your advice and ordered the QNA-UC5G1T. Will provide feedback in the next couple of days after testing.

dedura commented 1 year ago

So, I have returned the NT-SS5G and got the QNA-UC5G1T. It has been running fine for 24 hours without crashing. I will monitor this for at least a week and update you again. I have noticed that my maximum speed is 355-360 MB/s (SMB). If you are using a Windows PC, could you share the network adapter settings of your NIC from Device Manager? I could possibly tweak mine a little to get the full speed.

dedura commented 1 year ago

iperf3 output below (I am only getting a maximum of 355-360 MB/s over SMB, as mentioned above). OS: Windows 11 Pro 22H2

iperf3 -c 192.168.xx.xx -P 2
Connecting to host 192.168.xx.xx, port 5201
[  4] local 192.168.yy.yy port 61286 connected to 192.168.xx.xx port 5201
[  6] local 192.168.yy.yy port 61287 connected to 192.168.xx.xx port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   186 MBytes  1.56 Gbits/sec
[  6]   0.00-1.00   sec   186 MBytes  1.56 Gbits/sec
[SUM]   0.00-1.00   sec   372 MBytes  3.12 Gbits/sec

[  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
[  6]   1.00-2.00   sec   200 MBytes  1.68 Gbits/sec
[SUM]   1.00-2.00   sec   401 MBytes  3.37 Gbits/sec

[  4]   2.00-3.00   sec   194 MBytes  1.63 Gbits/sec
[  6]   2.00-3.00   sec   190 MBytes  1.60 Gbits/sec
[SUM]   2.00-3.00   sec   384 MBytes  3.22 Gbits/sec

[  4]   3.00-4.00   sec   208 MBytes  1.74 Gbits/sec
[  6]   3.00-4.00   sec   206 MBytes  1.73 Gbits/sec
[SUM]   3.00-4.00   sec   414 MBytes  3.47 Gbits/sec

[  4]   4.00-5.00   sec   171 MBytes  1.43 Gbits/sec
[  6]   4.00-5.00   sec   170 MBytes  1.43 Gbits/sec
[SUM]   4.00-5.00   sec   340 MBytes  2.86 Gbits/sec

[  4]   5.00-6.00   sec   205 MBytes  1.72 Gbits/sec
[  6]   5.00-6.00   sec   204 MBytes  1.71 Gbits/sec
[SUM]   5.00-6.00   sec   409 MBytes  3.43 Gbits/sec

[  4]   6.00-7.00   sec   195 MBytes  1.64 Gbits/sec
[  6]   6.00-7.00   sec   194 MBytes  1.63 Gbits/sec
[SUM]   6.00-7.00   sec   389 MBytes  3.27 Gbits/sec

[  4]   7.00-8.00   sec   203 MBytes  1.70 Gbits/sec
[  6]   7.00-8.00   sec   202 MBytes  1.70 Gbits/sec
[SUM]   7.00-8.00   sec   406 MBytes  3.40 Gbits/sec

[  4]   8.00-9.00   sec   194 MBytes  1.62 Gbits/sec
[  6]   8.00-9.00   sec   192 MBytes  1.61 Gbits/sec
[SUM]   8.00-9.00   sec   386 MBytes  3.23 Gbits/sec

[  4]   9.00-10.00  sec   209 MBytes  1.75 Gbits/sec
[  6]   9.00-10.00  sec   208 MBytes  1.75 Gbits/sec
[SUM]   9.00-10.00  sec   417 MBytes  3.50 Gbits/sec

[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  1.92 GBytes  1.65 Gbits/sec  sender
[  4]   0.00-10.00  sec  1.92 GBytes  1.65 Gbits/sec  receiver
[  6]   0.00-10.00  sec  1.91 GBytes  1.64 Gbits/sec  sender
[  6]   0.00-10.00  sec  1.91 GBytes  1.64 Gbits/sec  receiver
[SUM]   0.00-10.00  sec  3.83 GBytes  3.29 Gbits/sec  sender
[SUM]   0.00-10.00  sec  3.83 GBytes  3.29 Gbits/sec  receiver
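To compare the iperf3 total with the SMB figure, it helps to put both in the same unit. The sketch below converts a Gbits/sec value into decimal MB/s and binary MiB/s. The function names are invented for illustration. It assumes iperf3's Gbits/sec means 10^9 bits per second, and that the SMB speed was read from Windows, which as far as I know reports binary megabytes.

```sh
#!/bin/sh
# Convert a rate in Gbit/s (10^9 bits per second) to decimal megabytes per second.
gbits_to_mb() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000000000 / 8 / 1000000 }'
}

# Same rate expressed in binary megabytes (MiB, 1048576 bytes) per second.
gbits_to_mib() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000000000 / 8 / 1048576 }'
}

# The 10-second [SUM] rate from the run above.
gbits_to_mb 3.29     # prints 411
gbits_to_mib 3.29    # prints 392
```

Under those assumptions, 355-360 MB/s over SMB is roughly 90% of what iperf3 measured on the raw link, so the remaining gap may be protocol and disk overhead rather than a NIC setting.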

dedura commented 1 year ago

Update: Since my last post it has disconnected 4 times, and each time I had to manually stop the driver and start it again. The good news: it didn't freeze, crash or restart my NAS. Have you encountered this problem?

jaqb commented 1 year ago

I'm experiencing the same issue with my DS920+. I also returned the NT-SS5G and got the QNA-UC5G1T. Then I even got the recommended SABRENT hub with power adapter, but the issue still persists. One time my NAS restarted by itself, so that was bad. But usually it just loses connection and I need to restart the driver. Most of the time I can restart the driver, but sometimes it's just impossible to do this.

dedura commented 1 year ago

I have installed an older driver version, "1.3.3.0-8 DSM-7.x". It has been working completely fine without a single crash or reboot since 25th February. See if that works for you.

jaqb commented 1 year ago

Thanks. I have downgraded to 1.3.3.0-8. I kind of know what to do to make the driver crash so I'll test it out.

jaqb commented 1 year ago

Nope, already had 2 improper shutdowns. Downgrading does not fix the issue for me.

dedura commented 1 year ago

Same here, just crashed the whole system, rebooted and started Data Scrubbing. I went back to the 2.5G Adapter now.

dedura commented 1 year ago

OK, now my 2.5G adapter crashes too with the latest "r8152" driver. As I mentioned in my initial post, I believe something got messed up after DSM Update 3.

jaqb commented 1 year ago

I would love to hear from @bb-qq regarding this issue. Is there a way I can help pinpoint the problem?

bb-qq commented 1 year ago

I am wondering how much traffic is flowing through the adapter before it becomes unstable. Heat might be causing the problem.

If you plugged that adapter into a Windows PC and kept the same amount of traffic flowing through it, would it work stably for an extended period of time?

bb-qq commented 1 year ago

I am also curious as to how much memory you have in your NAS.

The versions of the driver discussed in this thread include changes in kernel parameters related to memory, so it is possible that those changes are causing the problem.

dedura commented 1 year ago

I have 16GB of memory installed (2x 8GB) from Crucial. Traffic volume does not seem to be the issue for me, as the driver randomly crashes even when transferring a few photos or documents. It also crashes randomly when I open Surveillance Station on my PC or back up using Synology Drive. I have tried the adapter on Windows 10 and 11 and copied multiple files of several GB without any issues; it didn't crash.

The changes in Kernel Parameters could be true as the issue started with the DSM Update 3. Is there a fix for it?

jaqb commented 1 year ago

I've got 20GB of RAM (4+16). I also don't think it's about the amount of traffic or the temperature, but I can't be 100% sure. For me, crashes happen when I do something with WebDAV and Plex, like streaming from the WebDAV server, but sometimes also when just refreshing the metadata in Plex. The only thing I can say about the temperature is that one time when it crashed I touched the casing of the QNA-UC5G1T and it was just barely warm. Is there a way to check the internal temperature of the QNA-UC5G1T? I do have both "Low Power 5G" and "Thermal throttling" set to ON to make sure the temperature is kept in check.

bb-qq commented 1 year ago

> The changes in Kernel Parameters could be true as the issue started with the DSM Update 3. Is there a fix for it?

I was mentioning the changes on the driver's side. (https://github.com/bb-qq/aqc111/issues/96#issuecomment-1461841186) I don't know the details of the changes on the DSM side.

> Is there a way to check the internal temperature of QNA-UC5G1T ?

As far as I know, there is no way to know the internal temperature. The only measure I can think of is to place it in a well-ventilated area and see the difference. (I saw a post once that said removing the case and installing a fan stabilized it, but I think it would be risky to go that far.)

bb-qq commented 1 year ago

I don't have any ideas for investigating the cause, but since your NAS seems to have plenty of memory, could you try doubling the value of target_value in /var/packages/aqc111/scripts/apply-memory-setting? It is unlikely to improve the situation, though.

dedura commented 1 year ago

Thanks for your reply, @bb-qq. I have now doubled the target value and restarted the NAS. Will test it out and provide feedback.

root@:/var/packages/aqc111/scripts# cat apply-memory-setting
#!/bin/sh

set -eu

target_value=524288
current_value=$(sysctl -n vm.min_free_kbytes)
if [ "${current_value}" -lt "${target_value}" ]
then
    sysctl -w vm.min_free_kbytes=${target_value}
fi
root@:/var/packages/aqc111/scripts# vim apply-memory-setting
root@:/var/packages/aqc111/scripts# cat apply-memory-setting
#!/bin/sh

set -eu

target_value=1048576
current_value=$(sysctl -n vm.min_free_kbytes)
if [ "${current_value}" -lt "${target_value}" ]
then
    sysctl -w vm.min_free_kbytes=${target_value}
fi
root@:/var/packages/aqc111/scripts#
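For context on what doubling does: vm.min_free_kbytes is expressed in kB, so 524288 is 512 MiB and 1048576 is 1 GiB that the kernel tries to keep free. The sketch below shows what share of total RAM a given value reserves. The helper name `min_free_share` is made up for illustration, and the RAM totals are nominal sizes rather than the slightly smaller MemTotal a real system reports.

```sh
#!/bin/sh
# Hypothetical helper: print the percentage of total RAM that a given
# vm.min_free_kbytes value reserves. Both arguments are in kB, the unit
# used by sysctl and by MemTotal in /proc/meminfo.
min_free_share() {
    awk -v reserve="$1" -v total="$2" 'BEGIN { printf "%.2f\n", reserve * 100 / total }'
}

# Doubled target_value (1048576 kB) on nominal 16 GiB and 8 GiB systems.
min_free_share 1048576 16777216    # prints 6.25
min_free_share 1048576 8388608     # prints 12.50
```

On a live system the second argument could be taken from `awk '/^MemTotal:/ { print $2 }' /proc/meminfo`. The stock value of 524288 kB works out to roughly 3% of 16 GiB.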

dedura commented 1 year ago

Hi @bb-qq - Whole NAS crashed in the morning. I turned the PC on and opened a file (Excel spreadsheet) via SMB, the adapter itself was cold, not even slightly warm and it crashed the whole NAS and rebooted. Upon boot, it started data scrubbing on the volume. Also want to mention, I ran the Memory Test via Synology Assistant last night and it passed without any errors. No idea what else I can do to troubleshoot.

Since you have the same Synology model, have you not encountered any of these issues yourself? Do you mind me asking what your specs are, i.e. memory (official/unofficial), DSM version, the NIC in your PC and its driver version? I wonder whether my PC's NIC driver could be causing these crashes. I am using the latest driver from Marvell (v3.1.7.0).

bb-qq commented 1 year ago

> Since you have the same Synology model, have you not encountered any of these issues yourself?

A few times a year, when I did not have low power mode enabled on the device, it stopped responding and I had to reload the driver. However, I have never experienced a NAS crash.

> Do you mind me asking what your specs are, i.e. Memory (Official/Unofficial), DSM version, NIC on the PC and the driver version of that.

My environment is as follows:

Handle 0x0023, DMI type 16, 23 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: None
    Maximum Capacity: 16 GB
    Error Information Handle: No Error
    Number Of Devices: 2

Handle 0x0024, DMI type 17, 40 bytes
Memory Device
    Array Handle: 0x0023
    Error Information Handle: No Error
    Total Width: 8 bits
    Data Width: 8 bits
    Size: 8192 MB
    Form Factor: SODIMM
    Set: None
    Locator: ChannelA-DIMM0
    Bank Locator: BANK 0
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MT/s
    Manufacturer: Samsung
    Serial Number: 35701618
    Asset Tag: 9876543210
    Part Number: M471B1G73BH0-YK0
    Rank: Unknown
    Configured Memory Speed: 1600 MT/s
    Minimum Voltage: Unknown
    Maximum Voltage: Unknown
    Configured Voltage: Unknown

Handle 0x0025, DMI type 17, 40 bytes
Memory Device
    Array Handle: 0x0023
    Error Information Handle: No Error
    Total Width: 8 bits
    Data Width: 8 bits
    Size: 8192 MB
    Form Factor: SODIMM
    Set: None
    Locator: ChannelB-DIMM0
    Bank Locator: BANK 1
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MT/s
    Manufacturer: Samsung
    Serial Number: 35701618
    Asset Tag: 9876543210
    Part Number: M471B1G73BH0-YK0
    Rank: Unknown
    Configured Memory Speed: 1600 MT/s
    Minimum Voltage: Unknown
    Maximum Voltage: Unknown
    Configured Voltage: Unknown


* DSM version: 7.1.1-42962 Update 4

$ cat /etc/VERSION
majorversion="7"
minorversion="1"
major="7"
minor="1"
micro="1"
productversion="7.1.1"
buildphase="GM"
buildnumber="42962"
smallfixnumber="4"
nano="4"
base="42962"
builddate="2023/02/01"
buildtime="20:01:57"



* QNA-UC5G1T FW version: 3.1.6 (latest FW on the [QNAP website](https://www.qnap.com/en-us/download?model=qna-uc5g1t&category=firmware))
* Connected USB port: front port with a stock cable
* PC NIC: AQN-107 (direct connection)
* PC NIC Driver: 2.2.3.0

dedura commented 1 year ago

Thank you - the specs look nearly identical to mine. The last option I could try is to update to the DSM 7.2 BETA version and see if that makes any difference. It would be great if you can provide an updated driver that will work with the 7.2 Beta. Thanks

bb-qq commented 1 year ago

I created drivers for the DSM 7.2 BETA, but I think it is unlikely that the DSM update will improve symptoms. https://github.com/bb-qq/aqc111/releases/tag/1.3.3.0-11

I wish I could at least find the cause of the reboot....

dedura commented 1 year ago

Thank you @bb-qq, appreciated. I have also ordered 2x 4GB of memory; 8GB is the maximum supported memory for the Intel Celeron J3455 according to Intel's website. Some users claim the system won't utilise anything above 8GB, or crashes if it tries, so let me find out if this makes any difference. If you require any system outputs/logs from me, please let me know.

jaqb commented 1 year ago

bb-qq already said he also has 2x 8GB, so I don't think that's it. I'm currently testing something and it's looking good. I'm going to stay with 1.3.3.0-10 while I test my thing. By the way, how full is your system partition (/dev/md0)? You can check with df -h.

dedura commented 1 year ago

@jaqb - Here you go. Looking forward to hearing about your test results. Does this look right?

root@:~# df -h /dev/md0
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        2.3G  1.9G  365M  84% /

dedura commented 1 year ago

@jaqb - Just wondering, do you use your M.2 SSD as Cache or Volume? I had mine set up as volume for over a year and the aqc111 driver was installed on that volume (volume2) - Upon checking the log files (/var/log/messages), I found quite a few error messages related to volume2.

synostgvolume[840]: fs_btrfs_metadata_usage_query.c:137 Failed to check the btrfs metadata usage of volume [/volume2].

The above message is repeated multiple times. I have now removed volume2 and am using the SSD as a normal cache. I also replaced the 16GB of RAM with 2x 4GB. So far it runs stably; even booting/restarting the NAS is much faster than before. Will test and provide feedback.

jaqb commented 1 year ago

84% used seems about right. I'm at 82% now, but I was at 100% a couple of days ago and had a lot of weird issues because of it. I had to delete a bunch of logs to get it this low.
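Since a full system partition caused odd behaviour here, it may be worth checking the usage from a script rather than by eye. This is only a sketch: the function names and the threshold of 90 are arbitrary choices for illustration, and it assumes the usual `df` layout with the usage percentage in the fifth column.

```sh
#!/bin/sh
# Hypothetical helper: read df output on stdin and print the Use% value
# of the first listed filesystem as a bare number.
df_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning when the root filesystem reaches the threshold.
# The value 90 is an arbitrary example, not a recommendation from this thread.
check_root_usage() {
    pct=$(df -P / | df_use_pct)
    if [ "$pct" -ge 90 ]; then
        echo "system partition is ${pct}% full"
    fi
}

# Example with the output posted above.
printf '%s\n' 'Filesystem Size Used Avail Use% Mounted on' '/dev/md0 2.3G 1.9G 365M 84% /' |
    df_use_pct    # prints 84
```

Running `check_root_usage` on the NAS stays silent while usage is below the threshold.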

I use 2x m.2 ssd's as cache for read-write.

jaqb commented 1 year ago

@dedura So I don't have any crashes anymore. At first I lowered the MTU (jumbo frame) to 5000 on the Synology. This fixed the driver crashes for me, but the speed to my PC was worse than before. Then I noticed that I can set the jumbo frame size of my PC's network adapter to 4088 bytes, so I set a matching value on the Synology as well, and now I get good transfer speeds. (The Synology is set to 4000.)

@bb-qq Are your PC and Synology both set to MTU 9000, and you don't experience any driver crashes?

dedura commented 1 year ago

@jaqb - No crashes or freezing for me in the 2 weeks since I replaced the RAM with 2x 4GB, even though the 16GB passed the memory test. MTU on PC (9014) > Synology (9000) > stable, no issues. It looks like our resolutions are entirely different, but I'm happy it works now.

bb-qq commented 1 year ago

I also have the MTU set to 9000 on my PC and NAS and have never experienced a crash.
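For anyone comparing MTU settings between the PC and the NAS, a rough sketch of the arithmetic follows. The function names are made up for illustration. One part is an assumption rather than something confirmed in this thread: that the Windows "Jumbo Packet" value includes the 14-byte Ethernet header, which would explain why 9014 on the PC pairs with 9000 on the NAS.

```sh
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Assumption: the Windows "Jumbo Packet" setting counts the 14-byte
# Ethernet header, so the IP MTU is the configured value minus 14.
mtu_from_windows_jumbo() {
    echo $(( $1 - 14 ))
}

icmp_payload_for_mtu 9000      # prints 8972
mtu_from_windows_jumbo 9014    # prints 9000
mtu_from_windows_jumbo 4088    # prints 4074
```

The payload size can then be used for a don't-fragment ping across the link, for example `ping -M do -s 8972 -c 3 <NAS address>` with Linux iputils or `ping -f -l 8972 <NAS address>` on Windows. If that fails while a smaller payload succeeds, one side is likely not passing the larger frames.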

jaqb commented 1 year ago

I have found a way to crash the driver.

I mounted a folder using NFS on Windows. Then I tried playing two 4K movie/show remuxes (i.e. high bitrate) in mpv at the same time and just started seeking forward through the videos. This crashed the aqc111 driver every time for me.

Hopefully you will be able to reproduce this.
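For testing without a video player, the access pattern might be approximated by reading random chunks from two large files on the mounted share at the same time. This is only a sketch under that assumption: nothing in this thread confirms that it triggers the crash, and the helper name and file paths are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read `count` random 1 MiB chunks from `file`,
# loosely imitating a video player that keeps seeking.
seek_reads() {
    file=$1
    count=$2
    size_mib=$(( $(wc -c < "$file") / 1048576 ))
    if [ "$size_mib" -lt 1 ]; then
        echo "file is smaller than 1 MiB" >&2
        return 1
    fi
    i=0
    while [ "$i" -lt "$count" ]; do
        # Pick a pseudo-random chunk index in [0, size_mib).
        offset=$(awk -v n="$size_mib" -v seed="$i$$" 'BEGIN { srand(seed); print int(rand() * n) }')
        dd if="$file" of=/dev/null bs=1048576 skip="$offset" count=1 2>/dev/null || return 1
        echo "read 1 MiB at offset ${offset} MiB"
        i=$(( i + 1 ))
    done
}
```

From a Linux or WSL2 client with the share mounted, two readers could be started in parallel with something like `seek_reads /mnt/nas/file1.mkv 200 & seek_reads /mnt/nas/file2.mkv 200 & wait`, where both paths are placeholders.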

jaqb commented 9 months ago

@bb-qq Have you tried reproducing this issue ?

bb-qq commented 9 months ago

I didn't know how to handle NFS on Windows, so I mounted the share with CIFS (SMB) and applied the same load, but the symptoms did not reproduce. The connection stayed up.

I also ran iperf and CrystalDiskmark under load for an extended period of time and could not reproduce the problem.

> One time my NAS restarted by itself, so that was bad. But usually it just loses connection and I need to restart the driver. Most of the time I can restart the driver, but sometimes it's just impossible to do this.

I am concerned about this symptom. While problems with driver instability are often reported depending on the environment, reports of the NAS itself crashing are rare. As the posts in this thread indicate, it was usually due to hardware related issues such as RAM or SSD.

jaqb commented 9 months ago

@bb-qq To enable NFS on Windows, you just go to "Turn Windows features on or off" in Windows settings and check "Services for NFS". Then you enable NFS for the shared folder on the Synology (for your Windows PC) and access the NFS share on Windows by using the full path, e.g. \\SYNOLOGYNAS\volume1\sharedfolder. I think being able to replicate this is the key to resolving the crash issues in your awesome driver. Hopefully you will be able to replicate it now.

bb-qq commented 2 months ago

I have tried applying the load both the way you taught me and using WSL2, but the problem did not reproduce in my environment.

I still think it has to do with a hardware problem, as described in one of the previous comments. If the same problem still occurs after removing the SSD or switching back to the factory-installed RAM, I don't know what to do...