bb-qq / r8152

Synology DSM driver for Realtek RTL8152/RTL8153/RTL8156 based adapters
GNU General Public License v2.0

rtd1296 stability issue #275

Open bb-qq opened 1 year ago

bb-qq commented 1 year ago

This issue summarizes the topic of the driver not working on the rtd1296 platform.

There are many reports of unstable operation in products using rtd1296. The typical symptoms reported are as follows

There are also no reports of stable operation.

When the connection drops, something appears to go wrong at the USB level. This suggests that the rtd1296 SoC may have a software or hardware issue in its xHCI host controller.

I am looking for a workaround for this problem, but so far have not found it. (I am also considering providing a standard usb-cdc driver separately.) I will report here if any progress is made in the investigation.

Affected Products

andrus2049 commented 1 year ago

In my case the problem occurs only when transferring large files from the NAS to a PC. By contrast, I can browse the NAS directory structure for hours without the problem occurring.

One side note: after installation I immediately changed the MTU size to 9000 in the NAS LAN configuration. After I found the problem, I tried to reset MTU to 1500 (default), also unchecking the manual setting, but after saving this setting it still remains enabled with the value of 9000, at least as shown in the GUI. No way to reset. But it may be only a GUI issue.

Anyway, because of the evidence that only large transfers cause the problem, it might have something to do with the MTU?

bb-qq commented 1 year ago

Anyway, because of the evidence that only large transfers cause the problem, it might have something to do with the MTU?

MTU may have something to do with this stability issue, but there are reports of problems occurring even with MTU values of 1500.

It might possibly relate to the hardware-assisted functions of the transmission on the NIC side.

I would like to know if disabling those features by the following command will make a difference in stability.

ethtool -K eth2 tso off
ethtool -K eth2 gso off
ethtool -K eth2 sg off

andrus2049 commented 1 year ago

Thanks, going to try.

How can I check the current values before issuing these commands?

And do the new values revert when the NAS restarts, or are they persistent?
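
On the first question: ethtool -k (lowercase) prints the current feature settings, while -K (uppercase) changes them. The sketch below filters that output down to the three features discussed here. The helper name show_offload_state and the interface name eth2 are only illustrative.

```shell
# Keep only the offload features discussed in this thread.
# Reads the output of "ethtool -k IFACE" on stdin.
show_offload_state() {
    grep -E '^[[:space:]]*(tcp-segmentation-offload|generic-segmentation-offload|scatter-gather):'
}

# Example, run on the NAS (the interface name is an assumption):
#   ethtool -k eth2 | show_offload_state
```

Running it before and after the -K commands shows whether each setting actually changed.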

NikitaOsotsky commented 1 year ago

I tested the connection with the suggested changes. It's a pity, but nothing has changed. DS218 & rtl8156 2.5

NikitaOsotsky commented 1 year ago

I also tried ipv6 access https://[fe80::XXXX:XXXX:XXXX:XXXX]:5001/ I tried to download the file and it didn't help either

cqwangding commented 1 year ago

DS920+ 2.16.3-3 DSM7.x (reuploaded) lan rtd1296 (ks-is ks-714) https://ks-is.com/usb-3-1-ethernet-adapter-ks-is-ks-714?tag=2.5G

There are no problems with data transfer. Especially for a couple of hours I drove chia 100gb plots at a speed of 2.5. But there is another problem! When you pull out and put back the adapter, the driver turns off. The same goes for rebooting. Must be manually enabled in the web interface.

rtd1296 is the cpu for entry level synology, but not for DS920+.

jebug29 commented 1 year ago

Using a DS418 with a TRENDnet TUC-ET2G and the r8152 driver. Disabling TSO, GSO and SG with the ethtool commands above got me 2.5Gb speeds briefly (and for longer than it would previously hold a connection at all), but the connection ultimately shut down. It does seem like I was getting 2500 Mbps upload and only about 1000 Mbps download.

dlbomber1974 commented 1 year ago

Has there been any update for the DS418 with the TRENDnet 2.5G USB-C to RJ-45 adapter? I bought it before reading this thread, expecting it to work, yet I am seeing the driver issues described above. I ran the SSH command after it failed and then saw the connection under Network. It showed as connected, but with a 169.x address. I am also using a TRENDnet 5-port unmanaged 2.5G PoE+ switch with its own AC adapter. After a reboot the connection was completely gone. I had to run the RT App when I rebooted as it did not auto restart.

After assigning a static IP, I am now showing: 2500mbps Full Duplex 1500 MTU

I will put it to test with a few file transfers small and large tomorrow when I get up.

bb-qq commented 1 year ago

Thanks all, it looks like disabling GSO and TSO didn't make much difference in stability.

These settings will revert after reboot. If they have any effect, please register them in the task scheduler or something so that they are configured at startup.
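
A minimal sketch of such a startup script, assuming the adapter shows up as eth2 and the script runs as root (for example as a boot-up triggered task in the DSM Task Scheduler). The function name disable_offloads and the DRY_RUN switch are illustrative additions, not part of the driver package.

```shell
#!/bin/sh
# Disable TSO, GSO and scatter-gather on one interface.
# With DRY_RUN=1 the commands are only printed, not executed.
disable_offloads() {
    iface=$1
    for feature in tso gso sg; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ethtool -K $iface $feature off"
        else
            ethtool -K "$iface" "$feature" off
        fi
    done
}

# Example call to place at the end of the script (interface name assumed):
#   disable_offloads eth2
```

Because the interface number can differ between units (eth1 and eth2 both appear in this thread), check the name before scheduling the script.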

Dayofwonder commented 1 year ago

Same here with my 218play. I tried two different adapters with 8152 chipset (none of the recommended adapters yet). After some research, I found on the internet that the error is very common. It can be seen well in the /var/log/kern.log. Unfortunately, I could not find a solution to the problem. The error occurs with large amounts of data. The connection is interrupted for about 45 seconds, the NAS is then also not accessible via ping.

This is what the kern.log looks like:

2023-01-06T20:03:46+01:00 diskstation kernel: [470548.442279] r8152 3-1:1.0 eth1: Tx timeout
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.448949] r8152 3-1:1.0 eth1: Tx status -2
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.453431] r8152 3-1:1.0 eth1: Tx status -2
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.457911] r8152 3-1:1.0 eth1: Tx status -2
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.462397] r8152 3-1:1.0 eth1: Tx status -2
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.434430] r8152 3-1:1.0 eth1: get_registers -108
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.439501] r8152 3-1:1.0 eth1: get_registers -71
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.444479] r8152 3-1:1.0 eth1: get_registers -71
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.449441] r8152 3-1:1.0 eth1: get_registers -71
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.454439] r8152 3-1:1.0 eth1: get_registers -71
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.459401] r8152 3-1:1.0 eth1: get_registers -71
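
When reading logs like this, it helps to know that the negative numbers are negated Linux errno values: 2 is ENOENT, 71 is EPROTO and 108 is ESHUTDOWN. The sketch below annotates such lines. The function names are hypothetical, and the short descriptions paraphrase the kernel's USB error-code documentation rather than anything specific to this driver.

```shell
# Map a negated errno from the r8152 log lines to a short description.
describe_usb_status() {
    case "$1" in
        -2)   printf '%s\n' "ENOENT: URB was unlinked (cancelled) before completing" ;;
        -71)  printf '%s\n' "EPROTO: low-level USB protocol error" ;;
        -108) printf '%s\n' "ESHUTDOWN: device or host controller has been disabled" ;;
        *)    printf '%s\n' "unrecognised status $1" ;;
    esac
}

# Append the description to every "Tx status" / "get_registers" line on stdin.
annotate_r8152_log() {
    while IFS= read -r line; do
        case "$line" in
            *"Tx status "*|*"get_registers "*)
                code=${line##* }   # last whitespace-separated field
                printf '%s  <- %s\n' "$line" "$(describe_usb_status "$code")" ;;
            *)
                printf '%s\n' "$line" ;;
        esac
    done
}

# Example (log path as given in this thread):
#   grep r8152 /var/log/kern.log | annotate_r8152_log
```

Read this way, the sequence above is a transmit timeout, cancelled transmit URBs, and then register reads failing because the device is no longer reachable over USB.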

I operate it at 1 Gbps, not 2.5 Gbps. So your workaround ("May operate stably when linked at 1 Gbps") has no effect here.

Next week I will get the club 3D USB adapter with 8156 chipset. I will test it and report if the error also occurs.

As written before, there are some articles and forums about this topic, here are some of them, don't know if it could help in our environment:

https://portal.cloudunboxed.net/knowledgebase/55/How-to-fix-Realtek-USB-NIC-TX-timeout-issues.html

https://forum.odroid.com/viewtopic.php?f=212&t=45857

https://bugzilla.kernel.org/show_bug.cgi?id=198931

And by the way: I tested another adapter with 8169 chipset (together with Your 8152 driver). It worked, but with a poor performance (about 30 MB/s).

dlbomber1974 commented 1 year ago

Coming back to test my DS418 NAS with my TRENDnet 2.5GbE setup, I saw that my connection still showed as connected, but I had no ping and the port was non-responsive. I could see the MAC address but no IP address, even after several reboots, uninstalls, reinstalls, etc. I read elsewhere that this has happened to others, so I had to dust off my old Linux hat and found a short remedy for this. I also noticed that regardless of setting the MTU to 9000 in the GUI, it still shows up as MTU 1500.

sudo /etc/rc.network restart

My connection is back up, and the IP is now showing and pingable.

eth2 Link encap:Ethernet HWaddr 3C:8C:F8:60:0A:94
     inet addr:192.168.1.201 Bcast:192.168.1.255 Mask:255.255.255.0
     UP BROADCAST MULTICAST MTU:1500 Metric:1
     RX packets:59735 errors:0 dropped:0 overruns:0 frame:0
     TX packets:39 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:3237395 (3.0 MiB) TX bytes:9563 (9.3 KiB)

Now I will go forward with my testing.

Dayofwonder commented 1 year ago

Yes, this is ONE way. For me it works to stop and restart the installed driver in the package center ... But this isn't a workaround as long as I won't be able to download any file from the NAS.

Dayofwonder commented 1 year ago

Just tested with Club 3D USB adapter. Test failed. Upload speed is a disaster (worst of all devices).

And downloads don't start at all.

2023-01-10T17:21:49+01:00 diskstation kernel: [806423.856275] r8152 3-1:1.0 eth1: Tx timeout
2023-01-10T17:21:49+01:00 diskstation kernel: [806423.862963] r8152 3-1:1.0 eth1: Tx status -2
2023-01-10T17:21:51+01:00 diskstation kernel: [806425.820037] r8152 3-1:1.0 eth1: get_registers -108
2023-01-10T17:21:51+01:00 diskstation kernel: [806425.825106] r8152 3-1:1.0 eth1: get_registers -71
2023-01-10T17:21:51+01:00 diskstation kernel: [806425.870979] xhci-hcd xhci-hcd.2.auto: URB transfer length is wrong, xHC issue? req. len = 4, act. len = 4294967292
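
The odd value in the xhci-hcd line is worth decoding: 4294967292 equals 2^32 - 4, which is what -4 looks like when stored in an unsigned 32-bit field. That suggests the reported length underflowed rather than that 4 GiB was actually transferred. A small sketch of the arithmetic (the helper name is made up, and it assumes a shell with 64-bit integer arithmetic):

```shell
# Reinterpret an unsigned 32-bit value as a signed 32-bit integer.
to_signed32() {
    value=$1
    if [ "$value" -ge 2147483648 ]; then
        # Values at or above 2^31 wrap around to negative numbers.
        echo $(( value - 4294967296 ))
    else
        echo "$value"
    fi
}

# Example:
#   to_signed32 4294967292   # prints -4
```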

So: None of my 3 different adapters do the trick.

Dayofwonder commented 1 year ago

For now I use 2 connections: LAN for downloads from the NAS, USB for uploads. I observe quite good performance for uploads, about 80-120 MB/s (of course this is not 2.5 Gbps speed, but it is more than before and might be limited by my HDDs) with this adapter: https://www.digitec.ch/de/s1/product/digitus-usb-type-c-gigabit-ethernet-adapter-25g-usb-c-usb-a-usb3130-usb-c-usb-31-netzwerkadapter-16185124 As written before: as soon as I try any download via USB, the USB connection crashes and I have to stop and restart the driver in the Package Center.

Voidnickyname commented 1 year ago

Driver version 2.15.0-10 tested last night on a 418j, and I hit the same issue when downloading a large file (over 500 MB) from the NAS. The download failed and the NAS stopped responding to ping. In my case, reconnecting the LAN cable between the NAS and the router does fix the no-response situation, but the download issue is repeatable. I also tried linking the NAS and PC directly with one cable and hit the same issue. With the cable in the onboard 1G port of the NAS, everything is OK.

jvalenciag commented 1 year ago

I tested both adapters from CableMatters and ASUS on a DS220j and got the same behavior: uploads work as expected but downloads make the adapter hang. Tested with iperf3.
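
For anyone wanting to reproduce this, the two directions can be tested separately with iperf3. The sketch below only builds the client command lines. It assumes an iperf3 server (iperf3 -s) is running on the NAS and the client runs on the PC; the helper name and the 60-second duration are arbitrary choices, and how the original test was run is not stated.

```shell
# Print the iperf3 client command for one direction.
#   upload   = PC sends to the NAS (iperf3 default direction)
#   download = NAS sends to the PC (-R, reverse mode)
iperf_cmd() {
    direction=$1
    nas=$2
    case "$direction" in
        upload)   echo "iperf3 -c $nas -t 60" ;;
        download) echo "iperf3 -c $nas -t 60 -R" ;;
        *)        echo "usage: iperf_cmd upload|download NAS_ADDRESS" >&2
                  return 1 ;;
    esac
}

# Example (address is a placeholder):
#   iperf_cmd download 192.168.1.201
```

Going by the reports here, the download direction is the one expected to hang the adapter.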

javitoalon commented 1 year ago

Same here with DS220j. Uploading goes fine even with large files. Downloading it breaks instantaneously. Testing adapter with chipset RTL8156B.

alex-arzner-pro commented 1 year ago

The same problem on model DS220j, like many who have already written here. Works unstable. Especially if you give the load on the interface. If it helps in any way, I could help with testing and even provide access to my device, for example, through a mesh network.

Dayofwonder commented 1 year ago

Did someone follow these instructions named "How to fix Realtek USB NIC TX timeout issues"? https://portal.cloudunboxed.net/knowledgebase/55/How-to-fix-Realtek-USB-NIC-TX-timeout-issues.html

I am not THAT big linux specialist and hesitate to try this workaround ...

Romeo1984 commented 1 year ago

The kernel is already disabling AutoSuspend USB Power Mode:

$ cat /sys/module/usbcore/parameters/autosuspend
-1

dlbomber1974 commented 1 year ago

I can give it a whirl this afternoon. I am pretty savvy with Linux. I’ll post feedback.

Romeo1984 commented 1 year ago

I looked at this. Synology OS does not use Grub. I couldn’t find any articles on how to pass kernel parameters.

dlbomber1974 commented 1 year ago

I hadn’t had a chance to look at it. Grub is not supported. Is that the only way the article gives?

Romeo1984 commented 1 year ago

Yes.

dlbomber1974 commented 1 year ago

I do have a question. Are you guys directing specific traffic to the USB 3 port or did you bind the port?

Dayofwonder commented 1 year ago

I have not configured anything else. In order to be able to use the higher speed at all, I use the second IP address (USB3) for uploading files (films, pictures) to my NAS from the network, and for any download, i.e. the normal retrieval of files, I use the normal LAN port.

Romeo1984 commented 1 year ago

Potentially Stupid question: Has anybody tried running the 2.5Ge adapter from a powered USB hub? I have this running perfectly on my DS720+ this way. I am wondering if this is a power issue somehow. I heard reports that the DS220j has underpowered USB ports. I would like to hear if somebody has tried it, if not, I might order the same one I am running on my DS720+ and just try it.

javitoalon commented 1 year ago

I just tried using a powered Dell USB dock, with the same result: uploads are fine, but downloads break. The weird thing is that you can still ping the 2.5GbE interface internally, but not from outside. After bringing eth1 down and up with ifconfig, it returns to normal again.
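
That manual recovery can be scripted. A rough sketch, assuming the adapter is eth1, that ping on the NAS supports the -I, -c and -W options, and that some host outside the NAS (for example the router) answers pings; the function name is hypothetical. Other reports in this thread indicate the link does not always come back this way, so treat it as a stopgap.

```shell
# Bounce the interface if a host outside the NAS no longer answers through it.
# Prints "ok" when the ping succeeds, "bounced" after a down/up cycle.
recover_iface() {
    iface=$1
    target=$2
    if ping -I "$iface" -c 1 -W 2 "$target" >/dev/null 2>&1; then
        echo "ok"
    else
        ifconfig "$iface" down
        ifconfig "$iface" up
        echo "bounced"
    fi
}

# Example, run as root (interface and address are placeholders):
#   recover_iface eth1 192.168.1.1
```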

Romeo1984 commented 1 year ago

Confirmed - I too tried a powered USB dock with the same result.

javitoalon commented 1 year ago

I have seen people reporting DS218j is working fine with an Asus dongle, both for upload and downloads. Is DS220j so different from it? I know DS218j is Armada38x but still, it is kind of weird it works so bad for our DS220j with regular RTL8156B usbs.

Romeo1984 commented 1 year ago

Yes. The Armada chipsets are reported to be working fine. The DS220J uses the Realtek RTD1296 chipset. Completely different from some of the other "J" models.

Affected models using this chipset: DS420j, DS220j, RS819, DS418, DS218, DS218play, DS118

Reference: https://kb.synology.com/en-global/DSM/tutorial/What_kind_of_CPU_does_my_NAS_have

Romeo1984 commented 1 year ago

@bb-qq How can I "Help" with this? I have Linux experience, and two other Synology models: DS720+ and a DS214Play.

sounds2k commented 1 year ago

I've got an ioSafe 218 (essentially a more rugged DS218). It's got 2GB of RAM - the same as many of the units which are stable, but have Intel CPUs. The driver crashes when trying to download (even fairly small ... 520MB), this is with it connected to the rear ports where the link comes up at 2.5GbE. However, if I plug it into the front port the link comes up at 1GbE and appears to be stable ... although of course that's no faster than the built-in NIC. I was able to download a 11GB file - but speed was poor (circa 35MB/s). Downloading the same file over the internal NIC (also at 1GbE) does over 100MB/s. An upload over the 2.5GbE (connected to the front port) topped out at just under 40MB/s, with it in the one of the rear ports I saw a peak of 95MB/s. So it would appear to be CPU/driver related, rather than RAM ... ?

dlyubimov commented 1 year ago

ethtool -K eth2 tso off
ethtool -K eth2 gso off
ethtool -K eth2 sg off

actually this did make it stable for me, except the speed was dropped below 1Mbps. (for me, it is eth1 with sudo of course).

It does show as a 2.5G link, dhcp doesn't work still. (DS218 on rear usb 3.0). Perhaps it may be useful to try these settings one by one?

dlyubimov commented 1 year ago

Hm... I take it back. Switching scatter-gather off seems to also switch TSO and GSO off automatically. With scatter-gather off, the speed drops, which seems to improve stability, but if it runs long enough, it eventually still gets stuck.

With scatter-gather on, speed is high, and switching TSO or GSO off does not change anything: speed stays high, but the crash is much easier to reproduce. (I have almost convinced myself to just drop $300 on a new DS220+ shell and move on.)

dlyubimov commented 1 year ago

Played a little bit more with this on the DS218. No parameters in the usbcore module made any difference (except for changing autosuspend, which causes the interface to go defunct right away if set to anything other than -1). Changing the ring size made no difference either.

Same symptoms in the logs as in this thread: everything starts with a Tx timeout. The driver tries to send a USB reset call in response to that, and everything goes downhill from there: errors in the bulk Tx callback, and eventually not being able to read the registers. The Tx timeout value is set to 5*HZ (5 seconds), and I wonder if that is materially different on this chipset; maybe it makes sense to bump it up for bulk frames.

Also noticed that the driver file has a slightly different line count from the 2.16.3 i had downloaded from the realtek site. I assume all changes from the source are benign, or I am mistaken and there are no changes.

GorgiGR commented 1 year ago

@bb-qq How can I "Help" with this? I have Linux experience, and two other Synology models: DS720+ and a DS214Play.

I don't know whether this helps...?

https://bugzilla.kernel.org/show_bug.cgi?id=198931#c96

bb-qq commented 1 year ago

Thank you all for the information you have provided. Unfortunately, I have not yet found a way to improve stability. The problem seems to be occurring at the lower layers and I have no idea where to start looking.

However, I noticed that the recently released DSM7.2beta has a new Linux kernel version.

$ head -6 ds.rtd1296-7.1/usr/local/sysroot/usr/include/linux/syno_autoconf.h
/*
 *
 * Automatically generated file; DO NOT EDIT.
 * Linux/arm64 4.4.180 Kernel Configuration
 *
 */

$ head -6 ds.rtd1296-7.2/usr/local/sysroot/usr/include/linux/syno_autoconf.h
/*
 *
 * Automatically generated file; DO NOT EDIT.
 * Linux/arm64 4.4.302 Kernel Configuration
 *
 */

The kernel update is unlikely to improve anything, but if anyone has tried it, please let me know. Packages compatible with DSM 7.2 are available here. https://github.com/bb-qq/r8152/releases/tag/2.16.3-4

GorgiGR commented 1 year ago

As mentioned in Comment #96 of the bugzilla link I also posted 5 days ago, a similar issue has been resolved when the comment author updated to Kernel 5.16 in Debian. It may be completely unrelated, but it is evidence that kernel updates sometimes may indeed solve issues like this. Unfortunately the DSM is still a long way from kernel 5.xx.

dlyubimov commented 1 year ago

I tried with DSM7.2 RC and the new 7.2 release of the driver. Unfortunately i must report that it was stable for copying about 3Gb before it failed in the same manner as before. As before, it is double failure, as usb reset command does not recover the driver state.

To be specific, I was running DSM 7.2-64551.

dlbomber1974 commented 1 year ago

So I had to uninstall and reinstall the driver and now it says it failed to start. I have the latest update 7.2 which I am suspecting may have some changes that interferes with the install now. Can someone confirm? Now it keeps asking for me to repair it.

Dayofwonder commented 1 year ago

Same here. I just installed the final version of DSM 7.2, had to uninstall the old driver and tried to install [2.16.3-4], and now it asks me to repair the driver. Do we have to execute the SSH command once more to get the new driver working? Update: The driver works now, after applying this again:

sudo install -m 4755 -o root -D /var/packages/r8152/target/r8152/spk_su /opt/sbin/spk_su

dlbomber1974 commented 1 year ago

I tried the command once it failed, as the instructions called for, as well. It appears something changed in the new release. We need an updated script. Hopefully the dev sees our conversation.

Dayofwonder commented 1 year ago

As added in my post above, it works for me now. I installed the driver with an error, executed the sudo command and tried to install the driver again successfully. Connection is up now, I will test it later on. By the way: My DSM 7.2 is the final version, not a beta or RC version.

dlyubimov commented 1 year ago

On 7.2-64561, unfortunately, it is still broken the same way. I did manage to download a record 5+ GB before it crashed, but alas.

bb-qq commented 11 months ago

The Realtek driver on which this package is based has been updated to 2.17.1. It does not seem to contain any changes that might improve stability with the rtd1296, but you can try it if you like. https://github.com/bb-qq/r8152/releases/tag/2.17.1

andrus2049 commented 11 months ago

Synology DS218 DSM 7.2-64570 Update 1

r8152-rtd1296-2.17.1-1_7.2.spk

Asus USB-C2500 USB Type-A 2.5G Base-T Ethernet Adapter https://www.asus.com/networking-iot-servers/wired-networking/wired-adapters/usb-c2500/

Installation as suggested
NAS rebooted after installation
Fixed IP assigned to the USB dongle
1500 MTU

Internal 1 GbE Ethernet also connected, using a different static IP.

Tests (Samba)

UPLOAD (virtual machines)
1) 15 GB VM (one 15 GB file + small files)
2) 15 GB VM (one 15 GB file + small files)
3) 87 GB VM (one 55 GB file + one 27 GB file + small files)
All OK, upload speed around 160-170 MB/s.

DOWNLOAD
Tested downloading many files of various sizes. Downloads succeeded for files up to 22 MB, but larger files caused the network interface to lock up with no further file access, which could be resolved by stopping and restarting the driver in the Package Center.

NikitaOsotsky commented 8 months ago

I'm not sure if it could be the cause, but recently a new update of the SMB service was released.

Version: 4.15.13-0871 (2023-09-26) Fixed Issues

Fixed an issue where certain clients could cause continuous increase in SMB memory usage.

Perhaps this could have indirectly influenced it.

perseus177 commented 6 months ago

Hi, any update? I want to add 2.5 Gbit to my DS218play, but it seems this is still not working?

AndreasArvidsson commented 5 months ago

Does anyone know if this also affects the RTD1619B? If I can't get it working on my current DS220j, might I have better luck with a newer DS223j?