smsimeonov opened this issue 1 year ago
howdy:
I have a link to the internet archive of nas-central on the main page of the wiki on miraheze.
I have made sure the XOR acceleration features of the SoCs are enabled wherever possible, but to be honest I've never looked into how to actually use them for RAID5. There are probably a number of optimizations possible for some of these devices.
For the LS-QL I recall I had to patch the kernel to allow enabling one of the USB ports; I seem to remember one or both were also unreliable. I also think I found it to have poor performance even for that generation. There is probably some kernel work that still needs to be done for that model.
Thanks for your input!
Since I am not familiar with XOR acceleration, I will look up some details on it to see if I can get better write rates at least.
And for the LS-QL, I ended up setting up an overnight SFTP file backup to one of my public-facing servers to have a centralized backup. I suspect it is a kernel issue, or maybe the microcontroller itself. I noticed a fluctuation in the voltage output that I haven't seen on other systems I have; that may or may not be the reason. I tested because I have plenty of SBCs, and often when I get that error (connerr 22) it is because there is not enough power to the USB device (connecting something with a high power draw to a USB 2.0 port, an insufficient power supply to the SBC, or, in one case, needing a kernel patch).
I've noticed on a similar-platform NAS device I have (the LG N2B1) that it loads an mv_xor driver for Marvell Orion devices. Apparently this helps a lot. I can see in the active config of the LS-QL I still have at home that this is enabled in /boot/config. However, unlike the stock NAS box, I cannot see messages from it in dmesg. Is there a way to see whether mv_xor is actually loaded? I do not see logs of it anywhere.
For clarity: What OS are you running on your LS-QL? If you post your DMESG output I can take a quick look.
I would expect it to load at boot since it's enabled by default for all orion5x devices:
compatible = "marvell,orion-xor";
reg = <0x60900 0x100
0x60b00 0x100>;
status = "okay";
xor00 {
interrupts = <30>;
dmacap,memcpy;
dmacap,xor;
};
xor01 {
interrupts = <31>;
dmacap,memcpy;
dmacap,xor;
dmacap,memset;
};
};
from: https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/marvell/orion5x.dtsi
I don't have an orion5x device online at the moment but I would expect something similar to what I see on the TS-XEL:
[ 1.238644] kirkwood-pinctrl f1010000.pin-controller: registered pinctrl driver
[ 1.241439] mv_xor f1060800.xor: Marvell shared XOR driver
[ 1.277554] mv_xor f1060800.xor: Marvell XOR (Registers Mode): ( xor cpy intr )
[ 1.281653] mv_xor f1060900.xor: Marvell shared XOR driver
[ 1.317595] mv_xor f1060900.xor: Marvell XOR (Registers Mode): ( xor cpy intr )
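Beyond the dmesg banner, a couple of quick checks can confirm whether the driver actually bound. A rough sketch (the f1060900.xor-style device names are just examples and will differ between SoCs):

dmesg | grep -i xor                     # driver registration messages, if any
ls /sys/bus/platform/drivers/mv_xor/    # devices bound to the driver, e.g. f1060900.xor
grep -i xor /proc/interrupts            # each XOR channel should claim an interrupt
ls /sys/class/dma/                      # DMA engine channels (dma0chan0, ...) it exposes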
I am running your Debian build for Bullseye, since the Bookworm install didn't work for me.
Log from LS-QL: LSQL_DMESG_updated.txt
I am also attaching a comparative DMESG output from the stock box: DMESG_stock.txt
I am expecting to see "mv_xor" lines in the dmesg on the LS-QL as I do on the stock box, but I just do not.
I plugged in my LS-CHL running Bookworm, which shows it as expected:
[ 6.789355] mv_xor f1060900.dma-controller: Marvell shared XOR driver
[ 6.856341] mv_xor f1060900.dma-controller: Marvell XOR (Registers Mode): ( xor cpy intr )
I started noticing more and more minor differences from what I would expect. I finally noticed this line in your dmesg:
[ 0.000000] Machine: Buffalo Terastation Pro II/Live
Are you booting a disk you originally set up on your TS-HTGL or something like that?
Sorry, I forgot I had the TS-HTGL plugged in right now. I am absolutely sure it was the same case with the LS-QL, but I can check later.
Regardless, shouldn't this appear on the TS-HTGL, given that it's an Orion platform too?
The TS-HTGL, or ts2pro, uses a different Orion5x SoC and is also defined a bit differently, in a way that pre-dates device tree.
I remember that when I added support for the XOR and/or CESA engines for the TS-XL, I noticed the kernel code for the ts2pro lacked some entries. At the time I looked at the SoC docs and determined that that SoC didn't support them. I think I even discussed it with whoever had an issue open at the time.
Looking at the kernel code I don’t see entries for either: https://github.com/torvalds/linux/blob/master/arch/arm/mach-orion5x/terastation_pro2-setup.c
So I checked the two LS-QL dmesg logs and both of them have mv_xor. Very strange, as I am getting even worse performance on them than on the TS-HTGL. I guess I will look into it further.
Attaching the dmesg output. It even has the USB errors I mentioned.
I'm vaguely aware that the SoC in the Terastation was one of the high-performance versions compared to the more budget one in the LS-QL, but I don't really know any details about all that. The 4x SATA ports via a PCIe SATA controller on the Terastation, versus the port multiplier connecting the four bays to a single SoC SATA port on the LS-QL, will have an impact on throughput etc., but again, I'm not really versed on the specifics.
Still, best I can tell the LS-QL is slower than the other Orion5x Linkstations, possibly beyond what would be expected by the hardware alone, and I don't have a good theory as to why.
You've been very helpful, thanks! One last thing if you can: I cloned the repo. If I were to build my own kernel, I would just substitute the config with one of the mainline configs, as the note in your folder structure mentions, correct?
You could probably adjust things to match what makes more sense to you.
To build a kernel with my scripts:
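(The repo-specific steps are not reproduced here. As a rough, generic sketch of what a mainline cross-compile for these Orion5x boxes tends to look like — the toolchain package, config choice, and load address below are assumptions, not this project's scripts:)

sudo apt install gcc-arm-linux-gnueabi u-boot-tools                  # cross toolchain and mkimage
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- multi_v5_defconfig    # or another suitable mainline config
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j4 zImage dtbs modules
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- LOADADDR=0x00008000 uImage   # Buffalo's U-Boot expects a uImage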
Thanks for the project, first of all. I'm in the process of reviving four old Buffalo NAS units I have and handing them off to my relatives. These are two LS-QLs, one TS-HTGL v2, and an HS-DHTGL.
Initially I tried just pushing a merge request, but it's been a while since I've collaborated on a project, lol (I have my own private GitLab).
In relation to transferring uImage and initrd, I wanted to add the following after the acp_commander transfer lines:
NOTE: If the above commands complete too quickly, make sure the transfer actually happened. If you do not have permission to write to /boot, if the files "initrd.buffalo" and "uImage.buffalo" already exist, or if there simply is no /boot, ACP Commander will fail silently without any message. You can enter a "shell" of sorts on the stock firmware by using the -s flag; that way you can check whether the transfer actually happened.
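Something along these lines (the IP address is a placeholder, and the exact jar name and flags may vary between acp_commander builds):

java -jar acp_commander.jar -t 192.168.1.50 -s      # open the pseudo-shell on the stock firmware
# then, inside that shell, confirm the files actually landed and are non-empty:
ls -l /boot/uImage.buffalo /boot/initrd.buffalo
md5sum /boot/uImage.buffalo /boot/initrd.buffalo    # compare against the local copies, if md5sum is available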
Aside from this, I found that all the devices I have (all armel) can be forced to boot into EM mode just by removing all drives and starting without them. The only difference compared to EM boot via acp_commander or the button combination is that the hard drives take longer to spin up initially (around 15 seconds). My LS-QL had no other options for EM boot aside from that and a serial connection; considering I would have to resolder some things on the mainboard for the latter to work, I skipped it, of course. I might try to add this to the Miraheze wiki too.
It is probably worth mentioning in the wiki another good source of info: the old nas-central.org site, which can still be accessed via archive.org (https://web.archive.org/web/20190302093121/http://buffalo.nas-central.org/wiki/Main_Page). It obviously doesn't appear in searches anymore, so it would be good to note, imo.
Lastly, I have some notes on trying to optimize file transfer speeds. Since all four devices I have are quad-drive models, I set up software RAID (mdadm) as RAID5 on all of them. This has a huge impact on performance, but I mostly need the capacity, as long as the read rate stays above 7-8 MB/s. This works with all the transfer methods I tried (Samba, NFS, FTP, FUSE); they all get very similar read speeds (around 14-20 MB/s).

The issue comes with write speeds. All of the new homes for these NAS units have their media archives on a PC, so part of the initial setup is transferring them over to the NAS. Since we're talking about terabytes worth of data, the faster it is, the better. Depending on the device, write speeds were between 4 and 7 MB/s, but this is over SMB. NFS and FUSE returned the same speeds, while FTP writes were better, generally between 11 and 14 MB/s. The biggest bottleneck appears to be the CPU: with SMB/NFS/FUSE, around 50% of the CPU went to the respective transfer process and the other 50% to the md RAID5 thread; with FTP, the split was around 85% to 15%. I have tested the arrays themselves with fio in all applicable scenarios, and the RAID itself is capable of at least four times the bandwidth the file transfers are giving it.
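For reference, a rough sketch of the kind of checks and tuning I've been experimenting with (the md device, mount point, and values are examples only, not recommendations):

# sequential write test straight against the array, bypassing the page cache
fio --name=seqwrite --filename=/mnt/raid/fio-test --rw=write --bs=1M --size=1G --direct=1 --numjobs=1
# raise the RAID5 stripe cache; costs roughly stripe_cache_size * 4 KiB * number_of_disks of RAM
echo 1024 > /sys/block/md0/md/stripe_cache_size
# larger read-ahead on the array can help sequential reads
blockdev --setra 4096 /dev/md0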
One issue, possibly with the kernel, is that on the LS-QLs the USB ports do not seem to work reliably. They randomly error out with connerr 22 (seen via journalctl), even when not in use.
I will add any other findings as I go through further with the process.