1000001101000 / Debian_on_Buffalo

Tools for Installing/Running Debian on Buffalo ARM based Linkstation/Terastation/Kurobox/Cloudstor devices.

Cannot install on LS421de, network disconnects #47

Closed Omagic76 closed 3 years ago

Omagic76 commented 4 years ago

Hi 1000001101000,

First of all, thanks for your work on enabling the use of Debian on Buffalo NAS.

I have had an LS421de for quite some time, and I want to upgrade its software. I followed your instructions to install Debian, but during the installation the network disconnects. I tried at least 10 times with the same result, and it happens on different screens of the installation process. I have the LS421de connected directly to the router (gigabit ethernet) with the network cable it came with. I tried changing the cable and the port it is connected to, with the same result. I also tried limiting the ethernet speed to 100M before starting the installation, but it still disconnects from the network. I tried the Stretch version, same result. With the stock firmware it works fine. I checked the issues in the repository and read every forum I could find, but can't find a solution. So my last resort is to contact you. Any help you can provide will be greatly appreciated. Thanks.

1000001101000 commented 4 years ago

Good to hear from you!

You're not alone. You're the 4th or 5th person to report this with an LS400 device. Unfortunately, despite having had dozens of these devices at various times I've never seen it or been able to replicate it personally.

Collectively we've spent a lot of time digging into what's going on but haven't accomplished much. We did confirm the two users with ls441de devices with the problem have a different hardware revision than mine but I have no idea what the underlying difference is. We've also spent some time looking at the buffalo kernel source and firmware to see if there is some sort of mitigation there, unfortunately the modern armada-370 code is very different than what they used back then which makes them difficult to compare.

For one user, downgrading the connection to 100-base using ethtool kept it from happening:

ethtool -s eth0 autoneg off speed 100 duplex full

I built ethtool into the installer so you should be able to set that from a shell session and see if that lets you run through the install. That's not really a solution but if it works it may be a hint towards what's going on.

I've had enough reports at this point that I'd be willing to spend some time working on a real fix, but I need access to the misbehaving hardware to do so. Would you be interested in trading ls421de devices to facilitate that?

1000001101000 commented 4 years ago

it looks like there has been a surprising amount of development of the network driver since the last time I looked. Nothing stands out to me as a fix for this particular problem but if you manage to get an install to finish it might be worth trying the latest kernel to see if it behaves better.

https://github.com/torvalds/linux/commits/master/drivers/net/ethernet/marvell/mvneta.c

Omagic76 commented 4 years ago

Thanks for your quick reply.

Well, this is embarrassing. After I created the issue, I saw that the installation files were updated, so I downloaded them and tried one more time to install Debian so I could get screenshots of the errors. The network got disconnected when the installation process was downloading packages, but I left the ssh session open for some time and it reconnected, and I managed to install Debian. I think that the network drivers were updated or something, as you mentioned. I was able to finish the installation and configure the arrays. Now I'm trying to configure and install all the services I need. Thanks a lot for your help and for your work. It's awesome!

1000001101000 commented 4 years ago

Hopefully that’s all it is, though it’s not normal for the network to drop during an install. There’s no driver update or network reconfiguration during the install process to trigger it (it all happens before the ssh-server starts).

Everyone else who reported the issue was able to get through an install within a few tries. You should know within a few days if it’s really stable on the network or not.

Omagic76 commented 4 years ago

Hi 1000001101000,

After two days of trying to make the ethernet adapter work correctly, I couldn't make it work. It just disconnects from the network and won't work again. So I gave up and installed the stock firmware. I read your suggestion about using a USB network adapter, and I'm considering doing that. I will continue checking your repository to see if you find the solution. Unfortunately, I live in Panama and I can't send the NAS to you so you can check what is going on. But if I can help you in any way to fix this, besides sending you my NAS, please let me know. I'm not very knowledgeable about Linux, but I can try configurations or drivers for the network adapter.

I'm closing this issue. Thanks for your help and your work on this.

1000001101000 commented 4 years ago

The two main things worth trying are setting the speed with ethtool and installing the latest kernel to see if any recent changes help things.

Omagic76 commented 4 years ago

Hi 1000001101000,

This issue is really bugging me, because I don't understand why the network connection is steady with the stock firmware. So I'm trying to determine the reason why the ethernet adapter disconnects from the network. I was able to activate ssh with the stock firmware and did some poking around. One thing that caught my attention is that the stock firmware shows two network adapters: eth0 and eth1. With the stock firmware, the one that is in use is eth1. The other one doesn't work. I was wondering if there is a way to change the name of the network adapter in Debian to eth1? I can't find where I can make this change (I'm not a Linux expert user, just a noob). I also disassembled my NAS to see the boards it uses. I think it could be a motherboard version issue, and I took pictures of it.

Thanks in advance for your help.

Omagic76 commented 4 years ago

Sorry to bother you with this, but in the device tree for the LS421D you have:

ethernet@74000 {
        status = "okay";
        phy = <&phy0>;
        phy-mode = "rgmii-id";
};

I googled the armada 370 device tree and found this (http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294437.html):

ethernet@70000 {
        status = "okay";
        phy = <&phy0>;
        phy-mode = "sgmii";
};

/* ethernet@74000 {
        status = "okay";
        phy = <&phy1>;
        phy-mode = "rgmii-id";
};

I want to add this to the device tree and compile the firmware. But I don't know how to do it. Is this possible?

Thanks again.

1000001101000 commented 4 years ago

good to hear you're still interested.

Let me know what revision it says on your PCB, I can compare to all of mine.

I don't remember if we tried a DTB with both possible adapters active or not but it's worth a try. I'll make one and put together some notes on how you can modify your own for testing when I get some time (after dinner).

1000001101000 commented 4 years ago

I spent a little time looking at this but just ended up with my network not starting. I'll look into it some more in the next few days.

Omagic76 commented 4 years ago

In the PCB I only see: J-3 M1-V0 94V-0 1309 E168114 YANAGI_MB V3.0

And the stickers with the MAC address and the serial number.

I managed to generate the image files, just have to execute generate_images_armhf.sh. But I can't figure out how to compile the dts file to a dtb. I tried with dtc but get error "missing files".

1000001101000 commented 4 years ago

Sounds like the same, here are pictures of mine: https://buffalonas.miraheze.org/wiki/Linkstation_LS421D

The script I use for building dtbs is in that same folder: build_dtb.sh.

I spent some more time playing with the device tree, it seems like trying to add the non-existent interface causes the network startup to fail, it may be some kind of gpio conflict. It does work with the second MDIO device defined, it might be worth seeing if that makes a difference somehow.

I've attached the device trees with that change. ls421de.zip

1000001101000 commented 4 years ago

here are the steps/commands to compile a dts into a dtb using the tools in my project. technically you could just use dtc but as you've seen that runs into issues when missing prereqs and doesn't give good error/warnings compared to compiling as part of a kernel source tree.

apt-get install build-essential libc6-armel-cross libc6-dev-armel-cross binutils-arm-linux-gnueabi libncurses5-dev gcc-arm-linux-gnueabi lsb-release libssl-dev git 
git clone https://github.com/1000001101000/Debian_on_Buffalo
cd Debian_on_Buffalo/Tools/kernel_tools
./update_kernel_source.sh 4.19
mkdir -p dts/4.19
cp ../../Buster/device_trees/armada-370-linkstation-ls421d.dts dts/4.19/
./build_dtb.sh 4.19
find dtb/

you can then edit the dts and re-run build_dtb.sh to generate a new dtb as needed.

to install the dtb on the device:

  1. copy the dtb to /etc/flash-kernel/dtbs/ on the device
  2. run flash-kernel to generate new *.buffalo files which include it.
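
The install steps above can be sketched as a small script. This is only a sketch under assumptions: `install_dtb` and the `RUN` dry-run switch are hypothetical names that are not part of the project, the DTB file name is guessed from the dts name used earlier, and by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: install a rebuilt DTB (run on the NAS itself, as root).
# install_dtb and RUN are hypothetical names. RUN defaults to "echo", so the
# commands are only printed; invoke with RUN= (empty) to really execute them.
RUN="${RUN-echo}"

install_dtb() {
    dtb="$1"
    # copy the DTB into the directory named in the steps above
    $RUN cp "$dtb" /etc/flash-kernel/dtbs/
    # regenerate the *.buffalo boot files so they include the new DTB
    $RUN flash-kernel
}

# File name assumed from armada-370-linkstation-ls421d.dts
install_dtb armada-370-linkstation-ls421d.dtb
```

Running it as `RUN= ./install_dtb.sh` would perform the copy and call flash-kernel for real; keeping the dry run as the default makes it harder to reflash by accident.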

Omagic76 commented 4 years ago

Thank you very much for the instructions. They were very helpful. Now I'm able to generate the dtb and the images. I tried the device tree you sent me, but it had the same result: the network adapter disconnects after a while. I tried a couple of changes in the device tree but they were not successful. Now that I know how to change the device tree and generate the images, I'm going to try to figure out why the stock firmware uses eth1 to connect to the network. I downloaded the source code for the stock firmware and I'll try to find where they define the network adapter. I'll start with that, and then try other things that I can think of. Also, I'm going to try to understand how to compile the Linux kernel, so I'm going to be busy for a couple of days. If I make any progress I'll let you know. Thanks again.

1000001101000 commented 4 years ago

Here are the datasheets for the SoC and ethernet chip and some relevant stuff from the linux documentation: https://web.archive.org/web/20150510050616/https://origin-www.marvell.com/embedded-processors/armada-300/assets/ARMADA370-datasheet.pdf

https://www.marvell.com/documents/eoxwrbluvwybgxvagkkf/

https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/pinctrl/marvell%2Carmada-370-pinctrl.txt

https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/armada-370-xp.dtsi

https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/armada-370.dtsi

I think the code provided by marvell for the old kernels must have presented two interfaces for all SoCs that support more than one; you see the same thing in the stock firmware.

I've also got a script I put together to automate patching and compiling kernel images for the Terastation III. You'd need to make a slight tweak to get it working for the ls400.

specifically, you'd want to change the last line to:

make -j$(nproc) ARCH=arm CROSS_COMPILE="arm-linux-gnueabihf-" bindeb-pkg

you'll also need to grab the kernel config from your device, /boot/config-4.19.0-9-armmp or similar.

Let me know if you run across something you want me to look at.

1000001101000 commented 4 years ago

I've been meaning to ask you to try the latest 5.7 kernel to see if the problem still happens with it. I tried it on mine and discovered a minor issue with it. For devices with NAND like the ls421de it loads the spi-nor and NAND devices in a different order than previous kernels which breaks the script that runs at boot to retrieve the MAC address from the spi-nor.

I'm working on a fix that I will share when it's ready. You can still try the newer kernel to see if it resolves the issue, it just might give you a random MAC address at boot. You could work around that by hard-coding the MAC address in your interfaces file if you wanted.
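
As an illustration of the hard-coding workaround, here is a minimal sketch that prints an ifupdown stanza with a fixed MAC address. It assumes the interface is `eth0` and uses DHCP; `print_iface_stanza` is a hypothetical helper and the MAC shown is a placeholder to be replaced with the device's real address.

```shell
#!/bin/sh
# Sketch: print an /etc/network/interfaces stanza that pins the MAC address.
# print_iface_stanza is a hypothetical helper; DHCP is an assumption and the
# MAC address used below is only a placeholder.
print_iface_stanza() {
    iface="$1"
    mac="$2"
    cat <<EOF
auto $iface
iface $iface inet dhcp
    hwaddress ether $mac
EOF
}

# Review the output before merging it into /etc/network/interfaces.
print_iface_stanza eth0 00:11:22:33:44:55
```

The script only prints the stanza rather than editing the file, so an existing `eth0` entry is not overwritten by mistake.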

Omagic76 commented 4 years ago

Hi,

I installed the image from your repository first (kernel 4.19), and had to change the ethernet speed to 100 Mbps to make the network connection stable. After I configured the installation, I proceeded to install kernel version 5.7. It booted, but as you mentioned, the MAC address is a random one, so I had to configure the MAC address in the interfaces (in the if-pre-up script). But if I set the ethernet speed to 1 Gbps, the ethernet adapter disconnects from the network after some time. If I set it to 100 Mbps, it looks like it stays stable. I'll test it for a couple of days at 100 Mbps. But I want it to be stable at 1 Gbps, so I'll keep looking for a solution. Thanks.

1000001101000 commented 4 years ago

At least that confirms the issue is the same for you as the other users. It also confirms that the latest kernel doesn’t seem to have a fix.

Neither is too surprising but it’s good to know.

1000001101000 commented 4 years ago

Someone I follow on Twitter is working on a somewhat similar problem with a different (non-marvell I think) system. He discovered it was using the incorrect clock, which was causing speed/reliability problems.

I haven’t done anything with clocks yet, but it seems logical to me that if the chip was operating with the wrong clock that could result in instability at higher speeds. I’m not sure what would make it work better on some devices than others though.

It might be worth trying to figure out how the clock is initialized for this device in the stock firmware and the current kernel.

1000001101000 commented 4 years ago

I really have no idea if this could be significant or not. It's really hard to compare the stock kernel to the modern kernel since they are implemented completely differently. One interesting thing I found was a slight difference in clock speeds.

stock (linux-3.3.4/arch/arm/mach-armada370/core.c):

MV_U32 mvTclk = 166666667;
MV_U32 mvSysclk = 200000000;

current (drivers/clk/mvebu/armada-370.c):

static const u32 a370_tclk_freqs[] __initconst = {
    166000000,
    200000000,
};
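
To put a number on that difference, a quick integer calculation can be used. This is only a worked-arithmetic sketch; `ppm_diff` is a hypothetical helper, and nothing here shows whether the difference matters for the network problem.

```shell
#!/bin/sh
# Sketch: relative difference between two clock rates in parts per million.
# ppm_diff is a hypothetical helper; integer arithmetic, result truncated.
ppm_diff() {
    ref="$1"
    other="$2"
    echo $(( ($ref - $other) * 1000000 / $ref ))
}

# stock mvTclk vs. the first a370_tclk_freqs entry
ppm_diff 166666667 166000000
```

For these two values the result is 4000 ppm, roughly 0.4%; the second pair (200000000 in both) is identical.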

1000001101000 commented 4 years ago

looks like someone on the openwrt forum has been researching this in parallel and found the problem, they even appear to have a patch already: https://forum.openwrt.org/t/mvneta-help-with-a-voltage-patch/64226

Omagic76 commented 4 years ago

That's great news, thanks. I would have never found the problem. I'll try to figure out how to set the voltage in the device tree; I think it's the best solution. Although using devmem to change the registers sounds easier, I don't know how to do that either.

1000001101000 commented 4 years ago

I'm gonna take a look. I downloaded a copy of devmem and it seems easy enough to use. I have a vague idea how to calculate the proper address and how to figure out what the proper value should be to set it but need to read a bunch of documentation to understand it properly.

I'm going to try tweaking their patch to print the addresses and the before/after values when it runs. I should then be able to use those values to test doing the same via devmem.

It still doesn't fully explain why my devices never see this issue. I assume stock always works because they set it right in their version of the driver, but it still doesn't explain why some never seem to have the issue.

1000001101000 commented 4 years ago

I compiled a kernel with the patch applied and uploaded it here: https://github.com/1000001101000/Debian_on_Buffalo/raw/master/Tools/kernel_tools/armhf/linux-image-4.19.98_4.19.98-2_armhf.deb

I confirmed that it boots normally on my device, could you give it a try and see if the issue goes away?

To install it you'll probably need to:

  1. remove any kernels newer than 4.19 (for simplicity)
  2. remove Kernel-Flavors: armmp from the ls421d entry of /usr/share/flash-kernel/db/buffalo_devices.db
  3. install the kernel via dpkg

This is just to confirm that this custom code resolves the issue. assuming it works I will move forward with trying to create a userspace process to accomplish the same thing. We can then re-install the default kernel on your device and test the userspace fix. Assuming that works I'll work on building that into the installer to prevent issues in the future.

1000001101000 commented 4 years ago

Let me know when you get a chance to do this, I've already added the logic to print out the address/values:

[   18.653752] phytest: Port1 Address E10B84E0 Value  AAA
[   18.658990] phytest: Port1 Address E10B84E0 Value  A8A

I should be ready to test a userspace solution fairly soon after you confirm how that custom kernel behaves.
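
The before/after values in that log can be compared bit by bit with plain shell arithmetic. A minimal sketch, where `changed_bits` is a hypothetical helper and not part of the patch:

```shell
#!/bin/sh
# Sketch: show which bits differ between two register values.
# changed_bits is a hypothetical helper, not part of the patch.
changed_bits() {
    printf '0x%X\n' $(( $1 ^ $2 ))
}

changed_bits 0xAAA 0xA8A   # prints 0x20: the two values differ only in bit 5
```

So the patch clears a single bit (bit 5) of that register, presumably the one that selects the voltage, though the log alone does not confirm what the bit controls.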

Omagic76 commented 4 years ago

I'm going to try it now. But I screwed up the last installation, so I have to install again and then apply the new kernel. If the network adapter starts at 1 Gbps, I'll let the NAS work for an hour or so to check that it doesn't disconnect from the network. I'll let you know the results. Thank you very much. I think this should work.

1000001101000 commented 4 years ago

I played around with devmem some more, it segfaults when trying to access most (but not all) addresses. My initial testing made it look like the address we're interested in would work but the address the patch returned isn't quite what I expected and also causes a segfault.

I'll need to research further what the options are for userspace solution (I also posted a question on the openwrt thread).

I'd prefer to avoid having to maintain a custom kernel for these devices (although I already do for the Terastation III devices).

1000001101000 commented 4 years ago

I found what seems like it should be a binding to handle this:

power-source:
    $ref: /schemas/types.yaml#/definitions/uint32
    description: select between different power supplies

https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/pinctrl/pincfg-node.yaml

Which is in use for at least one device:

sdhi0_pins_uhs: sd0_uhs {
        groups = "sdhi0_data4", "sdhi0_ctrl";
        function = "sdhi0";
        power-source = <1800>;
    };

https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/r8a7793-gose.dts

I created a DTB that attempts to set this value for the corresponding pins for the ls421de and was able to boot from it without any errors being thrown. I doubt it's actually doing anything but don't have a way to test without one of these devices. I think a theoretical ideal solution would look something like this even if it requires adding a binding for it.

1000001101000 commented 4 years ago

I tried setting mine to 3.3v to see if I could induce the problem that way (I'm still guessing not).

Omagic76 commented 4 years ago

The patch didn't work. I installed Buster and then installed the custom kernel:

username@NAS:~$ uname -a
Linux NAS 4.19.98 #2 SMP Thu Jun 4 09:25:12 CDT 2020 armv7l GNU/Linux

I checked the dmesg to see if the patch was applied:

[ 19.738754] mvneta d0074000.ethernet eth0: PHY [d0072004.mdio-mii:00] driver [Marvell 88E1510]
[ 19.757743] mvneta d0074000.ethernet eth0: VDDO voltage asserted: 1v8
[ 19.774072] mvneta d0074000.ethernet eth0: configuring for phy/rgmii-id link mode
[ 19.790319] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 24.994838] mvneta d0074000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 25.002945] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

And it looked like it worked, but after a couple of minutes, it disconnected from the network. I rebooted the NAS several times, with the same results. I hope the guy that posted in OpenWRT can give more details on how he fixed it.

1000001101000 commented 4 years ago

I've also now found this: http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/336997.html

It looks like they manually set the memory value in uboot and then also set some config registers on the ethernet chip. In theory the kernel patch should have taken care of the memory value, we can set the phy registers easily enough (we can even build it into the dtb if that helps).

See if running this command helps (it needs to be run as one line):

phytool write eth0/0/22 2;phytool write eth0/0/24 5747; phytool write eth0/0/25 77; phytool write eth0/0/22 0

Omagic76 commented 4 years ago

Ok. I ran the command, so now I'll wait and see if it works. I'll let you know the result.

Omagic76 commented 4 years ago

No luck. The ethernet adapter disconnects from the network. I got some errors in dmesg (I tried lowering the speed to 100 Mbps):

[ 19.650713] mvneta d0074000.ethernet eth0: PHY [d0072004.mdio-mii:00] driver [Marvell 88E1510]
[ 19.669767] mvneta d0074000.ethernet eth0: VDDO voltage asserted: 1v8
[ 19.686032] mvneta d0074000.ethernet eth0: configuring for phy/rgmii-id link mode
[ 19.702341] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 24.898838] mvneta d0074000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 24.906947] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 61.764981] mvneta d0074000.ethernet eth0: Link is Down
[ 73.059603] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=70
[ 172.964920] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=535
[ 173.113654] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=1436
[ 173.125120] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=1014
[ 173.299535] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=94
[ 173.463058] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=736
[ 173.920041] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=112
... This error repeats multiple times ...
[ 496.544238] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=433
[ 496.568538] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=971
[ 496.722520] mvneta d0074000.ethernet eth0: bad rx status 0f810000 (crc error), size=1280
[ 496.957451] mvneta d0074000.ethernet eth0: Link is Down
[ 501.049992] mvneta d0074000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

The last line is that I tried to set the speed back to 1Gbps. But after that the ethernet adapter disconnected.
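
For anyone comparing runs, a log like the one above can be boiled down to a few counters. This is a minimal sketch that reads kernel log text on stdin; `summarize_mvneta` is a hypothetical helper and the patterns are taken from the messages quoted in this comment.

```shell
#!/bin/sh
# Sketch: summarise mvneta link trouble from kernel log text on stdin.
# summarize_mvneta is a hypothetical helper; the patterns match the
# "crc error", "Link is Down" and "Link is Up" messages quoted above.
summarize_mvneta() {
    awk '
        /mvneta/ && /crc error/    { crc++ }
        /mvneta/ && /Link is Down/ { down++ }
        /mvneta/ && /Link is Up/   { up++ }
        END { printf "crc_errors=%d link_down=%d link_up=%d\n", crc, down, up }
    '
}

# On the device:  dmesg | summarize_mvneta
```

Comparing the counters at 100 Mbps and at 1 Gbps over the same period gives a rough measure of how much worse the link behaves at the higher speed.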

1000001101000 commented 4 years ago

That is disappointing.

1000001101000 commented 4 years ago

could you run: strings /dev/mtdblock0 | grep -i vers

I'm wondering if we have the same version of the boot loader: U-Boot 2011.12 (Jan 15 2015 - 21:22:47) Marvell version: v2011.12 2014_T2.0p1

Omagic76 commented 4 years ago

It's the same:

U-Boot 2011.12 (Jan 15 2015 - 21:22:47) Marvell version: v2011.12 2014_T2.0p1

1000001101000 commented 4 years ago

That patch was created for kernel 5.4 (presumably specifically some openwrt variant). I assume it worked for the person who posted that. We could try building a 5.4 kernel with it, we can probably even track down the openwrt config/source and try with that. I don't see why that would make a difference but it may be worth a shot.

I don't fully understand why my patch output E10B84E0 as the address being modified when that other thread listed 0xd00184e0 (which makes more sense based on device tree values) as the address. The final value reported by the patch was A8A which matched the other thread, so it seems correct. I assume there is some sort of offset that the driver is accounting for rather than an actual issue.
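
One way to sanity-check that the two addresses refer to the same register is to compare their low bits. The sketch below does only that; `low16` is a hypothetical helper, and the idea that the patch prints a kernel virtual (remapped) address while the other thread quotes the physical one is an assumption, not something confirmed here.

```shell
#!/bin/sh
# Sketch: compare the low 16 bits of two addresses.
# low16 is a hypothetical helper. Assumption: a remapped (virtual) address
# keeps the low bits of the physical address it maps.
low16() {
    printf '0x%04X\n' $(( $1 & 0xFFFF ))
}

low16 0xE10B84E0   # address printed by the patched kernel
low16 0xD00184E0   # address quoted in the OpenWrt thread
```

Both calls print 0x84E0, which is at least consistent with the two numbers being different views of the same register.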

It'll be a few days before I have time to look at it again. Trying to build a 5.4 kernel might be a good opportunity for you to try out cross-compiling a kernel if you haven't done so already.

Omagic76 commented 4 years ago

Thanks. I'll try to add the patch and compile the kernel 5.4 to see if it makes a difference. I still have to learn a lot about Linux, so it might take me a while, but it will help me understand more.

1000001101000 commented 4 years ago

I uploaded the patch, config and build scripts to the same directory where I uploaded the kernel package earlier.

metrorik commented 4 years ago

Hello everyone. I am also the lucky owner of the ls420 with a network disconnect problem. I am not very versed in linux, but I can join and help as much as I can.

1000001101000 commented 4 years ago

Good to hear from you!

We found a thread on the openwrt forum that seemed to describe the problem and included a kernel patch. I was able to test the patch on my working device and confirmed that it seems to set the memory value as described, but the resulting kernel didn't seem to fix the issue for @Omagic76. I have another idea for a potential fix that I haven't tested yet, but not having a malfunctioning device has limited what testing I can do myself.

Would you be interested in sending me your device for testing (to MN, USA)?

If not, we could try to get your device working by limiting the network speed to 10/100 (which works for most people) and have you test the patched kernel and other future fixes.

metrorik commented 4 years ago

Hello my American friend! First of all, I apologize for my bad English.

  1. Yes, I saw this openwrt thread. I even soldered the wire to the chip according to the instructions. When pin 4 of the chip is grounded, a reset occurs, and for some time the network works again.
  2. I could send you my nas for your experiments, but do you think it is appropriate? I live in Ukraine, and this is very far.
  3. I have a model LS420D, whose network speed is only 10/100, so there is nothing to lower. As far as I remember, the problem started when I flashed firmware that was not for my model; maybe an unsuitable bootloader was also written at that point. I am not good at system programming and I don’t know whether that can affect the setting of the required voltage when the system boots, but it looks a lot like that. I don’t know how to flash the original bootloader. After that, the network worked only in TFTP mode with the address 192.168.11.150, and no firmware helped to restore the network. Then I connected a UART and through the console I was able to install your Debian firmware. Now the network works, but after a while it disappears and comes back only after a system reboot or after a chip reset.

1000001101000 commented 4 years ago

Sounds like your soldering skill is better than mine. That might be useful for testing other methods!

I assume shipping from Ukraine would be expensive, it may be worth looking into at some point. I really want one of the "bad" ones for testing and would be willing to trade for a "good" one if someone else paid shipping both ways. Obviously this would be much easier for someone near to me.

If you have UART access you can test some additional things that I never have. Can you try interrupting uboot at boot time?

It should be possible to change the voltage within uboot. I believe that would look similar to:

mw.l 0xd00184e0 0xa8a
run bootcmd

metrorik commented 4 years ago

I think I can even measure the current voltage on the chip

metrorik commented 4 years ago

I do not know how to stop the boot process. When the system starts up, you can see the boot process on the console and copy it, but the system does not respond to commands until it has booted and shows the login prompt.

1000001101000 commented 4 years ago

There is typically a brief time it says something like "press any key to stop". If you have RX and TX connected you should be able to press a key at that point. I usually just tap enter repeatedly when powering on the device since it goes by very fast.

metrorik commented 4 years ago

Yes, it says to press a key to cancel autoboot, but no matter how many times I tried, the boot still started.

1000001101000 commented 4 years ago

can you post the output?

1000001101000 commented 4 years ago

@metrorik @Omagic76

I went ahead and built the latest backports kernel with the patch and posted it here: https://github.com/1000001101000/Debian_on_Buffalo/blob/master/Tools/kernel_tools/armhf/linux-image-5.6.14_5.6.14-2_armhf.deb

There have been a lot of recent changes to the mvneta driver, that might have something to do with why the patch didn't work for 4.19.x (the patch was developed against 5.4). Could you both try installing this kernel and see if it helps with the network problem?

It will cause the mac address to change due to a separate issue I haven't fixed yet.

metrorik commented 4 years ago

OK. It is already night here :). I will do it tomorrow.