Open alexarda opened 1 year ago
Hi,
Nice. I didn't expect anyone to actually find this useful.
SATA I haven't tried. I'm using an NVMe SSD but in a USB3 adapter thing. If I find some time over the weekend I'll try hooking up a SATA drive and see what happens.
Please disregard this. It was a rookie error on my part. I attached another drive and it worked perfectly.
It's stupid because the other drive that didn't work is known good, and I switched out both power and data cable with no result.
But then I tried another drive and it was detected.
Apologies for wasting your time and thanks for documenting everything!
Nice. I wonder what the difference is? My long term plan was to work out how to get an NVMe drive to work properly in the PCIe slot.
Maybe a vendor thing? The non-working drive is an Intel - quite an old one too.
Interesting, an Optane NVMe drive also has issues. The functional vendors for me are Adata for SATA and Samsung for NVMe.
kioxia nvme didn't work for me. It looks like it's working initially and then stops responding. So the machine boots but eventually goes crazy because it can't read/write data anymore.
I'll try to find another nvme and give it a go. I assumed that the pcie was just garbage. :)
If you're interested I managed to get CentOS Stream reasonably stable with an AMD GPU, which is nice.
Hi there, thanks for all of this, it's been pretty useful.
I purchased a Gigabyte R270-T60 on eBay hoping to use it as a NAS. Your tips helped - I'd add that arm-smmu.disable_bypass=n is needed with new kernels.
Have you been able to get normal hard drives working? I can't seem to get some Seagate Exos drives working, at least with the backplane it came with.
@no2chem You must've gotten that R270-T60 from the same seller as I did (erik something something). I gave up on SATA. Every one of my drives gave me some variation of "ata1: SATA link down" or "link online but 1 devices misclassified", so I put in an LSI HBA. Fortunately, the backplane even does SAS! That said, NVMe has worked as long as I do not set acpi=force like I've seen in a few other guides.
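For anyone comparing drives, a minimal sketch of how the link messages quoted above could be pulled out of a saved kernel log. The regular expression and the sample log lines (including the timestamps) are assumptions for illustration; the exact message wording may differ between kernel versions.

```python
import re

# Matches libata link messages such as "ata1: SATA link down (...)".
# Assumption: the kernel prints "ataN: SATA link up|down"; adjust if yours differs.
LINK_RE = re.compile(r"\b(ata\d+): SATA link (up|down)\b")


def summarize_ata_links(log_text):
    """Return {port: last seen link state} for every ataN port in the log text."""
    states = {}
    for line in log_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            port, state = match.groups()
            states[port] = state  # later lines overwrite earlier ones
    return states


if __name__ == "__main__":
    # Illustrative sample only, not captured from a real R270-T60.
    sample = "\n".join([
        "[    3.1] ata1: SATA link down (SStatus 0 SControl 300)",
        "[    3.4] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    ])
    print(summarize_ata_links(sample))
```

Feeding it the output of dmesg saved to a file would show at a glance which ports never came up with a given drive.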
@ayakael Actually, I figured out the SATA - the issue appears to be that the SATA ports don't support spread spectrum. WD drives work fine, but Seagate drives require disabling spread spectrum via seatools - you need to plug it into another device that supports spread spectrum and disable it. Works reliably after disabling spread spectrum.
@no2chem Many thanks, I'll try that out! By the way, when you do lspci, do you see any Crypto acceleration device on your server?
I don't, is there a specific accelerator you're looking for? I see
lspci | grep accelerator
0000:00:09.0 Processing accelerators: Cavium, Inc. THUNDERX Random Number Generator (rev 09)
0000:00:09.1 Processing accelerators: Cavium, Inc. THUNDERX Random Number Generator virtual function (rev 09)
0000:03:00.0 Processing accelerators: Cavium, Inc. THUNDERX Zip Coprocessor (rev 09)
0000:04:00.0 Processing accelerators: Cavium, Inc. THUNDERX DFA (rev 09)
000a:00:09.0 Processing accelerators: Cavium, Inc. THUNDERX Random Number Generator (rev 09)
000a:00:09.1 Processing accelerators: Cavium, Inc. THUNDERX Random Number Generator virtual function (rev 09)
000a:03:00.0 Processing accelerators: Cavium, Inc. THUNDERX Zip Coprocessor (rev 09)
000a:04:00.0 Processing accelerators: Cavium, Inc. THUNDERX DFA (rev 09)
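The listing above shows the same four devices under two PCI domains (0000 and 000a). A small sketch, assuming lspci output in the domain-prefixed format shown, that groups devices by domain so the two sets are easier to compare. The function name is hypothetical.

```python
from collections import defaultdict


def group_by_domain(lspci_text):
    """Map PCI domain -> list of (address, description) for domain-prefixed lspci lines."""
    groups = defaultdict(list)
    for line in lspci_text.splitlines():
        address, _, rest = line.strip().partition(" ")
        # A full address looks like "0000:00:09.0" (domain:bus:device.function).
        if address.count(":") != 2:
            continue  # skip the command line itself and anything unrecognized
        domain = address.split(":")[0]
        # Drop the device class ("Processing accelerators") and keep the rest.
        _, _, description = rest.partition(": ")
        groups[domain].append((address, description))
    return dict(groups)


if __name__ == "__main__":
    sample = "\n".join([
        "0000:03:00.0 Processing accelerators: Cavium, Inc. THUNDERX Zip Coprocessor (rev 09)",
        "000a:03:00.0 Processing accelerators: Cavium, Inc. THUNDERX Zip Coprocessor (rev 09)",
    ])
    for domain, devices in group_by_domain(sample).items():
        print(domain, devices)
```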
Right, some ThunderX systems have crypto accelerators, which isn't the case for mine. I'm now trying to get an Intel QAT accelerator going, but I'm hitting MSI-X related errors. Apparently MSI-X is broken on my system, and I'm trying to debug what's wrong. The more I use this server, the more I realize that a bunch of stuff is just plain broken.
Hi fifteenhex
Thanks for this, I managed to boot one of these boards with an ATX PSU by following this repo.
Did you happen to check whether your SATA ports worked directly from the board? I can't get them to work on mine, and I don't have the backplane to test if it's a direct connection issue.
Hey @alexarda, could you explain how you got this to boot from an ATX PSU? I just picked up one of these devices and was surprised it did not come with the standard connectors.
@keixthb yes, absolutely. So the pinout can be found in another issue thread here.
I bought one of these adapters from ModDIY and repinned it.
I did just use the 5vsb from the ATX PSU rather than try and get 12vsb but it seems to work...
@alexarda ok, I just ordered one, thanks so much. I'll see if I can get it to work and post an update in the next few weeks or so.
@alexarda How did you rewire the power cable, and did you use 2 of them on the machine? I wired mine following the 18 pin diagram and it didn't boot, but I only purchased one... I must be doing something wrong. This is where I'm at with the project:
...also, I designed a fan bracket for a 120mm if anyone wants a copy of the .stl.
Thanks!
@alexarda I tried connecting pin 10 (on the 18pin) to the pin 9 for the +5vsb, and the lights come on, but I don't get anything on the vga out.
Hi @keixthb, sorry I’ve been a bit lax with this. This went into storage after I bought an Ampere system. I can dig this out of storage for you this weekend. From memory serial is the best option to start tinkering. VGA works but much later in the boot process and you might catch issues on serial you wouldn’t otherwise see
@alexarda No worries. I would appreciate that, thank you! Ideally, my goal is to install RHEL 9 on the system so I can test some of the Nvidia cards (centos is good too though). I know the tesla line is sensitive with firmware so I'd like to install the driver for a bunch and see which ones work--and which ones don't.
@keixthb this is my version
It comes from 1 x 24 pin ATX and 1 x 8 pin EPS.
Pinout is all 12v and GND as in the other thread EXCEPT I use 5vsb from the ATX supply on pin 10.
Bit difficult to show broken out, but here you go:
That single heat-shrunk cable is a dumb error. It's meant to be a jumper from PS_On to GND to power on the ATX supply. I thought I could get the board's power button to work, so I depinned it and in so doing broke the pin.
I never got further, but when I need to power the board it goes to pin 11 of the front panel header.
@alexarda I got it to turn on with the cable, I'll work on the serial next. How did you get the CentOS 9 kernel installed? Did you use an aarch64 DVD ISO and boot from a USB stick?
From memory, yes. This is from my notes at the time:
CentOS: Tested =
acpi=force --> install won't display correctly, fail
acpi=force pci=noaer pcie_aspm=off --> install won't display correctly, fail
acpi=force pci=noaer pcie_aspm=off modprobe.blacklist=ast --> failed at startx after install
acpi=force pci=noaer pcie_aspm=off console=ttyS0 --> install success!
After install changed to = acpi=force pci=noaer pcie_aspm=off modprobe.blacklist=ast amdgpu.dpm=0 --> Working!
Then: systemctl set-default graphical.target
Poweroff, insert AMDGPU
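To double-check that a booted system actually picked up the parameters from the notes above, here is a minimal sketch that compares a kernel command line against the two working sets. The constant and function names are hypothetical; on the machine itself the string would come from reading /proc/cmdline, while the example below uses a made-up sample string.

```python
# Parameter sets taken from the notes above.
INSTALL_ARGS = ["acpi=force", "pci=noaer", "pcie_aspm=off", "console=ttyS0"]
POST_INSTALL_ARGS = [
    "acpi=force",
    "pci=noaer",
    "pcie_aspm=off",
    "modprobe.blacklist=ast",
    "amdgpu.dpm=0",
]


def missing_params(cmdline, required):
    """Return the required parameters that do not appear in the command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


if __name__ == "__main__":
    # Illustrative command line; the root= value is invented for the example.
    sample = "root=/dev/sda2 ro acpi=force pci=noaer pcie_aspm=off"
    print("missing for install:", missing_params(sample, INSTALL_ARGS))
    print("missing post-install:", missing_params(sample, POST_INSTALL_ARGS))
```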
@alexarda Thank you! Okay, so you installed the Linux kernel first, then the GPU. I ordered a serial cable for the machine that should be here today. I'll try it with acpi=force, pci=noaer, pcie_aspm=off, modprobe.blacklist=ast and see what happens.
So, using sudo minicom -D /dev/ttyUSB0, I configure with:
Which yields the following when the gigabyte server turns on:
I downloaded the ARM64 (aarch64) version of CentOS Stream 9 here, and flashed a USB drive using balena etcher:
I insert the usb stick into the machine as shown:
And the result is the same on the serial:
This might be a hardware/firmware issue. Possibly ram at that point. It should boot and drop you to an EFI shell.
I don’t think I captured a boot log. But I might get a chance to do so later this week if it would help
@alexarda that would be great, thank you. I will work on it again and see if I can figure out what's going on at my end. I think I have some RAM sticks I can pull from another working machine to test with.
@alexarda Can you send a picture of the memory you are using with your machine? I tried two separate sets and I get the same memory controller error.
@keixthb boot log as mentioned: MT70-HD0_BootLog.txt
Note the very different BDK versions. On your screenshot, it looks like it stalls at the point in my log where we can see BMC IP: N/A, which is immediately followed by the ram configuration/testing/training info.
My suspicion is still a ram issue, but I wouldn't rule out some weird BMC thing too. These motherboards are a total mess...
The part numbers of my ram modules are all M393A1G43DB0-CPB - that's in the log too but here also for posterity.
@alexarda Thanks, I just ordered the memory. I'll update once it arrives.
@alexarda good news, the efi works! With the memory installed in the furthest blue slots away from the CPU sockets, I am able to see the menu:
Using a prebuilt rhel9 image for a raspberry pi 5 I am greeted with grub:
I change the parameters with the following:
But it appears to be hanging on "booting a command list" for both rescue and the top kernel 5.14
For reference, this is the content of my /dev/ttyUSB0 and /dev/ttyS0:
@alexarda can you explain how you constructed your boot image? I can't quite figure out how to construct one myself. The only way I can launch grub is using the Pi image, and I think something must be different enough between the two platforms that it's preventing the kernel from launching properly.
@keixthb did you manage to update the firmware? Later versions are still buggy but possibly less buggy.
I'd start there. The build date in your screenshot is 3 years prior to the latest firmware build available here