1a2m3 / SPD-Reader-Writer

SPD Reader & Writer with Software Write Protection capabilities supporting Arduino and SMBus
https://forums.evga.com/m3053544.aspx
GNU General Public License v3.0
82 stars, 13 forks

After connecting, am unable to actually perform any operations... #41

Open JakeMartin99 opened 9 months ago

JakeMartin99 commented 9 months ago

Hi, I recently followed your schematic to build the hardware to run your SPD-Reader-Writer, uploaded your firmware to the Arduino Nano, and downloaded the latest software release to my computer. However, from both the app and the CLI, the system can find and connect on COM3, but then cannot actually read any data. image image Also of note, while the hardware is hooked up to my computer, the RGB on my stick of RAM lights up, so at the very least power is getting all the way to the RAM.

For context, my ram is DDR5-6000 from GSKILL (F5-6000J3038F16GX2-TZ5NR), and I am attempting to access the SPD in order to fix corruption that I believe was caused by RGB software causing issues over the SMBus.

Any help or suggestions would be much appreciated!

Thanks!

JakeMartin99 commented 9 months ago

Also, I just measured voltages, and for some reason my +5V rail is 9V above ground, and my +3.3V rail is at ~6V. My +9V rail is at 9V as expected.

I had measured my resistances prior to turning it on for the first time and everything seemed to check out, so this is surprising to me.

1a2m3 commented 9 months ago

The Arduino does not detect any RAM connected to it. Make sure the RAM is getting power and SDA/SCL lines are not mixed up. Make sure the voltage levels are correct before connecting RAM to your Arduino. For DDR5, you only need a 5V supply. (9V rail is for DDR4-DDR3).

JakeMartin99 commented 9 months ago

The SDA/SCL lines may have been messed up, but I've been tinkering without the RAM stick installed since, and I have not been able to resolve the voltage discrepancies, so I haven't wanted to retry with the stick. Having removed the RAM stick, I still get the following odd voltage readings:

GND to +3.3V: 6.5V
GND to +5V: 9.4V
GND to +9V: 9.0V
+3.3V to +5V: 2.0V
+3.3V to +9V: 1.7V
+5V to +9V: 0V

I've measured the resistances of all of the paths I can think of, and everything seems to match the schematic properly. Although one measurement that seems a bit odd to me is that the resistance between the +5V and +9V rails is just under 200 ohms.

I included the 9V rail to be thorough and fully follow the schematic, but is there any reason that that would be causing the difference? I don't think I have any wires soldered in the wrong places nor any unintentional connections between places that shouldn't be, so I'm somewhat lost as to what to even check.

1a2m3 commented 9 months ago

What Arduino model are you using, and how is it powered? Via USB, or with an external source applied to the VIN input? What 9V supply are you using, and have you tested it separately?

Eliminate all non-DDR5 required modules from your setup (HV_CTL, HV_SRC and SA1_CTL), and try with just Arduino and DDR5 circuitry. A picture of your setup would be handy.

JakeMartin99 commented 9 months ago

image

Some of the connecting wires are soldered to the underside, such as those connecting the 9V booster (Vo to 9V rail, GND to GND rail, and Vi to 5V rail), those connecting the Arduino GND, 3.3V, and 5V pins to their respective rails, the one connecting A1 to the Schottky diode, and those connecting A4 and A5 to their respective resistors and RAM wires. The same goes for some of the wires that go over to the RAM adapter itself.

To answer your questions, it is powered just from the USB connection, and I haven't really tested the boost component (aside from validating that it does actually produce 9V on the 9V rail).

JakeMartin99 commented 9 months ago

I'll try your suggestion of disconnecting those sections you mentioned and see what happens

1a2m3 commented 9 months ago

Kinda hard to understand what's connected to what, but to me it seems the 3.3V rail is connected to ground, is that so?

image

JakeMartin99 commented 9 months ago

Sorry, kinda hard to get a good picture that represents everything that's happening. But no, GND and +3.3V have appreciable resistance between them, and as far as I can tell the only direct connection between them is "through" the capacitor. The wire you highlighted is hooked up to the 2k resistor, not the GND pin

1a2m3 commented 9 months ago

Please draw a diagram of your setup in fritzing, and post it here. That way it will be easier for me to see what's wrong and what needs to be fixed.

JakeMartin99 commented 9 months ago

SPD_bb_nowires SPD_bb One with wires and one without, since the wires kinda start obstructing a lot of stuff. I couldn't find an exact part match for my voltage booster (https://www.amazon.com/dp/B084YS7FZ8?psc=1&ref=ppx_yo2ov_dt_b_product_details), so I just used something that was vaguely related and had the right number of pins.

1a2m3 commented 9 months ago

Everything appears to be correct component- and connection-wise.

I would start by checking each component individually, at least the ones that are needed for DDR5 operation, and ensure the resistor values match their labeled values.

I would eliminate parts which are not needed for DDR5, if you aren't planning to use any other type of memory.

jake-pcb

Make sure there are no solder bridges or poor soldering joints. Check the resistance between SDA/SCL pins on the DDR5 and corresponding I2C pins on the arduino.

If nothing helps, then I can offer to have a look at your board and adapter in person for a donation, if you are willing to send me your board and cover shipping both ways (I'm in Canada). If you want to go that route, leave your email, and I'll contact you to discuss details in private.

JakeMartin99 commented 4 months ago

Hey, I know it's been a while, but I've only recently had the time to get back working on this. I was able to remove the non-DDR5 components you X-ed out, and fix up some of my soldering for the remaining pieces, and it is working now to be able to read data from my disconnected RAM sticks! So thank you!

However, since you seem to be a bit of a RAM expert, I was wondering if you could help or point me to documentation. The reason I built this is that my RAM stopped working, and I could only get my computer to boot again by removing 2 of my RAM sticks. I believe this happened because OpenRGB corrupted the SPD on at least one of the sticks that I took out. So, I want to use your invention to rewrite the SPD on whichever sticks are corrupt, and see if that fixes it when I put them back in. However, while I can use your program to get the dump from one of my sticks, use Thaiphoon Burner to get the dump from the still-installed sticks, and manually compare the differences, I really have no idea what I'm looking for, in terms of which bytes are different for legitimate reasons versus which are encoding something bad (such as an overclocking setting well beyond what is physically possible). I'm additionally confused by the fact that there seem to be differences in some bytes between my two installed, functional sticks, despite them both being set to use the exact same overclocking settings.

For reference, my ram sticks are DDR5 AMD EXPO enabled G.Skill Trident Neo Z5 RGB F5-6000J3038F16G.

Any insights would be most appreciated! Also, I would be more than happy to send a donation your way to support your work.

1a2m3 commented 4 months ago

Hi, Post your SPD dumps (attach them as files), I'll see what the differences are.

The differences could be in serial numbers, but as far as I know G.Skill leaves serial number fields blank on their DIMMs.

To make a donation, use the PayPal link: https://paypal.me/mik4rt3m, or donate via bitcoin (get the wallet address in the program's about window).

JakeMartin99 commented 4 months ago

So these are the dumps from Thaiphoon burner for my installed RAM: INSTALLED_SMBus-0-EEPROM-51h.txt INSTALLED_SMBus-0-EEPROM-53h.txt

They seem almost identical, except for the 0x200 row, where the former vs latter comparison is:

04 CD 00 23 33 7F 52 03 F0 46 35 2D 36 30 30 30 -> in ...51h
04 CD 00 23 33 CC 54 A0 F5 46 35 2D 36 30 30 30 -> in ...53h

Then, here's the first removed stick, read via SPD-reader-writer (I made mild formatting changes to match the other files to make comparison easier): REMOVED_SPD-RW-stick1.txt. It seems identical to the installed ones except for the 0x200 row, where it has:

04 CD 00 23 33 24 5B DE F2 46 35 2D 36 30 30 30

And the second removed stick, read and reformatted similarly: REMOVED_SPD-RW-stick2.txt. It also seems identical except for the 0x200 row, where it has:

04 CD 00 23 33 0B 5A F0 F3 46 35 2D 36 30 30 30

So, in general the 0x200 row seems to be the only place with any difference across the 4, with it being localized to the 4 bytes 0x205 through 0x208, with the following:

Stick            0x205  0x206  0x207  0x208
Installed (51h)  7F     52     03     F0
Installed (53h)  CC     54     A0     F5
Removed (1)      24     5B     DE     F2
Removed (2)      0B     5A     F0     F3

Unless I missed something, this maybe says that the problem isn't actually with the SPD like I had thought? Since there doesn't seem to be any differences present in 1/both removed sticks, but not in the installed ones? Do you have any other thoughts?

1a2m3 commented 4 months ago

All of your SPD dumps are fine, the differences are in serial numbers only.
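
The byte-by-byte comparison described above can also be done mechanically. The following is a minimal sketch, not part of SPD-RW: it takes the four 0x200 rows exactly as posted in this thread, reports the offsets where any stick disagrees, and decodes the trailing bytes of the row as ASCII. The labels and the function name are made up for illustration.

```python
# Rows at offset 0x200 as posted in this thread; the labels are illustrative.
ROWS = {
    "installed 51h": "04 CD 00 23 33 7F 52 03 F0 46 35 2D 36 30 30 30",
    "installed 53h": "04 CD 00 23 33 CC 54 A0 F5 46 35 2D 36 30 30 30",
    "removed 1":     "04 CD 00 23 33 24 5B DE F2 46 35 2D 36 30 30 30",
    "removed 2":     "04 CD 00 23 33 0B 5A F0 F3 46 35 2D 36 30 30 30",
}
BASE = 0x200  # offset of the first byte in each row


def differing_offsets(rows, base):
    """Return the absolute offsets where at least two rows disagree."""
    parsed = [bytes.fromhex(text) for text in rows.values()]
    return [
        base + index
        for index, column in enumerate(zip(*parsed))
        if len(set(column)) > 1
    ]


offsets = differing_offsets(ROWS, BASE)
print([hex(offset) for offset in offsets])  # ['0x205', '0x206', '0x207', '0x208']

# The bytes after the differing field are plain ASCII.
part_prefix = bytes.fromhex(ROWS["installed 51h"])[9:].decode("ascii")
print(part_prefix)  # F5-6000
```

Only 0x205 through 0x208 differ, and the bytes that follow spell the start of the part number, which fits the reading that the differing field is a per-stick serial number.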

JakeMartin99 commented 3 months ago

Hmmm, well at least that's eliminated as the source of the problems... would you happen to have any thoughts of anything else that could cause new (purchased new and installed / working for a couple months) RAM sticks to stop working, in a manner that was unaffected by both resetting bios and fully powering down the computer (multiple times each)? As far as I'm aware, the SPD is the only thing on the stick that has persistent data which could be corrupted, but maybe that's wrong? Also I recognize you might not know, but figured I'd see if anything obvious stuck out to you to check as well.

1a2m3 commented 3 months ago

Memory can be undetectable for many reasons, starting with the CPU's memory controller (MC) not being able to handle more than 2 sticks at higher frequencies. Chips can go bad, BGA solder contacts under the chips can come loose, and the contacts on the DIMM itself or in the motherboard slot can get dirty or bent. Bent or dirty CPU socket pins can also cause RAM to go undetected.

I would start by testing each stick individually, one by one, on a known-working board to eliminate CPU-motherboard-RAM compatibility issues and to isolate working sticks from non-working ones.

The part number you provided is for a dual channel kit. DDR5 is known to be unable to work at high frequencies when multiple DIMMs per channel are used. If you need 64GB, instead of running 4x16GB it's better to get a 2x32GB kit; that's the configuration I'm currently running @ 6800MT/s CL32.

Either way, G.Skill RAM comes with a lifetime warranty. If the RAM is completely dead and it wasn't your fault, they'll replace it.

JakeMartin99 commented 3 months ago

Ya, I may have to try a few more things then, and just RMA if it still seems unresolvable. It's the strangest thing though, because it had worked fine for weeks with all 4 sticks (brand new build, so all brand new components), but then one time I turned on my RGB control software (openRGB) and my computer hung, crashed, and then could not be booted back up (even to BIOS, and even after power cycling and resetting BIOS) until I took out the 2 sticks, so it seems unlikely that it would be soldering contacts or dirt or anything like that (which is why I suspected the SPD initially). But, to your point maybe something on the CPU memory controller or one of the sticks just totally and completely died for some reason at that time?

1a2m3 commented 3 months ago

CPU or its MC are unlikely to die from using OpenRGB.

Try reading PMIC0 registers from working stick and dead stick and compare them or post them here.

To read PMIC0 registers, use command line version with the following command:

spdrwcli.exe /read COM5 72

Replace COM5 with your Arduino's port. 72 is PMIC0 address for DIMM with EEPROM at address 80. If your DIMM is at different I2C address, subtract 8 from it to get its PMIC0 address.
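
The address arithmetic above can be sketched as follows. This is an illustration only, not SPD-RW code: the function name is hypothetical, addresses are decimal as in the command above, and the 80-87 EEPROM range is taken from a later comment in this thread.

```python
PMIC0_OFFSET = 8  # "subtract 8" rule described above


def pmic0_address(eeprom_address: int) -> int:
    """Return the PMIC0 I2C address for a DIMM with the given EEPROM address."""
    if not 80 <= eeprom_address <= 87:
        raise ValueError(f"EEPROM address {eeprom_address} is outside 80-87")
    return eeprom_address - PMIC0_OFFSET


for eeprom in (80, 81, 83):
    print(eeprom, "->", pmic0_address(eeprom))
# 80 -> 72
# 81 -> 73
# 83 -> 75
```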

Edit: PMIC0, not SPD5 hub

JakeMartin99 commented 3 months ago

Here's files for the two currently removed sticks: REMOVED_SPD-RW-stick1-PMIC0.txt REMOVED_SPD-RW-stick2-PMIC0.txt

Did both the queries against address 72 and 80 for each.

For 72 there appears to be no difference between them...

Is there any way to get the results for 72 out of my installed ram without having to take them out, like I did for 80 using Thaiphoon Burner? With the way my rig is built, physically getting RAM in and out is a bit of a chore, so I'd like to minimize how much I have to do it if at all possible.

1a2m3 commented 3 months ago

SPD-RW also supports SMBus, just like Thaiphoon Burner. What motherboard and chipset is your system based on?

Mainstream Intel platforms are fully supported, but AMD needs some tuning and testing.

If you have an AMD based system, I'll give you a beta version to run some tests for me, before Smbus on AMD is fully supported.

JakeMartin99 commented 3 months ago

Ah, ya I am on AMD, but happy to try your beta version. I'm on MSI MEG X670E ACE w/ AMD Ryzen™ 9 7950X3D, so AM5 / Zen4 architecture X670E chipset

1a2m3 commented 3 months ago

Excellent, here you go: ~20240616-1.zip~

Extract the files to a directory, then open an elevated command prompt (cmd.exe), navigate to the folder where you extracted the files using the cd command, and run the following commands, one after another:

spdrwcli.exe /find smbus > find.txt
spdrwcli.exe /scan 0 > scan0.txt
spdrwcli.exe /scan 1 > scan1.txt

Then post your *.txt files, but compress them into a zip archive first, as they will be huge.

The first command will scan for available smbuses, second command will scan for eeproms on bus 0 (default), and third command will scan for eeproms on bus 1. Bus 1 is not typically used for eeproms, so it will fail, but I still need you to run it to make sure it fails properly.

JakeMartin99 commented 3 months ago

Well, it did return a lot of stuff...

find-scan0-scan1.zip

1a2m3 commented 3 months ago

Thanks, smbus discovery works fine, but scanning buses finds false positives.

Stay tuned while I prepare new test build.

1a2m3 commented 3 months ago

Here you go:

20240616-2.zip

Run the same commands as above and post the results. The log files will be smaller, but still compress them before attaching.

The false positives occurred because the SMBusInterrupt flag (1) of the status register (0x00) was checked immediately after the HostBusy flag (0). In your case the SMBusInterrupt flag is set whenever an SMBus transaction completes, even if the transaction ends with an error. (On Intel systems the equivalent Interrupt flag is set after a successful transaction only.)

I rearranged the order of the status register checks so that errors are checked right after the busy flag and before the interrupt flag. This should resolve the false positives.
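
The effect of the check order can be illustrated with a small sketch. This is not the actual spdrwcore.dll code: it reads the parenthesized numbers above as bit positions, and the error bit mask (bits 2 to 4) is an assumption made purely for the example. Function names are hypothetical.

```python
HOST_BUSY = 1 << 0        # HostBusy flag, bit 0 as described above
SMBUS_INTERRUPT = 1 << 1  # SMBusInterrupt flag, bit 1 as described above
ERROR_MASK = 0b0001_1100  # ASSUMPTION: error flags occupy bits 2-4


def classify_interrupt_first(status: int) -> str:
    """Old order: busy, then interrupt, then errors."""
    if status & HOST_BUSY:
        return "busy"
    if status & SMBUS_INTERRUPT:
        return "done"  # wrongly reported when an error bit is also set
    if status & ERROR_MASK:
        return "error"
    return "idle"


def classify_errors_first(status: int) -> str:
    """New order: busy, then errors, then interrupt."""
    if status & HOST_BUSY:
        return "busy"
    if status & ERROR_MASK:
        return "error"
    if status & SMBUS_INTERRUPT:
        return "done"
    return "idle"


# A transaction that finished with an error: interrupt and an error bit both set.
failed = SMBUS_INTERRUPT | (1 << 2)
print(classify_interrupt_first(failed))  # done  (false positive)
print(classify_errors_first(failed))     # error
```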

I also replaced the CPUIDAPI.dll with the non-debug version, which will reduce the amount of debug output. It works fine, so for now debug output from spdrwcore.dll alone should be enough.

JakeMartin99 commented 3 months ago

Here you go: find-scan0-scan1-v2.zip

1a2m3 commented 3 months ago

Thanks! Everything works properly and fails properly now. 👍

To read PMIC data off your RAM via SMBus you can use the same beta version and save data directly to binary file using these commands:

For the first DIMM at address 81 on bus 0:

spdrwcli.exe /read 0 73 PMIC1.bin

And the second one at address 83:

spdrwcli.exe /read 0 75 PMIC2.bin

The program will still output debug data while running, but the binary files will be clean.

JakeMartin99 commented 3 months ago

First one:

      00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0010: 00 00 00 00 00 2C 20 00 00 04 00 05 60 00 60 60
0020: CF DC 63 00 00 DC 63 B4 63 80 88 42 20 22 B4 5E
0030: 00 00 80 00 0E 00 00 00 00 00 00 12 8A 8C 00 00
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Second one:

      00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0010: 00 00 00 00 00 2C 20 00 00 04 00 05 60 00 60 60
0020: CF DC 63 00 00 DC 63 B4 63 80 88 42 20 22 B4 5E
0030: 00 00 80 00 0E 00 00 00 00 00 00 12 8A 8C 00 00
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Files: PMIC-installed.zip

JakeMartin99 commented 3 months ago

Seems like the difference between removed vs installed is all on line 0020:

0x78 vs 0xDC vs 0xB4 looks to be 120 vs 220 vs 180
0x06 vs 0x5E looks to be 6 vs 94

Unclear to me what these differences represent for the RAM though, or if they would plausibly be breaking something.

1a2m3 commented 3 months ago

Registers 0x21, 0x25, and 0x27, where the installed and removed sticks differ, are respectively the SWA, SWC, and SWD Voltage Settings & Power Good Low Side Thresholds.

Register 0x21 is SWA_VOLTAGE_SETTING and SWA_POWER_GOOD_THRESHOLD_LOW_SIDE_VOLTAGE_SETTING attributes.

0x78 is default value, which stands for 1100 mV or 900 mV -5%. On my DDR5 sticks that byte also differs from the default value, so it shouldn't be causing issues.
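
A minimal sketch of how such a byte could be decoded, assuming the upper seven bits are a voltage step count and bit 0 selects the power good low side threshold (0 read here as -5%). The 5 mV step and the 800 mV and 600 mV range bases are assumptions, chosen because they reproduce the 1100 mV and 900 mV figures quoted above for 0x78. The function and constant names are illustrative only.

```python
STEP_MV = 5  # ASSUMPTION: 5 mV per voltage step


def decode_sw_register(value: int, base_mv: int = 800) -> tuple[int, int]:
    """Split a voltage setting byte into (millivolts, threshold bit).

    base_mv is the ASSUMED bottom of the selected output range (800 or 600).
    """
    setting = value >> 1       # bits 7:1, voltage step count
    threshold_bit = value & 1  # bit 0, power good low side threshold select
    return base_mv + setting * STEP_MV, threshold_bit


print(decode_sw_register(0x78))               # (1100, 0)
print(decode_sw_register(0x78, base_mv=600))  # (900, 0)
print(decode_sw_register(0xDC))               # (1350, 0)
```

Under the same assumptions, the 0xDC value seen at 0x21 on the installed sticks would decode to 1350 mV in the 800 mV range.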

Register 0x25 is SWC_VOLTAGE_SETTING and SWC_POWER_GOOD_THRESHOLD_LOW_SIDE_VOLTAGE_SETTING and its value matches the value at register 0x21, so it seems to be intentional. Also on my sticks registers 0x21 and 0x25 match.

Register 0x27 is SWD_VOLTAGE_SETTING and SWD_POWER_GOOD_THRESHOLD_LOW_SIDE_VOLTAGE_SETTING and your installed stick value differs from the default value, however the value on the removed sticks matches the default value of 0x78. On my sticks this value is also 0x78.

Also one of your removed sticks has register 0x2F value of 0x5E, whereas all other sticks have a value of 0x06.

The value of 0x5E when compared to 0x06 has the additional following flags set:

That's all I can say for now.
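
The flag names are not reproduced in this thread, but the bit positions that 0x5E sets on top of 0x06 follow directly from the two values. A minimal sketch, with a hypothetical helper name:

```python
def extra_bits(value: int, reference: int) -> list[int]:
    """Return the bit positions set in value but clear in reference."""
    added = value & ~reference
    return [bit for bit in range(8) if (added >> bit) & 1]


print(f"{0x5E:08b}")           # 01011110
print(f"{0x06:08b}")           # 00000110
print(extra_bits(0x5E, 0x06))  # [3, 4, 6]
```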

Unfortunately, these registers are located within the password protected area. It is possible to edit them, but currently neither the GUI nor the CLI version supports writing and editing data outside of the EEPROM address range (80-87).

It is possible to edit those registers using third party serial port software, like serial port monitor (paid program, free trial is available), however you still need to know the password, and I can't guarantee the DIMMs will work even if you edit those registers to match the values on your working sticks.

Edit: corrected register name

1a2m3 commented 3 months ago

Clarification: the values at registers 0x21, 0x25, and 0x27 are configured at startup by registers 0x45, 0x49, and 0x4B, respectively. But those are still within the protected area, and without the correct password, values read from those registers appear as 0's.

JakeMartin99 commented 3 months ago

Hmm, well my intuition would be that it's the 5e one, in that case. But regardless, if it's password protected, is there any way I'd even be able to figure out the password? Seems like that might be an insurmountable roadblock...