corna / me_cleaner

Tool for partial deblobbing of Intel ME/TXE firmware images
GNU General Public License v3.0

MSI X99S MPower Blank Screen then Boot Loops #17

Closed AAccount closed 7 years ago

AAccount commented 7 years ago

I tried this on my MSI X99S MPower motherboard, described here: https://ca.msi.com/Motherboard/X99S-MPOWER.html#hero-overview. Specifically, I tried the newest BIOS revision (M.B). After flashing, my computer rebooted to a black screen. It looks like the video card did not initialize at all. I waited for 5 minutes but nothing happened. Afterwards, I force restarted the computer and it went into a boot loop: it tries to turn on, turns off, then turns on again. Luckily the motherboard has 2 BIOS chips, so I used the secondary BIOS to flash the primary one with an unmodified M.B revision and everything is OK now. Thank you for your hard work on this project. I suppose I might be willing to try it again, since recovering a bricked BIOS is pretty easy on this motherboard.

Here's the output when I run the script:

[Daniel@Daniel8 me_clean]$ python3 me_cleaner.py E7885IMS.MB0
Full image detected
The ME region goes from 0x1000 to 0x7fffff
Found FPT header at 0x1010
Found 20 partition(s)
ME firmware version 9.1.10.1000
Found FTPR header: FTPR partition spans from 0x48000 to 0xd0000
Removing extra partitions...
Removing extra partition entries in FPT...
Removing EFFS presence flag...
Correcting checksum (0xea)...
Reading FTPR modules list...
Wiping LZMA section (0xadbb4 - 0xd0000)
UPDATE (LZMA, 0x0adbb4 - 0x0addde): removed
ROMP (Huffman, 0x04eac0 - 0x04eec9): NOT removed, essential
BUP (Huffman, 0x04eec9 - 0x05fd1f): NOT removed, essential
KERNEL (Huffman, 0x05fd1f - 0x095093): removed
POLICY (Huffman, 0x095093 - 0x0adbb4): removed
ClsPriv (LZMA, 0x0addde - 0x0ae1b7): removed
SESSMGR (LZMA, 0x0ae1b7 - 0x0b9b51): removed
SESSMGR_PRIV (LZMA, 0x0b9b51 - 0x0bf430): removed
HOSTCOMM (LZMA, 0x0bf430 - 0x0c773a): removed
TDT (LZMA, 0x0c773a - 0x0ccaef): removed
FPF (LZMA, 0x0ccaef - 0x0ce5f2): removed
Done! Good luck!

platomav commented 7 years ago

From this thread, download ME System Tools v9.1 and run MEInfo tool at a command prompt. Attach the output here to see if BootGuard is enabled.

AAccount commented 7 years ago

C:\Users\A\Intel ME System Tools v9.1 r1\MEInfo\Windows64>MEInfoWin64.exe

Intel(R) MEInfo Version: 9.1.20.1020
Copyright(C) 2005 - 2014, Intel Corporation. All rights reserved.

GBE Region does not exist.
Intel(R) ME code versions:

BIOS Version: M.B0
MEBx Version: 0.0.0.0000
Gbe Version: Unknown
VendorID: 8086
PCH Version: 5
FW Version: 9.1.37.1002 H
LMS Version: Not Available
MEI Driver Version: 10.0.30.1054
Wireless Hardware Version: 2.1.77
Wireless Driver Version: 16.5.3.6

FW Capabilities: 0x40100940

Intel(R) Capability Licensing Service - PRESENT/ENABLED
Intel(R) Dynamic Application Loader - PRESENT/ENABLED
Service Advertisement & Discovery - PRESENT/ENABLED

TLS: Disabled
Last ME reset reason: Power up
Local FWUpdate: Enabled
BIOS Config Lock: Disabled
Host Read Access to ME: Enabled
Host Write Access to ME: Enabled
SPI Flash ID #1: EF4018
SPI Flash ID VSCC #1: 20252025
SPI Flash BIOS VSCC: 20252025
BIOS boot State: Post Boot
OEM Id: 00000000-0000-0000-0000-000000000000
Capability Licensing Service: Enabled
OEM Tag: 0x00000000
Localized Language: Unknown
Independent Firmware Recovery: Enabled

C:\Users\A\Intel ME System Tools v9.1 r1\MEInfo\Windows64>

platomav commented 7 years ago

Can you show MEInfo -verbose? I don't see BG (BootGuard) at the end where it should be. Maybe BG was not a thing back when X99 launched, I don't recall. Have you searched the BIOS for any security options such as Secure Boot, Boot Guard etc. which might cause such issues?

Also, from the System Tools, run Flash Programming Tool with command "fptw64 -f me_b310123.bin -me" with the file (ME region only) linked here. Once it is done, shut down the system, remove power (psu cord + psu switch to off + press power button 1-2 times) and wait for 1 minute. Now check if it boots.

AAccount commented 7 years ago

What's so special about that bin file? Is it neutered? Oh and just to double check, this will modify the bios stored in the bios chip and not stored somewhere else? Bad bios flash I can recover from, others not sure...

platomav commented 7 years ago

Yes it is, but with an older me_cleaner commit (b310123) which removes all extra $FPT modules but leaves the Recovery one (FTPR) completely intact (no LZMA or Huffman FTPR modules removed). If it works, then the problem is with newer me_cleaner versions, as they probably remove something they shouldn't. If it fails the same way as before, then the issue you have is system specific (BootGuard, TPM, SecureBoot etc. technologies). Thus the MEInfo -verbose output and BIOS options I asked for. :)

In case you are not familiar with the Intel SPI chip image structure, it mainly consists of the regions Flash Descriptor (FD, which controls read/write access to the other regions among other functionality), GbE, ME and BIOS. You already recovered from a bad ME region flash (that's what me_cleaner adjusts), so you can indeed reflash the entire SPI chip with whatever method you are using. So yes, you can recover from more than just a bad "bios" flash, if that's your question, but I suspect your (justified) caution comes from the common misunderstanding that "BIOS" = "SPI image". The "BIOS" region is just one part of the SPI chip/image and "ME" is another.
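To make the region layout above concrete, here is a minimal Python sketch that reads the region map out of a full SPI image. It is not taken from me_cleaner or any Intel tool; the signature offset (0x10), the FLMAP0/FRBA decoding and the FLREG order (FD, BIOS, ME, GbE) are assumptions based on the commonly documented descriptor layout for this chipset generation, and `read_regions` is a hypothetical helper name.

```python
import struct

FD_SIGNATURE = b"\x5a\xa5\xf0\x0f"          # 0x0FF0A55A stored little-endian
REGION_NAMES = ["FD", "BIOS", "ME", "GbE"]  # assumed order of FLREG0..FLREG3


def flreg_to_range(flreg):
    """Return (start, end) with end exclusive, or None if the region is unused."""
    base = flreg & 0x7FFF
    limit = (flreg >> 16) & 0x7FFF
    if base > limit:  # base above limit is how an unused region is encoded
        return None
    return base << 12, ((limit << 12) | 0xFFF) + 1


def read_regions(image):
    """Map region name -> (start, end) for a full SPI image given as bytes."""
    if image[0x10:0x14] != FD_SIGNATURE:
        raise ValueError("no flash descriptor signature at offset 0x10")
    flmap0 = struct.unpack_from("<I", image, 0x14)[0]
    frba = (flmap0 >> 12) & 0xFF0  # Flash Region Base Address, in bytes
    flregs = struct.unpack_from("<4I", image, frba)
    return {name: flreg_to_range(reg) for name, reg in zip(REGION_NAMES, flregs)}
```

Calling `read_regions(open("E7885IMS.MB0", "rb").read())` on the image from this thread would be the obvious use, assuming the file starts with the descriptor as me_cleaner's "Full image detected" message suggests. A region reported as `None` would correspond to MEInfo's "GBE Region does not exist".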

AAccount commented 7 years ago

Thanks for the explanation. I'm not familiar with the layout. I just figured the dual BIOS chip setup on my MB (that's what the manufacturer description calls it) would save my behind, so there is no risk in trying. During the BIOS update it says "flashing bios and me". So it looks like the "bios update" procedure flashes the whole SPI, and the "dual bios" setup has 2 SPI chips that can be switched. I have classes until 8pm Eastern time, so I'll try it when I get back (and post the verbose output). For this MB, Secure Boot or other Windows DRM boot crap is disabled by default.

AAccount commented 7 years ago

Verbose MEI

C:\Users\A\Intel ME System Tools v9.1 r1\MEInfo\Windows64>MEInfoWin64.exe -verbose

Intel(R) MEInfo Version: 9.1.20.1020
Copyright(C) 2005 - 2014, Intel Corporation. All rights reserved.

FW Status Register1: 0x1E000255
FW Status Register2: 0x66002306
FW Status Register3: 0x00000200
FW Status Register4: 0x00004001
FW Status Register5: 0x00000000
FW Status Register6: 0x30000020

CurrentState: Normal
ManufacturingMode: Enabled
FlashPartition: Valid
OperationalState: M0 with UMA
InitComplete: Complete
BUPLoadState: Success
ErrorCode: No Error
ModeOfOperation: Normal
Phase: HOSTCOMM Module
ICC: Valid OEM data, ICC programmed
ME File System Corrupted: No
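The decoded fields above appear to come straight from FW Status Register1. As a rough illustration, the sketch below pulls the same fields out of the raw value. The bit positions are an assumption based on how open-source firmware describes this register for ME firmware of this era; they are not taken from MEInfo, and `decode_fwsts1` is a hypothetical name.

```python
# Assumed value names for two of the fields; other values are left numeric.
WORKING_STATES = {0: "Reset", 1: "Initializing", 2: "Recovery", 5: "Normal"}
OPERATION_STATES = {0: "Preboot", 1: "M0 with UMA", 4: "M3 without UMA", 5: "M0 without UMA"}


def decode_fwsts1(value):
    """Split FW Status Register1 into fields (assumed bit layout)."""
    working = value & 0xF
    operation = (value >> 6) & 0x7
    return {
        "working_state": WORKING_STATES.get(working, working),
        "manufacturing_mode": bool((value >> 4) & 1),
        "fpt_bad": bool((value >> 5) & 1),
        "operation_state": OPERATION_STATES.get(operation, operation),
        "init_complete": bool((value >> 9) & 1),
        "error_code": (value >> 12) & 0xF,
        "operation_mode": (value >> 16) & 0xF,
    }


status = decode_fwsts1(0x1E000255)
```

Under these assumptions the register value posted here decodes to a Normal working state, manufacturing mode enabled, a valid FPT, "M0 with UMA", init complete and no error, which agrees with what MEInfo printed.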

Get ME FWU version command...done

Windows OS Version : 6.2.9200 "" OS BIOS Support : UEFI

Table Type   0 ( 0x 00 ) found, size of  24 (0x 18 ) bytes

Windows OS Version : 6.2.9200 "" OS BIOS Support : UEFI

Table Type   0 ( 0x 00 ) found, size of  24 (0x 18 ) bytes
Table Type   1 ( 0x 01 ) found, size of  27 (0x 1B ) bytes
Table Type   2 ( 0x 02 ) found, size of  15 (0x 0F ) bytes
Table Type   3 ( 0x 03 ) found, size of  22 (0x 16 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   8 ( 0x 08 ) found, size of   9 (0x 09 ) bytes
Table Type   9 ( 0x 09 ) found, size of  17 (0x 11 ) bytes
Table Type   9 ( 0x 09 ) found, size of  17 (0x 11 ) bytes
Table Type   9 ( 0x 09 ) found, size of  17 (0x 11 ) bytes
Table Type   9 ( 0x 09 ) found, size of  17 (0x 11 ) bytes
Table Type   9 ( 0x 09 ) found, size of  17 (0x 11 ) bytes
Table Type   9 ( 0x 09 ) found, size of  17 (0x 11 ) bytes
Table Type   9 ( 0x 09 ) found, size of  17 (0x 11 ) bytes
Table Type  10 ( 0x 0A ) found, size of   6 (0x 06 ) bytes
Table Type  11 ( 0x 0B ) found, size of   5 (0x 05 ) bytes
Table Type  12 ( 0x 0C ) found, size of   5 (0x 05 ) bytes
Table Type  32 ( 0x 20 ) found, size of  20 (0x 14 ) bytes
Table Type  34 ( 0x 22 ) found, size of  11 (0x 0B ) bytes
Table Type  26 ( 0x 1A ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  28 ( 0x 1C ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  27 ( 0x 1B ) found, size of  15 (0x 0F ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  27 ( 0x 1B ) found, size of  15 (0x 0F ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  29 ( 0x 1D ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  34 ( 0x 22 ) found, size of  16 (0x 10 ) bytes
Table Type  26 ( 0x 1A ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  26 ( 0x 1A ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  28 ( 0x 1C ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  27 ( 0x 1B ) found, size of  15 (0x 0F ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  28 ( 0x 1C ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  27 ( 0x 1B ) found, size of  15 (0x 0F ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  29 ( 0x 1D ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  29 ( 0x 1D ) found, size of  22 (0x 16 ) bytes
Table Type  36 ( 0x 24 ) found, size of  16 (0x 10 ) bytes
Table Type  35 ( 0x 23 ) found, size of  11 (0x 0B ) bytes
Table Type  26 ( 0x 1A ) found, size of  22 (0x 16 ) bytes
Table Type  28 ( 0x 1C ) found, size of  22 (0x 16 ) bytes
Table Type  27 ( 0x 1B ) found, size of  15 (0x 0F ) bytes
Table Type  29 ( 0x 1D ) found, size of  22 (0x 16 ) bytes
Table Type  39 ( 0x 27 ) found, size of  22 (0x 16 ) bytes
Table Type  41 ( 0x 29 ) found, size of  11 (0x 0B ) bytes
Table Type  41 ( 0x 29 ) found, size of  11 (0x 0B ) bytes
Table Type  41 ( 0x 29 ) found, size of  11 (0x 0B ) bytes
Table Type  16 ( 0x 10 ) found, size of  23 (0x 17 ) bytes
Table Type  19 ( 0x 13 ) found, size of  31 (0x 1F ) bytes
Table Type  17 ( 0x 11 ) found, size of  40 (0x 28 ) bytes
Table Type  20 ( 0x 14 ) found, size of  35 (0x 23 ) bytes
Table Type  17 ( 0x 11 ) found, size of  40 (0x 28 ) bytes
Table Type  17 ( 0x 11 ) found, size of  40 (0x 28 ) bytes
Table Type  20 ( 0x 14 ) found, size of  35 (0x 23 ) bytes
Table Type  17 ( 0x 11 ) found, size of  40 (0x 28 ) bytes
Table Type  17 ( 0x 11 ) found, size of  40 (0x 28 ) bytes
Table Type  17 ( 0x 11 ) found, size of  40 (0x 28 ) bytes
Table Type  17 ( 0x 11 ) found, size of  40 (0x 28 ) bytes
Table Type  17 ( 0x 11 ) found, size of  40 (0x 28 ) bytes
Table Type 143 ( 0x 8F ) found, size of  16 (0x 10 ) bytes
Table Type 144 ( 0x 90 ) found, size of   5 (0x 05 ) bytes
Table Type   7 ( 0x 07 ) found, size of  19 (0x 13 ) bytes
Table Type   7 ( 0x 07 ) found, size of  19 (0x 13 ) bytes
Table Type   7 ( 0x 07 ) found, size of  19 (0x 13 ) bytes
Table Type   4 ( 0x 04 ) found, size of  48 (0x 30 ) bytes
Table Type 130 ( 0x 82 ) found, size of  20 (0x 14 ) bytes
Table Type 131 ( 0x 83 ) found, size of  64 (0x 40 ) bytes
    MEBx Version found is 0.0.0.0000

Get ME FWU info command...done

Get ME FWU version command...done

Get ME FWU feature state command...done

Get ME FWU platform type command...done

Get ME FWU feature capability command...done

Get ME FWU OEM Id command...done
FW Capabilities value is 0x40100940
Feature enablement is 0x40100940
Platform type is 0x463F0302
GBE Region does not exist.
Intel(R) ME code versions:

BIOS Version: M.B0
MEBx Version: 0.0.0.0000
Gbe Version: Unknown
VendorID: 8086
PCH Version: 5
FW Version: 9.1.37.1002 H
LMS Version: Not Available
MEI Driver Version: 10.0.30.1054
Wireless Hardware Version: 2.1.77
Wireless Driver Version: 16.5.3.6

FW Capabilities: 0x40100940

Intel(R) Active Management Technology - NOT PRESENT
Intel(R) Standard Manageability - NOT PRESENT
Intel(R) Anti-Theft Technology - NOT PRESENT
Intel(R) Capability Licensing Service - PRESENT/ENABLED
Protect Audio Video Path - NOT PRESENT
Intel(R) Dynamic Application Loader - PRESENT/ENABLED
Service Advertisement & Discovery - PRESENT/ENABLED
Intel(R) NFC Capabilities - NOT PRESENT
Intel(R) Platform Trust Technology - NOT PRESENT

TLS: Disabled
Last ME reset reason: Power up
Local FWUpdate: Enabled

Get BIOS flash lockdown status...done
BIOS Config Lock: Disabled

Get flash master region access status...done
Host Read Access to ME: Enabled
Host Write Access to ME: Enabled
SPI Flash ID #1: EF4018
SPI Flash ID VSCC #1: 20252025
SPI Flash BIOS VSCC: 20252025
Protected Range Register Base #0 0x0
Protected Range Register Limit #0 0x0
Protected Range Register Base #1 0x0
Protected Range Register Limit #1 0x0
Protected Range Register Base #2 0x0
Protected Range Register Limit #2 0x0
Protected Range Register Base #3 0x0
Protected Range Register Limit #3 0x0
Protected Range Register Base #4 0x0
Protected Range Register Limit #4 0x0
BIOS boot State: Post Boot
OEM Id: 00000000-0000-0000-0000-000000000000
Capability Licensing Service: Enabled

Get ME FWU OEM Tag command...done
OEM Tag: 0x00000000

Get ME FWU Platform Attribute (WLAN ucode) command...done
Localized Language: Unknown

Get ME FWU Info command...done
Independent Firmware Recovery: Enabled

C:\Users\A\Intel ME System Tools v9.1 r1\MEInfo\Windows64>

AAccount commented 7 years ago

Tried the new neutered ME file and flashed it exactly as told. Unfortunately got the same result (and the same heart attack).

platomav commented 7 years ago

MEInfo does not mention BootGuard and it seems to be disabled in the SPI image provided by MSI as well, so it should be safe to assume it is not the problem. You said SecureBoot is disabled and I highly doubt there is a TPM module installed, so the only logical conclusion is that this is a BIOS-specific issue. Meaning, something in the BIOS checks the ME and maybe tries to recover it, thus the boot loop? It's a guess. It would be interesting to see when this "check" takes place, maybe by removing one "useless" section of the ME for starters to see if even that triggers the problem. But that would require a few more tests and possibly heart attacks.

AAccount commented 7 years ago

I don't have a TPM module. I saw a header for it in the motherboard manual but I didn't buy anything for it. That's a very interesting theory. Well if you want to remove a section I suppose I could try it again. Having recovered from 2 "boot loops", I think it's safe to say the dual bios/spi chip setup should be able to save my behind every time...???

platomav commented 7 years ago

If you can recover from an ME brick, you can recover from "anything". Some OEMs implement BIOS region recovery methods which work nicely with little user intervention, but an ME brick always requires an SPI image reflash. So you're good in that regard, not that I endorse trying your luck constantly. I made a new ME region (same instructions as above, not a full SPI image) which is intact except for one stupid module called MDMV, which is definitely not required for booting or basic ME functionality. We'll see how the system reacts.

AAccount commented 7 years ago

Your new image with MDMV removed (which you say is not essential) boots. Coming to you live after the flash. I guess to confirm it really is not essential, lspci still shows the MEI:

[Daniel@Daniel8 ~]$ lspci
...
00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
...

platomav commented 7 years ago

Very interesting. So something that gets removed causes this. All we have to do now is work backwards: remove partitions one by one until it stops working. I created a set of 5 new ME-region-only images. Start from the shortest filename (fewest modules removed, most likely to work) and move towards the longest filename (most modules removed, least likely to work) until it stops working.

Previous test = MDMV removed
1st = previous - NFTP
2nd = 1st - GLUT
3rd = 2nd - (TMNN, NVUK, NVTD, NVSM, NVSH, NVNF, NVKR, NVJC, NVHM, NVCP, NVCL) --> All missing
4th = 3rd - (FCRS, MDES, FOVD, PSVN) --> All empty
5th = 4th - EFFS

Personally I suspect EFFS as the problem and that's why it's the last test. Maybe GLUT otherwise. We'll see.
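The step-by-step plan above is a linear search over cumulative removal sets. A small Python sketch of that bookkeeping follows. The partition names are the ones listed in the plan, while `cumulative_removals`, `first_failure` and the `boots` callback are hypothetical; in practice `boots` stands for a manual flash, power cycle and check.

```python
from itertools import accumulate

# Each step removes these partitions on top of everything removed before it.
STEPS = [
    ["MDMV"],
    ["NFTP"],
    ["GLUT"],
    ["TMNN", "NVUK", "NVTD", "NVSM", "NVSH", "NVNF",
     "NVKR", "NVJC", "NVHM", "NVCP", "NVCL"],
    ["FCRS", "MDES", "FOVD", "PSVN"],
    ["EFFS"],
]


def cumulative_removals(steps):
    """Return the full removal list for each test image, in test order."""
    return list(accumulate(steps, lambda done, step: done + step))


def first_failure(plan, boots):
    """Return (index, removed) for the first image that fails, else None."""
    for index, removed in enumerate(plan):
        if not boots(removed):
            return index, removed
    return None
```

Testing in this order means the first failing image differs from the last working one by a single step, so the culprit is narrowed down to the partitions of that step.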

AAccount commented 7 years ago

The very first file me_mdmv_mftp.bin failed. One thing I forgot to mention is that the debug led on the motherboard gets stuck at 68. If you download the manual for the motherboard and look at page 51, "68" seems to be mysteriously missing from the table.

platomav commented 7 years ago

Error code 68 is "PCI Host Bridge Initialization". This should be BIOS related. Have you set any custom options at the BIOS or overclocking? I doubt it as you reflash with the stock BIOS but it never hurts to check.

I have created two test ME regions here. These should be safe. File "me_mdmv_missing" lacks partitions which are mentioned at $FPT but cannot be found at flash (no starting address), the system should definitely boot with that. File "me_mdmv_missing_empty" additionally removes partitions which are mentioned at $FPT with starting address and size but are completely empty at flash. Again, I suspect this should work even at your system which behaves strangely. So flash these two test files and we'll go from there if they work. File "me_mdmv" is the working MDMV test which is kept for reference, don't flash that.

AAccount commented 7 years ago

The first one "me_mdmv_missing" worked but the second one did not (boot loops). If I understand correctly, the first one lists the partitions by name only but not where they are in the image. The code for those partitions is still there but is effectively unreachable since they're not mapped anywhere. The second one is what me_cleaner usually does which is keep the partition entry name and location but zero out its code. So is the moral of the story, don't address something that isn't usable?

platomav commented 7 years ago

The ME firmware has a Flash Partition Table (FPT) at its beginning. Each entry is 0x20 bytes and the first 0x10 bytes hold the Name, Owner, Offset & Size of the given partition.

The "missing" partitions have only a Size; without a starting offset they are nowhere to be found in the actual ME region of the flash. So they are just mentions, useless overall. They only occupy space in the FPT; their sizes do not even appear anywhere as padding.

The "empty" partitions (first 4 in your case) have both Offset and Size, so they occupy space in the ME region of the flash, but it is just padding/empty. For example, PSVN starts at 0xBC0 and is filled with 0x40 padding bytes.
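The "missing" versus "empty" distinction can be sketched in Python as follows. This is an illustration under stated assumptions, not me_cleaner's code: it assumes the table starts at a `$FPT` marker with the entry count 4 bytes after it and 0x20-byte entries starting 0x20 bytes after it, that an offset of 0 or 0xFFFFFFFF means "no starting offset", and that padding is a run of one repeated byte. `parse_fpt` and `classify` are hypothetical names.

```python
import struct

ENTRY_SIZE = 0x20  # assumed size of one FPT entry


def parse_fpt(me_region):
    """Return a list of (name, offset, size) tuples from the FPT."""
    fpt = me_region.find(b"$FPT")
    if fpt < 0:
        raise ValueError("no $FPT marker found")
    count = struct.unpack_from("<I", me_region, fpt + 4)[0]
    entries = []
    for i in range(count):
        pos = fpt + 0x20 + i * ENTRY_SIZE
        # Name (4 bytes), Owner (4 bytes), Offset (u32), Size (u32)
        name, _owner, offset, size = struct.unpack_from("<4s4sII", me_region, pos)
        entries.append((name.rstrip(b"\x00").decode("ascii", "replace"), offset, size))
    return entries


def classify(me_region, offset, size):
    """Label a partition as 'missing', 'empty' or 'present'."""
    if offset in (0, 0xFFFFFFFF) or size == 0 or offset + size > len(me_region):
        return "missing"  # listed in the FPT but not locatable in the flash
    data = me_region[offset:offset + size]
    if len(set(data)) == 1:
        return "empty"    # space is reserved but holds only padding
    return "present"
```

Running `classify` over every entry returned by `parse_fpt` gives a quick overview of which partitions actually carry data before deciding what to remove.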

Corna's me_cleaner works differently. It just removes everything except the Recovery (FTPR) partition from the FPT and fills the removed partitions with padding. This is acceptable for the ME co-processor itself and thus for most systems, as proven here. However, in your case, I suspect MSI has implemented a check in the BIOS which possibly checks whether the ME is corrupted. I've seen similar tactics in the past from OEMs such as Gigabyte (BIOS GUID with just the first 0x400 bytes of the ME region or 0x400 EFFS/settings ME subsection), Clevo/Sager (copy of the first 0x100 FPT header bytes exactly before the ME starts, meaning in the FD or GbE space), ASRock (check if the ME version is different from hardcoded values in the BIOS AMITSE module and, if yes, restore it via a full BIOS GUID copy of the ME region), ASUS (exact copy of the full ME region inside a BIOS GUID) and more...

Looking at the MSI SPI image, I couldn't find any obvious clue as to how that check is performed. Meaning, no BIOS GUID or modules with ME keywords which could be used for corruption checks. I believe that either:

  1. The BIOS checks if certain ME modules are mentioned at the FPT, or
  2. The BIOS refuses to boot if the ME reports some loading error

The 2nd seems a lot more likely. The first can be easily verified. I created a single ME region test image which has NFTP listed at FPT but the actual NFTP contents of the ME flash are gone/padded. If that works, then the BIOS checks the FPT. If not, the BIOS refuses to boot since the ME reported any error, even a non-critical one.

AAccount commented 7 years ago

Thanks once again for taking the time to go the extra mile in explaining things for a non expert in a clear easy to understand method. Unfortunately, the test image provided causes a boot loop. It looks like I'm SOL and just unknowingly bought a bad MB. Is the new moral of the story stay away from MSI or is this case by case thing?

platomav commented 7 years ago

No, not at all. The board that you got is not bad; the BIOS from MSI might have a bug somewhere, as a boot loop does not seem like proper behavior when an ME loading error/corruption is encountered. Maybe they can fix that if it is reported. Something like "my ME got corrupted and ended up in a bootloop which does not seem normal", no me_cleaner mentions of course. MSI is actually pretty cool since they have two socketed SPI chips (even if both get corrupted, they can be easily removed and reflashed with $5 programmers), their in-BIOS flasher reflashes both BIOS and ME (rare, cool and quite a butt-saver when the latter gets corrupted), they update the BIOS regularly etc. So no, I don't believe you made the wrong choice or that there is a moral to this story. Such recovery methods can actually save systems and users who have no idea what ME even is, and a lot of OEMs use different implementations of similar scope either way.

Anyway, these are beside the point. In the end, me_cleaner does not work on your board because the BIOS tries to recover from a corrupted ME. With that in mind, I think this issue can be closed now. :)

AAccount commented 7 years ago

I didn't mean the board being physically bad but the "design" (bios) is bad. This board's chips are not socketed. There are just 2 of them so 1 can cover for the other.

Just 1 little thing before closing: obviously no me_cleaner mention, but doesn't it seem a bit funny that I would know the ME flashing went bad? How would I have known that from watching a 1%, 2%, 3%... bar? Make a guess because it "crapped out" at 75% (second half)? Or, how would an ME corruption happen spontaneously in a way that I could "diagnose" as the issue?

Oh and thanks again for all your help. I learned a lot from this. Who knew BIOSes were so complicated now.

platomav commented 7 years ago

Yes, I confused your motherboard with another case I was trying to resolve. From the pictures I can indeed see two soldered SPI chips.

I'm not sure I understand the second part with the question. If you mean how to identify a broken ME, there are a lot of indicators as it's deeply integrated into Intel systems. Usual symptoms can be 30-minute shutdowns, fans spinning constantly at full speed, no power management, wrong/half RAM detection, iGPU not working, wrong clocks and no overclocking, BIOS reporting ME version as 0.0.0.0000 or N/A, AMT not working at Corporate/5MB SKUs, BIOS error messages related to ME during booting, bad performance, sleep/wakeup issues etc. Usually a google search will lead people to correct places (my example) to ask for help. There are also Intel tools which can check if the ME is working properly like MEInfo and MEManuf.

Thank you as well for indulging my (many) test files. You may not be able to get me_cleaner working (I'm not sure you would want that either way on such a nice/good board, hint) but at least we learnt of this MSI BIOS check that I wasn't aware of. Maybe Corna can add an extra warning regarding OEM ME recovery procedures which will try to reverse me_cleaner's actions.

corna commented 7 years ago

Thank you very much @platomav for the support you're putting into this project

  1. The BIOS checks if certain ME modules are mentioned at the FPT, or
  2. The BIOS refuses to boot if the ME reports some loading error

I think it's the second one. Even if the BIOS has access to the ME region (ifdtool -d E7885IMS.MB0; very uncommon, as Intel always suggests denying the BIOS read/write access), I don't think that it checks the image directly. Instead I think it communicates with Intel ME using the standard MEI interface and, if it finds something wrong, it hangs. We should search for the PCI ID of the MEI interface in the BIOS image.

As a reference, here the MSI BIOS checks the status of Intel ME but, if me_cleaner has been used, it just prints an error message.

AAccount commented 7 years ago

Here is the lspci -v info of the ME from Linux:

00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
    Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7885
    Flags: bus master, fast devsel, latency 0, IRQ 37, NUMA node 0
    Memory at 383ffff17000 (64-bit, non-prefetchable) [size=16]
    Capabilities:
    Kernel driver in use: mei_me
    Kernel modules: mei_me

Here are the hardware IDs reported by the Windows device manager:

PCI\VEN_8086&DEV_8D3A&SUBSYS_78851462&REV_05
PCI\VEN_8086&DEV_8D3A&SUBSYS_78851462
PCI\VEN_8086&DEV_8D3A&CC_078000
PCI\VEN_8086&DEV_8D3A&CC_0780

Here is the Windows device manager device instance path:

PCI\VEN_8086&DEV_8D3A&SUBSYS_78851462&REV_05\3&11583659&0&B0
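With the vendor and device IDs above (8086 and 8D3A), the search suggested earlier in the thread could start with a plain byte scan. The Python sketch below looks for the two IDs stored back to back in little-endian order, as they sit in PCI configuration space. That encoding is an assumption about how the BIOS might reference the device, and `find_pci_id` is a hypothetical helper.

```python
import struct


def find_pci_id(blob, vendor_id, device_id):
    """Return every offset where vendor and device ID appear as two LE u16 values."""
    needle = struct.pack("<HH", vendor_id, device_id)  # 8086:8D3A -> 86 80 3A 8D
    offsets = []
    pos = blob.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(needle, pos + 1)
    return offsets
```

A call such as `find_pci_id(open("E7885IMS.MB0", "rb").read(), 0x8086, 0x8D3A)` would scan the raw image. Zero hits would not prove much, since UEFI modules are often stored compressed and the code may address the device by its bus/device/function (00:16.0) instead of by its ID.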