MrChromebox / firmware

Issue tracker for firmware issues

ASUS Chromebox 3 (CN65) Crashes under high CPU load #242

Open lanrat opened 3 years ago

lanrat commented 3 years ago

I'm not sure if this is the right place to file this issue, or even if it's firmware-related, so feel free to close it if this is not the correct place.

Device: ASUS Chromebox 3 (CN65)
Fw Ver: MrChromebox-4.12 (06/04/2020)

I'm running Debian Linux, and whenever I run any process with a high CPU load, the CN65 instantly locks up and becomes entirely unresponsive, requiring a hard reboot to come back to life. This has happened with several different processes, all of which push the CPU hard. Unfortunately, because it happens the instant the CPU load gets too high, nothing shows up in the system logs, and I'm currently at a loss as to how to debug it.

I'm using the stock 90W power supply.
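Since the lock-up leaves nothing in the system logs, one option is to log temperatures to disk once a second and fsync each line, so the last readings before the freeze should survive the hard reboot. A minimal Python sketch, assuming a Linux kernel that exposes thermal zones under /sys/class/thermal; the log path and the one-second interval are arbitrary choices for illustration, not anything from this thread:

```python
import glob
import os
import time


def millideg_to_c(text):
    """sysfs thermal zones report millidegrees Celsius, e.g. "71000" -> 71.0."""
    return int(text.strip()) / 1000.0


def snapshot():
    """Return {"thermal_zoneN:type": degrees C} for every readable thermal zone."""
    temps = {}
    for zone in sorted(glob.glob("/sys/class/thermal/thermal_zone*")):
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                temps[f"{os.path.basename(zone)}:{kind}"] = millideg_to_c(f.read())
        except (OSError, ValueError):
            continue  # some zones refuse reads; skip them rather than abort
    return temps


def format_line(stamp, temps):
    """One log line: timestamp followed by name=temp pairs in sorted order."""
    return stamp + " " + " ".join(f"{k}={v:.1f}" for k, v in sorted(temps.items()))


def log_forever(path="/var/tmp/cn65-temps.log", interval=1.0):
    """Append a line every `interval` seconds and fsync it, so the readings
    taken just before a hard lock-up are already on disk."""
    with open(path, "a") as out:
        while True:
            out.write(format_line(time.strftime("%H:%M:%S"), snapshot()) + "\n")
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval)
```

Start `log_forever()` in a separate terminal before launching the heavy workload, then look at the tail of the log after rebooting to see how hot things got right before the freeze.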

MrChromebox commented 3 years ago

which CPU? any way to reliably reproduce? If so, I'd try booting a live USB and replicating there to rule out an issue with your install/kernel etc

lanrat commented 3 years ago

The CPU is an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz.

In the past this has happened whenever I ran aircrack-ng, hashcat, or gunzip. I just tried running the programs again and they don't appear to crash. I'll run some more tests and report back.

I did notice that during these tests the CPU gets throttled because its temperature is above the threshold, which I guess is normal for these kinds of programs.

MrChromebox commented 3 years ago

I can provide a test firmware which doesn't set the TDP values as high (i.e., back to stock power levels)

lanrat commented 3 years ago

How much lower are they set in your firmware vs. stock?

I'm using the OEM cooling, which I'm assuming is not the best.

MrChromebox commented 3 years ago

stock is 15W/28W for PL1/2, current UEFI is 28W/51W
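On Linux, the package power limits the firmware programmed can usually be read back through the kernel's powercap interface, which is a quick way to confirm which PL1/PL2 values a given firmware build actually applies. A minimal sketch, assuming the intel_rapl driver is loaded and that `intel-rapl:0` is the package domain (both assumptions; check `name` in that directory):

```python
import os

# Assumption: intel_rapl is loaded and intel-rapl:0 is the CPU package domain.
RAPL_PACKAGE = "/sys/class/powercap/intel-rapl:0"


def uw_to_w(text):
    """powercap reports power limits in microwatts, e.g. "28000000" -> 28.0 W."""
    return int(text.strip()) / 1_000_000


def read_limits(base=RAPL_PACKAGE):
    """Return {constraint name: watts}; "long_term" is PL1, "short_term" is PL2."""
    limits = {}
    for n in range(3):  # constraint_0 .. constraint_2, whichever exist
        try:
            with open(os.path.join(base, f"constraint_{n}_name")) as f:
                name = f.read().strip()
            with open(os.path.join(base, f"constraint_{n}_power_limit_uw")) as f:
                limits[name] = uw_to_w(f.read())
        except (OSError, ValueError):
            continue  # constraint missing or unreadable
    return limits


if __name__ == "__main__":
    # Values quoted in this thread, in watts, for comparison.
    print("stock (PL1/PL2):  15 / 28")
    print("UEFI at the time: 28 / 51")
    print("this machine:", read_limits())
```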

lanrat commented 3 years ago

Would the TDP values cause the entire system to freeze?

I would assume they would just down-clock the CPU and not cause a lock-up.

MrChromebox commented 3 years ago

hard to say, but I'm not able to reproduce here on a Celeron box

dixonalistair85 commented 3 years ago

Hi, I have a similar issue with my ASUS Chromebox 3 (CN65). Is there any chance of getting the test firmware with lower TDP values, to see if that helps?

egrath commented 2 years ago

I second this; my CN65 has similar issues when running at full load. Is there any way to get a modified firmware with a 15 W TDP (or a short guide on how to build it myself)?

MrChromebox commented 2 years ago

right now it's set to 28W/51W, I'll reduce to 20/40 for the next release.

Are people seeing issues using a 95W power brick, or something smaller?

egrath commented 2 years ago

Mine has an i7-8550U CPU and a 90 W rated PSU.

Interestingly, according to the spec, the maximum configurable TDP for this CPU should not exceed 25 W: https://ark.intel.com/content/www/de/de/ark/products/122589/intel-core-i7-8550u-processor-8m-cache-up-to-4-00-ghz.html

CageOff commented 2 years ago

Hello everyone! Can someone email me a BIOS dump from a CN65 with an i7-8550U? My BIOS is damaged and I want to bring it back to life. Thanks! CageOff@mail.ru

MrChromebox commented 2 years ago

@CageOff please do not hijack this issue. Also, see https://wiki.mrchromebox.tech/Unbrcking for important info on directly flashing Chromeboxes.

dgranz commented 2 years ago

Has this been solved? I have 3 Chromebox 3 / i7-8550U machines that have the same stability issues under load. They are powered by the 90W ASUS power supplies that came with them.

lanrat commented 2 years ago

I could never get my initial CN65 (Chromebox3) working under high load without crashing. However I've since had good luck on some other units that have been stable for a few months. So it might be hardware related? Maybe a slightly different revision that causes problems with this firmware?

bam80 commented 2 years ago

Do we even know it's firmware related? Could we try to reproduce it with original fw on ChromeOS?

lanrat commented 2 years ago

I never had any issues with my units when I was running ChromeOS. But I also never ran them under as high a load, or for very long.

egrath commented 2 years ago

> I could never get my initial CN65 (Chromebox3) working under high load without crashing. However I've since had good luck on some other units that have been stable for a few months. So it might be hardware related? Maybe a slightly different revision that causes problems with this firmware?

The original firmware sets the CPU TDP limit to 15 W, so it always stays within a very safe and conservative margin compared to the 28 W set by MrChromebox's firmware; Intel recommends a maximum TDP of 25 W to be set by system integrators. IMHO the CN65 cooling system simply can't dissipate the heat when running at 28 W.

bam80 commented 2 years ago

On ChromeOS, we could try running the same power-hungry utilities in the stock Linux container.

bam80 commented 2 years ago

> right now it's set to 28W/51W, I'll reduce to 20/40 for the next release.

Has the TDP limit been decreased? If so, could you build a test firmware with 28W/51W, please?

Started using my Chromebox for real work and hit the opposite problem: under high load (compiling), the CPU clock is capped at 1.8 GHz despite the "Frequency Boost" setting with the Ondemand governor. CPU temperature stays around 60-64°C the whole time, versus 36°C at idle. When the system is relatively idle, short load spikes can raise the clock to 3 GHz and higher, as intended.

So I would really love to have the clock go higher under high load, even at the cost of a few degrees C.
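To tell whether the 1.8 GHz cap coincides with thermal throttling or with something else (a power limit, or the EC, as suspected later in this thread), one option on Linux is to sample each CPU's current frequency alongside the kernel's thermal-throttle event counters while the compile runs. A minimal sketch, assuming an Intel CPU with cpufreq and the `thermal_throttle` sysfs directory present:

```python
import glob
import os


def read_int(path):
    """Read an integer sysfs value; None if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def khz_to_ghz(khz):
    """cpufreq reports frequencies in kHz, e.g. 1800000 -> 1.8 GHz."""
    return khz / 1_000_000


def cpu_report():
    """Per-CPU current frequency plus the kernel's thermal-throttle event counters."""
    report = {}
    for cpu in sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*")):
        entry = {}
        freq = read_int(os.path.join(cpu, "cpufreq", "scaling_cur_freq"))
        if freq is not None:
            entry["ghz"] = khz_to_ghz(freq)
        for counter in ("core_throttle_count", "package_throttle_count"):
            value = read_int(os.path.join(cpu, "thermal_throttle", counter))
            if value is not None:
                entry[counter] = value
        report[os.path.basename(cpu)] = entry
    return report


if __name__ == "__main__":
    for cpu, entry in cpu_report().items():
        print(cpu, entry)
```

If the throttle counters stay flat while every core sits at 1.8 GHz, thermal throttling is probably not what is holding the clock down; that is an inference from the counters, not a definitive diagnosis.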

MrChromebox commented 2 years ago

> Has the TDP limit been decreased? If so, could you build test fw with 28W/51W please?

it's 20/40 as I mentioned above. I never had any issues with 28/51 so that's what I use myself

bam80 commented 2 years ago

@MrChromebox nice, could you share the 28/51 fw then, so I can check whether it helps with my issue above?

MrChromebox commented 2 years ago

there's no way that 20W PL1 is causing throttling when idle.

bam80 commented 2 years ago

> Started using my Chromebox for real work and hit the opposite problem: under high load (compiling), the CPU clock is capped at 1.8 GHz despite the "Frequency Boost" setting with the Ondemand governor. CPU temperature stays around 60-64°C the whole time, versus 36°C at idle.

I tried a cold boot (from an unpowered state) and it fixed the problem for me. A normal reboot didn't fix it. I didn't try shutting down and switching the system back on while it was still connected to power, so I think it might still be something with the EC. I'll report it separately when I reproduce it again.

klam2003 commented 3 months ago

I know this is an old issue, but I'm still having this exact problem in 2024. I'm using an ASUS Chromebox 3 i7-8550U to run a Minecraft server, and it crashes when power usage goes above ~25W. Has anyone found a solution? Cold booting didn't change anything for me.

I created a GitHub account just to comment on this, so let me know if there are any additional logs I should upload to help resolve this issue.

bam80 commented 3 months ago

@MrChromebox could it be helpful to investigate the problem with a self-made Suzy-Q cable? If so, I could share my experience of making one from just a pair of resistors and a USB-C breakout board.

aiac commented 1 month ago

My CN65 on Windows 11 often shuts itself down due to a "thermal event". It gets really hot when unzipping a 5GB archive full of small files: the CPU hits 95°C and the NVMe disk hits 80°C. Unfortunately, the unexpected shutdown also happens when I'm not using the disk; it's just the CPU load. I found a couple of reddit posts referring to the same problem.

i7-8550u with 90w power supply

bam80 commented 1 month ago

> I found a couple of reddit posts referring to the same problem.

@aiac Could you post the links here for the reference?

aiac commented 3 weeks ago

> > I found a couple of reddit posts referring to the same problem.
>
> @aiac Could you post the links here for the reference?

I can't find every instance where someone has complained about this issue, but here are two examples:
https://www.reddit.com/r/chrultrabook/comments/spl990/issues_when_running_windows_10_on_cn65
https://www.reddit.com/r/chrultrabook/comments/hwc5ng/experiences_from_an_asus_chromebox_3_cn65

I would be grateful for a new firmware with the original PLs; I probably won't be able to compile it myself.

The ability to configure these limits in software would be best because we could test the settings on specific devices.

MrChromebox commented 3 weeks ago

> The ability to configure these limits in software would be best because we could test the settings on specific devices.

once the necessary framework exists under coreboot and edk2 I'll be happy to do that, but for now recompilation is the only way. The PLs in the latest release (4.22.5) are very close to stock and should not be problematic

aiac commented 3 weeks ago

> once the necessary framework exists under coreboot and edk2 I'll be happy to do that, but for now recompilation is the only way. The PLs in the latest release (4.22.5) are very close to stock and should not be problematic

Is this version already available for updating from your script? I couldn't find a changelog for it anywhere. If there is a way to install it, I will be happy to test it. Thanks very much for your work!

MrChromebox commented 3 weeks ago

> Is this version already available for updating from your script? I couldn't find a changelog for it anywhere. If there is a way to install it, I will be happy to test it. Thanks very much for your work!

yes, it is - running the script tells you if there's an update available

blackjid commented 3 weeks ago

I can't find any reference to 4.22.5; in the scripts repository, the latest commit that mentions a version refers to 4.22.4, and the script only upgrades to 4.22.4:

Downloading Full ROM firmware
(coreboot_edk2-fizz-mrchromebox_20240416.rom)
**   Device: Asus Chromebox 3 / CN65 (TEEMO)
** Platform: Intel KabyLake
**  Fw Type: Full ROM / UEFI (pending reboot)
**   Fw Ver: MrChromebox-4.22.4 (04/16/2024)
**    Fw WP: Disabled

MrChromebox commented 3 weeks ago

typo, I meant 4.22.4

aiac commented 3 weeks ago

> typo, I meant 4.22.4

I have had this version for some time and the problem remains. The processor exceeds 90°C, and a moment later Windows shuts down due to a thermal event. Only when I turn on ThrottleStop -> Shift EPP / PL1 = 15, PL2 = 30 does this stop happening, and then the temperature does not exceed 90°C.
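ThrottleStop is Windows-only; on Linux, a rough counterpart is writing the same limits through the kernel's powercap interface. A minimal sketch, assuming the intel_rapl driver is loaded, the script runs as root, and `intel-rapl:0` is the package domain; the setting does not survive a reboot, and the kernel may refuse the write if the firmware has locked the power-limit register:

```python
import os

# Assumption: intel-rapl:0 is the package domain; writing requires root.
RAPL_PACKAGE = "/sys/class/powercap/intel-rapl:0"


def w_to_uw(watts):
    """Convert watts to the microwatt integer that powercap expects."""
    return int(round(watts * 1_000_000))


def constraint_index(base, wanted):
    """Return N such that constraint_N_name == wanted ("long_term" or "short_term")."""
    for n in range(3):
        path = os.path.join(base, f"constraint_{n}_name")
        if os.path.exists(path):
            with open(path) as f:
                if f.read().strip() == wanted:
                    return n
    raise LookupError(f"no {wanted!r} constraint under {base}")


def set_limits(pl1_w, pl2_w, base=RAPL_PACKAGE):
    """Write PL1 (long_term) and PL2 (short_term) in watts. Raises OSError if
    the write is refused (not root, or the register is locked)."""
    for name, watts in (("long_term", pl1_w), ("short_term", pl2_w)):
        n = constraint_index(base, name)
        with open(os.path.join(base, f"constraint_{n}_power_limit_uw"), "w") as f:
            f.write(str(w_to_uw(watts)))


# Usage (as root), mirroring the ThrottleStop values above:
#   set_limits(15, 30)
```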

MrChromebox commented 3 weeks ago

what's the fan speed? wonder if it's not ramping up enough. The next release will have an EC update for Fizz and I'm running the current PLs (20/40) with Prime95, 4k video, etc and temps don't get above about 65°C

aiac commented 3 weeks ago

> what's the fan speed? wonder if it's not ramping up enough. The next release will have an EC update for Fizz and I'm running the current PLs (20/40) with Prime95, 4k video, etc and temps don't get above about 65°C

Where can I check the fan's RPM? HWiNFO doesn't show it. Judging by ear, I assume the fan is running as fast as it can :)

Try running Prime95 and CrystalDiskMark at the same time. In my case, with PL1/PL2 at 20/40, the computer shuts down after even simpler tasks, such as reading and analyzing songs from the disk in a DJ program (NI Traktor).

MrChromebox commented 3 weeks ago

> Where can I check the fan's RPM? HWiNFO doesn't show it. Judging by ear, I assume the fan is running as fast as it can :)

try using: https://github.com/death7654/Chrultrabook-Tools

> Try running Prime95 and CrystalDiskMark at the same time

ok, they're running for 10 mins without issue so far
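For anyone hitting this on Linux rather than Windows: fan tachometers, when a driver exposes them, show up as `fan*_input` files (in RPM) under `/sys/class/hwmon`. Whether the CN65's EC-controlled fan appears there depends on the kernel and drivers in use (an assumption; it may well not), in which case the tool linked above is the better route. A minimal sketch:

```python
import glob
import os


def fan_label(filename):
    """Strip the sysfs suffix, e.g. fan1_input -> fan1."""
    suffix = "_input"
    return filename[: -len(suffix)] if filename.endswith(suffix) else filename


def fan_speeds():
    """Return {"<chip>/fanN": rpm} for every fan*_input found under hwmon."""
    speeds = {}
    for hw in sorted(glob.glob("/sys/class/hwmon/hwmon*")):
        try:
            with open(os.path.join(hw, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(hw)  # fall back to hwmonN if no name file
        for fan in sorted(glob.glob(os.path.join(hw, "fan*_input"))):
            try:
                with open(fan) as f:
                    speeds[f"{chip}/{fan_label(os.path.basename(fan))}"] = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor present but unreadable
    return speeds


if __name__ == "__main__":
    print(fan_speeds() or "no fan sensors exposed via hwmon")
```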