cosmoss-jigu / memtis

Tiered memory management

BUG: soft lockup - CPU#64 stuck for 187s! [rg:4628] #2

Open luckyq opened 1 year ago

luckyq commented 1 year ago

Hi,

I ran into a problem after installing memtis. After I booted the system and configured the persistent memory, the following bug appeared:

`Message from syslogd@optane03 at Nov 1 20:48:52 ... kernel:[ 1573.073360] watchdog: BUG: soft lockup - CPU#14 stuck for 250s! [rg:4625]`

`Message from syslogd@optane03 at Nov 1 20:48:52 ... kernel:[ 1573.489358] watchdog: BUG: soft lockup - CPU#56 stuck for 250s! [rg:4630]`

`Message from syslogd@optane03 at Nov 1 20:48:52 ... kernel:[ 1573.509359] watchdog: BUG: soft lockup - CPU#62 stuck for 250s! [rg:4617]`

`Message from syslogd@optane03 at Nov 1 20:48:52 ... kernel:[ 1573.517359] watchdog: BUG: soft lockup - CPU#64 stuck for 250s! [rg:4628]`

`Message from syslogd@optane03 at Nov 1 20:48:52 ... kernel:[ 1573.529358] watchdog: BUG: soft lockup - CPU#67 stuck for 250s! [rg:4626]`

`Message from syslogd@optane03 at Nov 1 20:48:52 ... kernel:[ 1573.537358] watchdog: BUG: soft lockup - CPU#70 stuck for 250s! [rg:4621]`

`Message from syslogd@optane03 at Nov 1 20:49:04 ... kernel:[ 1585.297310] watchdog: BUG: soft lockup - CPU#30 stuck for 261s! [migration/30:195]`

`Message from syslogd@optane03 at Nov 1 20:49:04 ... kernel:[ 1585.457309] watchdog: BUG: soft lockup - CPU#47 stuck for 261s! [migration/47:297]`

`Message from syslogd@optane03 at Nov 1 20:49:16 ... kernel:[ 1597.305260] watchdog: BUG: soft lockup - CPU#31 stuck for 257s! [migration/31:201]`

It keeps reporting this to the terminal.
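
To see at a glance which tasks are stuck, the watchdog lines can be summarized with a short script. This is only a sketch: the regular expression is written against the message format shown in this report, and `summarize_lockups` is a hypothetical helper name, not part of memtis.

```python
import re
from collections import defaultdict

# Matches watchdog lines in the format seen in this report, e.g.
# "watchdog: BUG: soft lockup - CPU#14 stuck for 250s! [rg:4625]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]:]+):(?P<pid>\d+)\]"
)


def summarize_lockups(text):
    """Group stuck CPU numbers by task name (hypothetical helper)."""
    by_task = defaultdict(list)
    for m in LOCKUP_RE.finditer(text):
        by_task[m.group("task")].append(int(m.group("cpu")))
    return dict(by_task)


# Three lines copied from the report above, used as sample input.
sample = """\
kernel:[ 1573.073360] watchdog: BUG: soft lockup - CPU#14 stuck for 250s! [rg:4625]
kernel:[ 1573.489358] watchdog: BUG: soft lockup - CPU#56 stuck for 250s! [rg:4630]
kernel:[ 1585.297310] watchdog: BUG: soft lockup - CPU#30 stuck for 261s! [migration/30:195]
"""

if __name__ == "__main__":
    for task, cpus in summarize_lockups(sample).items():
        print(f"{task}: CPUs {cpus}")
```

In practice the input would be the saved `dmesg` or syslog text rather than the inline sample.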

luckyq commented 1 year ago

[Image: Screenshot 2023-11-01 at 20 51 02]

This is the dmesg report.

Tmichailidis commented 9 months ago

I ran into the exact same problem. Is there a fix for this? @luckyq @multics69 @skmonga @madhavakrishnan @taehyung-lee

m8 commented 9 months ago

I'm also getting the same error.

luckyq commented 7 months ago

> I ran into the exact same problem. Is there a fix for this? @luckyq @multics69 @skmonga @madhavakrishnan @taehyung-lee

Hmm, don't use Remote-SSH in VS Code or other extensions. Make sure to kill all other processes.

DanielLee343 commented 3 months ago

I also hit this bug when setting the DRAM size within certain ranges. Worse, the kernel keeps spinning, which somehow prevents ssh (and other user-space processes) from launching. Only a reboot fixes it, and only temporarily. See also #9.

taehyung-lee commented 2 months ago

Are you still seeing this problem? I have not been able to reproduce it. It would help if you could describe your server and benchmark environment in detail.

Also, I rarely check GitHub, so please report problems to me by email:

taehyung.tlee@gmail.com

Thanks.
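
For anyone following up on the request for server and benchmark details, a small script can gather the basics to paste into a report. This is a minimal sketch, assuming a Linux host; `collect_environment` is a hypothetical helper, and it does not capture NUMA or persistent-memory layout, which would need to be added separately (for example from the output of `numactl --hardware`).

```python
import os
import platform


def collect_environment():
    """Gather basic host details for a bug report (hypothetical helper)."""
    info = {
        "kernel": platform.release(),
        "machine": platform.machine(),
        "cpus": os.cpu_count(),
    }
    # /proc/meminfo is Linux-specific; skip it quietly on other systems.
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                key, _, value = line.partition(":")
                if key in ("MemTotal", "MemFree"):
                    info[key] = value.strip()
    except OSError:
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```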