beagleboard / beaglebone-ai-64

Mirror of https://git.beagleboard.org/beagleboard/beaglebone-ai-64
https://beaglebone.ai/64
Creative Commons Attribution 4.0 International

Possible Device Manager (TISCI server) crash around GPU startup #44

Closed by nmenon 11 months ago

nmenon commented 2 years ago

SN: 202208 000230 a.txt

         Starting Hold until boot process finishes up...
[   22.498073] Mass Storage Function, version: 2009/09/11
[   22.503275] LUN: removable file: (no medium)
[   22.511863] ti-sci 44083000.dmsc: Mbox timedout in resp(caller: ti_sci_cmd_get_device_exclusive+0x18/0x2c)
[   22.521549] ti-sci 44083000.dmsc: Mbox send fail -110
[   22.526650] pvrsrvkm 4e20000000.gpu: adding gpu_0 device link failed!

ERROR:   Unhandled External Abort received on 0x80000001 from S-EL1
ERROR:   exception reason=0 syndrome=0xbf000000
Unhandled Exception from EL1
x0             = 0xffff800014a80000
x1             = 0x0000000000000000
x2             = 0x0000000000000000
x3             = 0xffff800009352018
x4             = 0x0000000000000000
x5             = 0x0000000000000000
x6             = 0xffff8000130f35e7

Note:

[   22.511863] ti-sci 44083000.dmsc: Mbox timedout in resp(caller: ti_sci_cmd_get_device_exclusive+0x18/0x2c)
[   22.521549] ti-sci 44083000.dmsc: Mbox send fail -110

This seems to indicate that the DM (Device Manager) stopped responding during the get_device call: the mailbox response timed out, and -110 is -ETIMEDOUT.

[   22.526650] pvrsrvkm 4e20000000.gpu: adding gpu_0 device link failed!

This seems to indicate the failure happens around GPU startup: the GPU power domain failed to attach for some reason.
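
To make it easier to spot this signature in captured console logs, here is a rough Python sketch that checks one boot log for the lines quoted above. The regular expressions are built only from the messages in this issue; the function name `classify_boot_log` and the dictionary keys are made up for illustration and are not part of any existing tool.

```python
import re

# Patterns derived from the log lines quoted in this issue (assumed stable wording).
MBOX_TIMEOUT = re.compile(r"ti-sci \S+: Mbox timedout in resp\(caller: (\w+)")
MBOX_SEND_FAIL = re.compile(r"ti-sci \S+: Mbox send fail (-?\d+)")
GPU_LINK_FAIL = re.compile(r"pvrsrvkm \S+: adding \S+ device link failed")
EXTERNAL_ABORT = re.compile(r"Unhandled External Abort received")


def classify_boot_log(text):
    """Report which parts of the failure signature appear in one boot log."""
    caller = MBOX_TIMEOUT.search(text)
    send_fail = MBOX_SEND_FAIL.search(text)
    return {
        # Function named in the "caller:" field, without the +offset/size suffix.
        "mbox_timeout_caller": caller.group(1) if caller else None,
        # Error code printed by the "Mbox send fail" line, as an int.
        "mbox_send_errno": int(send_fail.group(1)) if send_fail else None,
        "gpu_link_failed": GPU_LINK_FAIL.search(text) is not None,
        "external_abort": EXTERNAL_ABORT.search(text) is not None,
    }
```

Running this over the log from each cold boot and counting the results where `gpu_link_failed` and `mbox_timeout_caller` are both set would give a failure count that does not depend on reading every log by hand.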

Failure frequency seen so far: 1 in 1000 cold boots.
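
With a failure this rare, a handful of clean boots says very little about whether a change helped. As a rough guide, the sketch below estimates how many consecutive clean cold boots would be needed before a clean run becomes unlikely under the old failure rate. It assumes each boot fails independently and that the true rate is exactly the 1 in 1000 seen so far; both are assumptions, and `boots_needed` is a hypothetical helper name.

```python
import math


def boots_needed(failure_rate, confidence=0.95):
    """Smallest n for which n clean boots in a row would happen with
    probability at most (1 - confidence) if the failure rate were unchanged.

    Assumes independent boots: P(no failure in n boots) = (1 - rate) ** n.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Rate taken from the observation above: 1 failure in 1000 cold boots.
    print(boots_needed(1 / 1000))
```

At a 1 in 1000 rate this works out to about 3000 clean cold boots for 95% confidence, so any verification run needs to be automated.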