blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[Support]: bug with Proxmox (takes down Proxmox entirely with a green screen) #3652

Closed habitats-tech closed 1 year ago

habitats-tech commented 1 year ago

Describe the problem you are having

I have been trying to get to the bottom of an issue where Proxmox randomly crashes. After several days of troubleshooting I have come to the realisation that if you run the Frigate HA add-on + Frigate HACS Integration + Frigate HA Integration on the same HAOS instance, after a while the entire Proxmox server becomes unresponsive, and the only way out is to physically power cycle the machine Proxmox is installed on. I provide version numbers below, but to me it looks like some kind of kernel panic.

It seems I have overcome this issue by splitting Frigate across two HA instances. One instance runs MQTT and the Frigate add-on; the other HA instance runs Frigate Proxy, the Frigate HACS integration + card, and the Frigate HA integration. I have yet to try on a pure Debian installation, but I am confident this will also work.

To me it seems to be a fatal conflict between MQTT, Frigate HA add-on + Frigate HACS Integration + Frigate HA integration running on the same HA instance, bringing the Proxmox server down (no error messages are logged in Proxmox or HAOS).

I am testing on a simple Frigate installation using just one camera stream.

Everything below is on latest production (non-beta) versions as of 15 Aug 2022.

Frigate System: HW: AMD HX5900 - allocated to HAOS/Frigate VM: 8-cores/8GB RAM/64GB SSD DSK

Latest HAOS with following add-ons (all latest production versions):

Proxmox 7.2-7 no-subscription, up to date with latest packages (15 Aug 2022). I have yet to test on Proxmox with subscription. HAOS 8.4

HACS

I have tried to troubleshoot and it seems MQTT interacting with Frigate HA add-on and Frigate HACS integration is the culprit. I am still testing as we speak, and will update as soon as I have valid input to provide. Having split the add-on from the integration seems to have resolved the issue (no problem for several hours), but the acid test will be tomorrow; if no crash then the issue is in the interaction between the three components mentioned earlier (MQTT, add-on, HACS).

Version

DEBUG 0.10.1-83481AF

Frigate config file

mqtt:
  host: haos-frigate.local
  port: 1883
  user: admin
  password: admin
  client_id: haos-frigate
  topic_prefix: haos-frigate
  stats_interval: 60

birdseye:
  enabled: True
  width: 1280
  height: 720
  quality: 8
  mode: continuous

cameras:
  living-room:
    ffmpeg:
      inputs:
        - path: rtsp://rtsp:xxxxxxx@192.168.0.72:554/av_stream/ch0
          roles:
            - detect
            - record
            - rtmp
    detect:
      width: 1920
      height: 1080

record:
  enabled: True
  expire_interval: 60
  retain:
    days: 0
    mode: all

  events:
    max_seconds: 240
    pre_capture: 4
    post_capture: 4
    objects:
      - person
    required_zones: []
    retain:
      default: 8
      mode: motion
      objects:
        person: 8

snapshots:
  enabled: True
  timestamp: True
  bounding_box: False
  crop: False
  height: 175
  required_zones: []
  retain:
    default: 8
    objects:
      person: 8

Relevant log output

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
[2022-08-15 14:57:43] frigate.app                    INFO    : Starting Frigate (0.10.1-83481af)
[2022-08-15 14:57:43] frigate.app                    INFO    : Creating directory: /tmp/cache
Starting migrations
[2022-08-15 14:57:43] peewee_migrate                 INFO    : Starting migrations
There is nothing to migrate
[2022-08-15 14:57:43] peewee_migrate                 INFO    : There is nothing to migrate
[2022-08-15 14:57:43] frigate.app                    INFO    : Output process started: 217
[2022-08-15 14:57:43] frigate.app                    INFO    : Camera processor started for living-room: 220
[2022-08-15 14:57:43] frigate.app                    INFO    : Capture process started for living-room: 221
[2022-08-15 14:57:43] ws4py                          INFO    : Using epoll
[2022-08-15 14:57:43] ws4py                          INFO    : Using epoll
[2022-08-15 14:57:43] detector.cpu                   INFO    : Starting detection process: 216
[2022-08-15 14:57:43] frigate.edgetpu                WARNING : CPU detectors are not recommended and should only be used for testing or for trial purposes.
[2022-08-15 15:13:35] ws4py                          INFO    : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:51658]
[2022-08-15 15:14:53] ws4py                          INFO    : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:51658]
[2022-08-15 15:15:35] ws4py                          INFO    : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:58810]
[2022-08-15 15:15:46] ws4py                          INFO    : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:58810]
[2022-08-15 15:17:39] ws4py                          INFO    : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55776]
[2022-08-15 15:18:33] ws4py                          INFO    : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55776]

FFprobe output from your camera

N/A

Frigate stats

{
  "birdseye": {
    "enabled": true,
    "height": 720,
    "mode": "continuous",
    "quality": 8,
    "width": 1280
  },
  "cameras": {
    "living-room": {
      "best_image_timeout": 60,
      "detect": {
        "enabled": true,
        "fps": 5,
        "height": 1080,
        "max_disappeared": 25,
        "stationary": {
          "interval": 0,
          "max_frames": {
            "default": null,
            "objects": {}
          },
          "threshold": 50
        },
        "width": 1920
      },
      "ffmpeg": {
        "global_args": [
          "-hide_banner",
          "-loglevel",
          "warning"
        ],
        "hwaccel_args": [],
        "input_args": [
          "-avoid_negative_ts",
          "make_zero",
          "-fflags",
          "+genpts+discardcorrupt",
          "-rtsp_transport",
          "tcp",
          "-stimeout",
          "5000000",
          "-use_wallclock_as_timestamps",
          "1"
        ],
        "inputs": [
          {
            "global_args": [],
            "hwaccel_args": [],
            "input_args": [],
            "path": "rtsp://rtsp:xxxxxx@192.168.0.72:554/av_stream/ch0",
            "roles": [
              "record",
              "rtmp",
              "detect"
            ]
          }
        ],
        "output_args": {
          "detect": [
            "-f",
            "rawvideo",
            "-pix_fmt",
            "yuv420p"
          ],
          "record": [
            "-f",
            "segment",
            "-segment_time",
            "10",
            "-segment_format",
            "mp4",
            "-reset_timestamps",
            "1",
            "-strftime",
            "1",
            "-c",
            "copy",
            "-an"
          ],
          "rtmp": [
            "-c",
            "copy",
            "-f",
            "flv"
          ]
        }
      },
      "ffmpeg_cmds": [
        {
          "cmd": "ffmpeg -hide_banner -loglevel warning -avoid_negative_ts make_zero -fflags +genpts+discardcorrupt -rtsp_transport tcp -stimeout 5000000 -use_wallclock_as_timestamps 1 -i rtsp://rtsp:xxxxxxx@192.168.0.72:554/av_stream/ch0 -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c copy -an /tmp/cache/living-room-%Y%m%d%H%M%S.mp4 -c copy -f flv rtmp://127.0.0.1/live/living-room -r 5 -s 1920x1080 -f rawvideo -pix_fmt yuv420p pipe:",
          "roles": [
            "record",
            "rtmp",
            "detect"
          ]
        }
      ],
      "live": {
        "height": 720,
        "quality": 8
      },
      "motion": {
        "contour_area": 30,
        "delta_alpha": 0.2,
        "frame_alpha": 0.2,
        "frame_height": 50,
        "improve_contrast": false,
        "mask": "",
        "threshold": 25
      },
      "mqtt": {
        "bounding_box": true,
        "crop": true,
        "enabled": true,
        "height": 270,
        "quality": 70,
        "required_zones": [],
        "timestamp": true
      },
      "name": "living-room",
      "objects": {
        "filters": {
          "person": {
            "mask": null,
            "max_area": 24000000,
            "min_area": 0,
            "min_score": 0.5,
            "threshold": 0.7
          }
        },
        "mask": "",
        "track": [
          "person"
        ]
      },
      "record": {
        "enabled": true,
        "events": {
          "max_seconds": 240,
          "objects": [
            "person"
          ],
          "post_capture": 4,
          "pre_capture": 4,
          "required_zones": [],
          "retain": {
            "default": 8,
            "mode": "motion",
            "objects": {
              "person": 8
            }
          }
        },
        "expire_interval": 60,
        "retain": {
          "days": 0,
          "mode": "all"
        },
        "retain_days": null
      },
      "rtmp": {
        "enabled": true
      },
      "snapshots": {
        "bounding_box": false,
        "clean_copy": true,
        "crop": false,
        "enabled": true,
        "height": 175,
        "quality": 70,
        "required_zones": [],
        "retain": {
          "default": 8,
          "mode": "motion",
          "objects": {
            "person": 8
          }
        },
        "timestamp": true
      },
      "timestamp_style": {
        "color": {
          "blue": 255,
          "green": 255,
          "red": 255
        },
        "effect": null,
        "format": "%m/%d/%Y %H:%M:%S",
        "position": "tl",
        "thickness": 2
      },
      "zones": {}
    }
  },
  "database": {
    "path": "/media/frigate/frigate.db"
  },
  "detect": {
    "enabled": true,
    "fps": 5,
    "height": 720,
    "max_disappeared": null,
    "stationary": {
      "interval": 0,
      "max_frames": {
        "default": null,
        "objects": {}
      },
      "threshold": null
    },
    "width": 1280
  },
  "detectors": {
    "cpu": {
      "device": "usb",
      "num_threads": 3,
      "type": "cpu"
    }
  },
  "environment_vars": {},
  "ffmpeg": {
    "global_args": [
      "-hide_banner",
      "-loglevel",
      "warning"
    ],
    "hwaccel_args": [],
    "input_args": [
      "-avoid_negative_ts",
      "make_zero",
      "-fflags",
      "+genpts+discardcorrupt",
      "-rtsp_transport",
      "tcp",
      "-stimeout",
      "5000000",
      "-use_wallclock_as_timestamps",
      "1"
    ],
    "output_args": {
      "detect": [
        "-f",
        "rawvideo",
        "-pix_fmt",
        "yuv420p"
      ],
      "record": [
        "-f",
        "segment",
        "-segment_time",
        "10",
        "-segment_format",
        "mp4",
        "-reset_timestamps",
        "1",
        "-strftime",
        "1",
        "-c",
        "copy",
        "-an"
      ],
      "rtmp": [
        "-c",
        "copy",
        "-f",
        "flv"
      ]
    }
  },
  "live": {
    "height": 720,
    "quality": 8
  },
  "logger": {
    "default": "info",
    "logs": {}
  },
  "model": {
    "height": 320,
    "labelmap": {},
    "labelmap_path": null,
    "path": null,
    "width": 320
  },
  "motion": null,
  "mqtt": {
    "client_id": "haos-frigate",
    "host": "haos-frigate.local",
    "password": "xxxxxxxx",
    "port": 1883,
    "stats_interval": 60,
    "tls_ca_certs": null,
    "tls_client_cert": null,
    "tls_client_key": null,
    "tls_insecure": null,
    "topic_prefix": "haos-frigate",
    "user": "admin"
  },
  "objects": {
    "filters": null,
    "mask": "",
    "track": [
      "person"
    ]
  },
  "record": {
    "enabled": true,
    "events": {
      "max_seconds": 240,
      "objects": [
        "person"
      ],
      "post_capture": 4,
      "pre_capture": 4,
      "required_zones": [],
      "retain": {
        "default": 8,
        "mode": "motion",
        "objects": {
          "person": 8
        }
      }
    },
    "expire_interval": 60,
    "retain": {
      "days": 0,
      "mode": "all"
    },
    "retain_days": null
  },
  "rtmp": {
    "enabled": true
  },
  "snapshots": {
    "bounding_box": false,
    "clean_copy": true,
    "crop": false,
    "enabled": true,
    "height": 175,
    "quality": 70,
    "required_zones": [],
    "retain": {
      "default": 8,
      "mode": "motion",
      "objects": {
        "person": 8
      }
    },
    "timestamp": true
  },
  "timestamp_style": {
    "color": {
      "blue": 255,
      "green": 255,
      "red": 255
    },
    "effect": null,
    "format": "%m/%d/%Y %H:%M:%S",
    "position": "tl",
    "thickness": 2
  }
}

Operating system

Proxmox

Install method

HassOS Addon

Coral version

CPU (no coral)

Network connection

Wired

Camera make and model

Sonoff

Any other information that may be helpful

All logs Proxmox, HA, Frigate are clean. No errors.


NickM-27 commented 1 year ago

We have many users running on proxmox and in the same instance, without reporting any issues like this. There's also nothing special with the way Frigate or the integration interacts with MQTT. Do you have any other services interacting with MQTT at the same time?

In any case, without logs it will be quite difficult to know which component actually has an issue or how to begin trying to solve it. If other proxmox users have any input that would be helpful as well 👍

habitats-tech commented 1 year ago

Thanks for the quick response. The issue is that there is nothing in the logs except normal startup messages. Letting the VM sit idle (no activity) will eventually crash Proxmox. I will try to dig deeper and update, since there are no other reported issues. The Proxmox/Linux kernel does not report any errors either.
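When a host hard-freezes without leaving anything in its logs, one low-tech way to at least pin down when each freeze happens is a heartbeat file that is appended to and synced to disk at a fixed interval; the last timestamp before a gap approximates the moment the machine stopped. The following is only a minimal sketch of that idea, not something used in this thread. The file path, the 10-second interval, and the function names are all hypothetical.

```python
import os
import time
from datetime import datetime, timezone
from typing import Optional


def write_heartbeat(path: str, now: Optional[datetime] = None) -> str:
    """Append one UTC timestamp line and force it onto disk."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # so the line survives a hard freeze
    return stamp


def last_heartbeat(path: str) -> Optional[datetime]:
    """Return the newest parseable timestamp, or None if there is none."""
    with open(path, encoding="utf-8") as fh:
        lines = [ln.strip() for ln in fh if ln.strip()]
    for line in reversed(lines):
        try:
            return datetime.fromisoformat(line)
        except ValueError:
            continue  # skip a half-written last line
    return None


def run(path: str = "/var/log/freeze-heartbeat.log", interval_s: float = 10.0) -> None:
    """Write a heartbeat forever; hypothetical path and interval."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

Calling `run()` on the Proxmox host (for example from a systemd unit) and reading `last_heartbeat(...)` after the next power cycle gives the freeze time to within one interval, which can then be compared against VM, MQTT, and Frigate activity.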

habitats-tech commented 1 year ago

MQTT dedicated to Frigate. I have created a new VM with only MQTT and Frigate installed. Will test throughout this week and will hopefully identify the culprit.

habitats-tech commented 1 year ago

I have been able to get closer to the issue. The issue is between MQTT broker (Mosquitto) and Frigate add-on. How I know this:

When this VM is active, about every 3 hours it will freeze the entire Proxmox system (I need to power cycle to get the system up). Nothing is recorded in the Proxmox, HA or Frigate logs which could point to the cause of the catastrophic failure.

The Proxmox system is on ZFS and the ZFS pool is clean.

There are two things I have identified. Any HAOS VM which runs both Mosquitto and Frigate add-on freezes about every 3 hrs.

Can you think of any pointers as to why this happens? The HW is based on an AMD HX5900 and this is the only issue I have ever encountered on this test system.

I am carrying on with digging deeper, but any pointers are welcome.

habitats-tech commented 1 year ago

I confirm deleting the MQTT broker (Mosquitto) from the HAOS instance where the Frigate add-on is installed fixes the issue (no Proxmox freezes). I will carry on testing and update.

I now have (working with no issues for the last 3.5 hrs):

  1. a HAOS 8.5 VM with just Frigate add-on
  2. a HAOS 8.5 VM fully loaded with everything else (Mosquitto, all HACS Frigate, plus another 36 integrations)

Any attempt to have Mosquitto and Frigate add-ons on the same HAOS instance takes Proxmox down after about 3 hrs.

NickM-27 commented 1 year ago

I have no idea how that would happen or what would lead to that. I'd be curious if it happens with Frigate and Mosquitto just running in Docker, or in a Debian VM with Docker.

Like I said previously lots of proxmox and also HA OS users and haven't heard of this before, without any information it's entirely guessing why such a thing would happen.

habitats-tech commented 1 year ago

I have created a Debian LXC and run MQTT/Mosquitto and Docker/Frigate in it; the system/Proxmox crashes. I am going to long-term test another scenario with two LXCs, one running MQTT/Mosquitto and the other Docker/Frigate. That setup seemed promising (it was running for 4 hours with no issues) until I created the combined one, which crashed the system within half an hour. I am not certain which of the two, or possibly both, is the culprit, so I need to test one at a time. Will update as soon as I have further info.

habitats-tech commented 1 year ago

Any installation method of Mosquitto & Frigate results in eventual Proxmox crash. I think it is a system specific issue, which logs do not seem to capture.

NickM-27 commented 1 year ago

> Any installation method of Mosquitto & Frigate results in eventual Proxmox crash. I think it is a system specific issue, which logs do not seem to capture.

Yeah, we have lots of proxmox users and this is the first time this has been reported, seems there must be some other factor that is unique

habitats-tech commented 1 year ago

Just providing an update. There is no issue with Mosquitto per se. The Proxmox freeze arises when Frigate is talking to Mosquitto. It does not matter if the two are running on the same CT/VM or a different CT/VM. I am now testing what happens if the two are on a different physical machine.

habitats-tech commented 1 year ago

I had a Debian Frigate CT talking to a Windows Mosquitto broker on two different machines. Frigate crashed the Proxmox node it was on. So I am now confident the issue is with Frigate. What causes such behaviour is unknown, but I am now working with others who experience similar problems with Intel NUCs and other software.

I wonder if any Proxmox Frigate installation is running under an AMD platform.

habitats-tech commented 1 year ago

I have now completed all testing on AMD 4900H architecture. Proxmox freeze is a certainty only when Frigate (running under Debian 11 CT/VM or HAOS 8.4/2022.8+) is communicating with any Mosquitto MQTT broker (Debian 11 CT/VM, Windows 10/11, HAOS 8.4/2022.8+).

Frigate can coexist with the Mosquitto MQTT broker in the same OS instance, assuming it is not trying to communicate with the broker.

I guess something is causing a kernel panic which cannot even be captured through the logs. I am not certain if you can check the code for some kind of memory leak that builds over time, although sometimes (infrequently) the Proxmox freeze happens as soon as Frigate tries to communicate with a broker. Usually the freeze takes place a couple of hours after the start of communication between Frigate and Mosquitto.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

habitats-tech commented 1 year ago

Issue persists (Sep 2022) but unfortunately no solution found so far. The same freezing issue is experienced with any NVR running under Proxmox, either as KVM or LXC. Tested NVRs are: AgentDVR, Frigate, ZoneMinder.

1990marco1990 commented 1 year ago

The problem seems familiar to me, but unfortunately I haven't been able to find a solution yet. At least now I know what the problem is.

I think the problem even exists on the Raspberry Pi, because I recently switched from a Raspberry Pi to an Intel NUC and my Home Assistant froze at irregular intervals, so that only a restart using the power button helped.

The problems were worst when everything was installed directly in Home Assistant, on both the Raspberry Pi and the NUC. Now I'm running Frigate in an LXC container and the problem only occurs about every 2 days; before, it only ran for a few hours at a time.

If I were to look for a pattern, I would say that it often happens when a person is detected by a camera and reported via MQTT.
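One way to test that hunch would be to log every Frigate event message with a timestamp on a separate machine and compare the last entries against the time of the freeze. The sketch below is only an illustration under several assumptions: it uses the third-party paho-mqtt package, it assumes events arrive as JSON on `<topic_prefix>/events` with `type` and `after` fields, and it reuses the `haos-frigate` prefix and host name from the config at the top of this issue. The function names and the log path argument are hypothetical.

```python
import json
import os
from datetime import datetime, timezone
from typing import Optional

# Assumption: topic_prefix from the posted config, plus "/events".
TOPIC = "haos-frigate/events"


def format_event(payload: bytes, now: Optional[datetime] = None) -> str:
    """Turn one event payload into a single timestamped log line."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    try:
        event = json.loads(payload)
    except ValueError:
        return f"{stamp} unparseable payload ({len(payload)} bytes)"
    if not isinstance(event, dict):
        return f"{stamp} unexpected payload type"
    after = event.get("after") or {}
    return "{} type={} camera={} label={}".format(
        stamp, event.get("type"), after.get("camera"), after.get("label")
    )


def listen(log_path: str, user: str, password: str,
           host: str = "haos-frigate.local", port: int = 1883) -> None:
    """Block forever, appending one synced line per event message."""
    # Imported here so format_event works without paho-mqtt installed.
    import paho.mqtt.subscribe as subscribe

    def on_message(client, userdata, message):
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(format_event(message.payload) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # keep the last line if the box dies

    subscribe.callback(on_message, TOPIC, hostname=host, port=port,
                       auth={"username": user, "password": password})
```

Running `listen(...)` on a machine that is not the Proxmox host matters here, since a log written on the freezing host may lose its final entries.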

maxfield-allison commented 11 months ago

I seem to be encountering the same behavior running a Linux Mint VM on an AMD 2920x with GPU passthrough. It had been working relatively well for a while, but recently, as soon as the Docker containers start, the entire Proxmox host becomes 100% unresponsive. pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-6-pve)

I have been running with GPU passthrough with hardware accel versions of various containers, so I was only able to stop and remove the Frigate container once the GPU was removed from the VM config. Now that they have been removed and won't start on boot, I am adding the GPU back into the mix to see if that affects anything. Such a strange situation. I'll update if I find any new information.

maxfield-allison commented 11 months ago

Looks like adding the GPU back in immediately causes a hard crash of the Proxmox host. At this point I've got a new PSU inbound just to completely rule that possibility out. It had been running fine for several weeks, so a bad PSU is the only thing I can think of that might cause this sudden change in behavior.

maxfield-allison commented 11 months ago

I've attached the same GPU to another fresh linux mint vm with no hard crash now. I was also able to determine that the original VM even with frigate and wyze bridge turned off now crashes the system right after uefi boot.

ripcdoc commented 5 months ago

Any update on this? I am experiencing the same issue under almost identical circumstances - Debian CT (although 12, not 11), Proxmox, HAOS VM, Frigate (integration and add-on), MQTT, and a ZFS pool.

danhusan commented 4 months ago

Seems I am experiencing the same. Migrated a HA install from bare-metal to Proxmox. It has since completely frozen Proxmox or output random kernel dumps in syslog. I've tried just about every mitigation, from microcode, acpi, cstates, and i915 gpu tweaks to disabling EEE, before finding this thread. Uninstalling Frigate from HA makes the issue disappear.

Proxmox, ZFS (raid1), HA with MQTT and Frigate on an Intel N100 mini-pc. No USB or GPU passthrough.

Edit: also did a 24hour memtest run, no errors reported.

Examples of crashes with dumps:

`Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.560605] BUG: unable to handle page fault for address: ffffffffbff8a2a0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.567515] #PF: supervisor instruction fetch in kernel mode
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.573198] #PF: error_code(0x0010) - not-present page
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.578355] PGD 2be039067 P4D 2be039067 PUD 2be03a063 PMD 0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.584032] Oops: 0010 [#1] PREEMPT SMP NOPTI
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.588411] CPU: 2 PID: 1859 Comm: vhost-1835 Tainted: P U O 6.5.11-8-pve #1
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.596687] Hardware name: Default string Default string/Default string, BIOS 5.27 09/28/2023
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.605231] RIP: 0010:0xffffffffbff8a2a0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.609217] Code: Unable to access opcode bytes at 0xffffffffbff8a276.
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.615757] RSP: 0018:ffffab5b47257cf0 EFLAGS: 00010282
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.621006] RAX: ffff972e946394b0 RBX: ffff972e946300c0 RCX: 0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.628154] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff972e946394b0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.635307] RBP: ffffab5b47257e68 R08: 0000000000000000 R09: 0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.642461] R10: 0000000000000000 R11: 0000000000000000 R12: ffff972e946300d0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.649608] R13: 0000000000000000 R14: ffff972e94630000 R15: 0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.656762] FS: 00007fef9815c4c0(0000) GS:ffff97359fb00000(0000) knlGS:0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.664869] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.670641] CR2: ffffffffbff8a276 CR3: 000000011fa98000 CR4: 0000000000752ee0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.677789] PKRU: 55555554
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.680521] Call Trace:
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.682993]
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.685118] ? show_regs+0x6d/0x80
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.688559] ? die+0x24/0x80
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.691639] ? page_fault_oops+0x176/0x500
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.695763] ? kernelmode_fixup_or_oops+0xb2/0x140
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.700576] ? bad_area_nosemaphore+0x1a5/0x280
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.730984] handle_rx_net+0x15/0x20 [vhost_net]
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.735629] vhost_worker+0x46/0x80 [vhost]
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.705302] ? bad_area_nosemaphore+0x16/0x30
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.709679] ? do_kern_addr_fault+0x7b/0xa0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.713884] ? exc_page_fault+0x10d/0x1b0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.717914] ? asm_exc_page_fault+0x27/0x30
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.722126] ? handle_rx+0x185/0xbe0 [vhost_net]
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.726776] ? raw_spin_rq_unlock+0x10/0x40
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.739848] vhost_task_fn+0x57/0xd0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.743447] ? raw_spin_rq_unlock+0x10/0x40
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.747652] ? finish_task_switch.isra.0+0x85/0x2c0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.752557] ? __pfx_vhost_task_fn+0x10/0x10
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.756845] ret_from_fork+0x44/0x70
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.760450] ? __pfx_vhost_task_fn+0x10/0x10
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.764745] ret_from_fork_asm+0x1b/0x30
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.768691] RIP: 0033:0x0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.771356] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.777904] RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.785494] RAX: 0000000000000000 RBX: 00005634a56ad210 RCX: 00007fef9af16c5b
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.792647] RDX: 0000000000000000 RSI: 000000000000af01 RDI: 0000000000000015
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.799807] RBP: 00007ffd385f9de0 R08: 00007ffd385f9d80 R09: 0000000000000073
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.806961] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.814115] R13: 0000000000000015 R14: 00005634a56ad210 R15: 0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.821270]
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.823479] Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables 8021q garp mrp softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof intel_rapl_msr snd_sof_utils snd_soc_hdac_hda intel_rapl_common snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core x86_pkg_temp_thermal intel_powerclamp coretemp snd_compress ac97_bus snd_pcm_dmaengine kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm snd_hda_codec irqbypass crct10dif_pclmul i915 polyval_clmulni polyval_generic snd_hda_core ghash_clmulni_intel snd_hwdep mei_pxp mei_hdcp aesni_intel btusb snd_pcm crypto_simd btrtl cryptd btbcm drm_buddy btintel btmtk snd_timer cmdlinepart spi_nor ttm snd rapl intel_cstate
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.823532] wmi_bmof pcspkr mtd soundcore bluetooth drm_display_helper mei_me ecdh_generic ecc mei cec rc_core drm_kms_helper i2c_algo_bit acpi_tad acpi_pad joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor uas usb_storage raid6_pq libcrc32c simplefb hid_generic usbkbd usbhid hid nvme spi_intel_pci nvme_core nvme_common crc32_pclmul spi_intel xhci_pci xhci_pci_renesas i2c_i801 i2c_smbus igc xhci_hcd ahci libahci video wmi
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.959159] CR2: ffffffffbff8a2a0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.962494] ---[ end trace 0000000000000000 ]---
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.050388] RIP: 0010:0xffffffffbff8a2a0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.054377] Code: Unable to access opcode bytes at 0xffffffffbff8a276.
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.060918] RSP: 0018:ffffab5b47257cf0 EFLAGS: 00010282
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.066161] RAX: ffff972e946394b0 RBX: ffff972e946300c0 RCX: 0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.073314] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff972e946394b0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.080469] RBP: ffffab5b47257e68 R08: 0000000000000000 R09: 0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.087616] R10: 0000000000000000 R11: 0000000000000000 R12: ffff972e946300d0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.094769] R13: 0000000000000000 R14: ffff972e94630000 R15: 0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.101923] FS: 00007fef9815c4c0(0000) GS:ffff97359fb00000(0000) knlGS:0000000000000000
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.110033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.115799] CR2: ffffffffbff8a276 CR3: 000000011fa98000 CR4: 0000000000752ee0
Feb 11 14:19:09 10.88.89.252 kernel: [ 3614.122948] PKRU: 55555554
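
For anyone comparing their own crash output against the dumps in this comment, a rough sketch of pulling out the faulting task name and the kernel modules named in the call trace might look like this. It assumes the syslog prefix format shown above (`<date> <time> <host> kernel: [ <seconds>]`) and works whether the entries are one per line or run together; the regular expressions and function names are illustrative, not part of any existing tool.

```python
import re
from typing import Dict, List, Tuple

# Syslog prefix as it appears above, e.g.
# "Feb 11 14:19:09 10.88.89.252 kernel: [ 3613.560605]"
ENTRY = re.compile(
    r"[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \S+ kernel: \[\s*(\d+\.\d+)\]"
)
# A call-trace frame that names a module, e.g.
# "? handle_rx+0x185/0xbe0 [vhost_net]"
FRAME = re.compile(r"(?:\? )?\S+\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")
COMM = re.compile(r"Comm: (\S+)")


def split_entries(dump: str) -> List[Tuple[float, str]]:
    """Split a pasted dump into (kernel timestamp, message) pairs."""
    flat = " ".join(dump.split())  # undo arbitrary line wrapping
    parts = ENTRY.split(flat)      # [preamble, ts1, msg1, ts2, msg2, ...]
    return [(float(ts), msg.strip())
            for ts, msg in zip(parts[1::2], parts[2::2])]


def summarize(dump: str) -> Dict[str, object]:
    """Report the first message, the task name, and modules in the trace."""
    entries = split_entries(dump)
    task = None
    modules: List[str] = []
    for _, msg in entries:
        found = COMM.search(msg)
        if found and task is None:
            task = found.group(1)
        frame = FRAME.fullmatch(msg)
        if frame and frame.group(1) not in modules:
            modules.append(frame.group(1))
    return {
        "first_message": entries[0][1] if entries else None,
        "task": task,
        "trace_modules": modules,
    }
```

Feeding it the first dump above should report a `vhost-...` task and `vhost_net`/`vhost` frames, which is a quick way to check whether another crash report shares the same code path.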

Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.462323] general protection fault, probably for non-canonical address 0xffff3993e44e4ab8: 0000 [#1] PREEMPT SMP NOPTI
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.473226] CPU: 2 PID: 1402 Comm: vhost-1377 Tainted: P U O 6.5.11-8-pve #1
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.481508] Hardware name: Default string Default string/Default string, BIOS 5.27 09/28/2023
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.490051] RIP: 0010:vhost_tx_batch.constprop.0+0x93/0x260 [vhost_net]
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.496699] Code: 47 20 48 8b 80 88 00 00 00 ff d0 0f 1f 00 85 c0 78 61 8b 8b bc 49 00 00 85 c9 75 39 c7 83 c0 49 00 00 00 00 00 00 48 8b 45 e0 <65> 48 2b 04 25 28 00 00 00 0f 85 ab 01 00 00 48 83 c4 20 5b 41 5c
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.515470] RSP: 0018:ffffb9b887fdbcf0 EFLAGS: 00010246
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.520716] RAX: b3cc1c041d2aa200 RBX: ffff9cc6449e4ab8 RCX: 0000000000000000
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.527866] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.535019] RBP: ffffb9b887fdbd28 R08: 0000000000000000 R09: 0000000000000000
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.542175] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9cc6449e0000
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.549330] R13: 0000000000000280 R14: ffff9cc64c369b40 R15: ffff9cc6449e0000
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.556489] FS: 00007ff3a33274c0(0000) GS:ffff9ccd9fb00000(0000) knlGS:0000000000000000
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.564598] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.570364] CR2: 00007fa8309ba2c8 CR3: 000000010b620000 CR4: 0000000000752ee0
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.577516] PKRU: 55555554
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.580253] Call Trace:
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.582724]
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.584857] ? show_regs+0x6d/0x80
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.588287] ? die_addr+0x37/0xa0
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.591624] ? exc_general_protection+0x1c3/0x460
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.596348] ? asm_exc_general_protection+0x27/0x30
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.601247] ? vhost_tx_batch.constprop.0+0x93/0x260 [vhost_net]
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.607276] handle_tx_copy+0x1cd/0x6f0 [vhost_net]
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.612180] handle_tx+0xbc/0xc0 [vhost_net]
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.616483] handle_tx_kick+0x15/0x20 [vhost_net]
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.621218] vhost_worker+0x46/0x80 [vhost]
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.625438] vhost_task_fn+0x57/0xd0
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.629045] ? raw_spin_rq_unlock+0x10/0x40
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.633257] ? finish_task_switch.isra.0+0x85/0x2c0
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.638165] ? __pfx_vhost_task_fn+0x10/0x10
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.642465] ret_from_fork+0x44/0x70
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.646072] ? __pfx_vhost_task_fn+0x10/0x10
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.650367] ret_from_fork_asm+0x1b/0x30
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.654315] RIP: 0033:0x0
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.656988] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.663532] RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.671118] RAX: 0000000000000000 RBX: 000055e66a0bc320 RCX: 00007ff3a60e1c5b
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.678267] RDX: 0000000000000000 RSI: 000000000000af01 RDI: 0000000000000015
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.685423] RBP: 00007ffe43b894b0 R08: 00007ffe43b89450 R09: 0000000000000073
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.692573] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.699726] R13: 0000000000000015 R14: 000055e66a0bc320 R15: 0000000000000000
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.706879]
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.709093] Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables 8021q garp mrp softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof intel_rapl_msr intel_rapl_common snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match x86_pkg_temp_thermal snd_soc_acpi intel_powerclamp soundwire_generic_allocation soundwire_bus coretemp snd_soc_core kvm_intel snd_compress ac97_bus snd_pcm_dmaengine i915 kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec irqbypass crct10dif_pclmul polyval_clmulni snd_hda_core polyval_generic snd_hwdep ghash_clmulni_intel drm_buddy snd_pcm aesni_intel ttm btusb mei_pxp mei_hdcp crypto_simd btrtl btbcm btintel btmtk drm_display_helper snd_timer cryptd cec cmdlinepart bluetooth rapl spi_nor rc_core
Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.709138] snd mei_me drm_kms_helper wmi_bmof intel_cstate pcspkr ecdh_generic mtd soundcore mei i2c_algo_bit ecc 
acpi_tad acpi_pad joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb hid_generic usbkbd uas usbhid hid usb_storage nvme nvme_core xhci_pci i2c_i801 xhci_pci_renesas spi_intel_pci crc32_pclmul spi_intel i2c_smbus xhci_hcd nvme_common ahci igc libahci video wmi Feb 10 10:57:30 10.88.89.252 kernel: [ 3139.843198] ---[ end trace 0000000000000000 ]--- Feb 10 10:57:31 10.88.89.252 kernel: [ 3139.933881] RIP: 0010:vhost_tx_batch.constprop.0+0x93/0x260 [vhost_net] Feb 10 10:57:31 10.88.89.252 kernel: [ 3139.940548] Code: 47 20 48 8b 80 88 00 00 00 ff d0 0f 1f 00 85 c0 78 61 8b 8b bc 49 00 00 85 c9 75 39 c7 83 c0 49 00 00 00 00 00 00 48 8b 45 e0 <65> 48 2b 04 25 28 00 00 00 0f 85 ab 01 00 00 48 83 c4 20 5b 41 5c Feb 10 10:57:31 10.88.89.252 kernel: [ 3139.959344] RSP: 0018:ffffb9b887fdbcf0 EFLAGS: 00010246 Feb 10 10:57:31 10.88.89.252 kernel: [ 3139.964597] RAX: b3cc1c041d2aa200 RBX: ffff9cc6449e4ab8 RCX: 0000000000000000 Feb 10 10:57:31 10.88.89.252 kernel: [ 3139.971760] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 Feb 10 10:57:31 10.88.89.252 kernel: [ 3139.978914] RBP: ffffb9b887fdbd28 R08: 0000000000000000 R09: 0000000000000000 Feb 10 10:57:31 10.88.89.252 kernel: [ 3139.986069] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9cc6449e0000 Feb 10 10:57:31 10.88.89.252 kernel: [ 3139.993220] R13: 0000000000000280 R14: ffff9cc64c369b40 R15: ffff9cc6449e0000 Feb 10 10:57:31 10.88.89.252 kernel: [ 3140.000374] FS: 00007ff3a33274c0(0000) GS:ffff9ccd9fb00000(0000) knlGS:0000000000000000 Feb 10 10:57:31 10.88.89.252 kernel: [ 3140.008479] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Feb 10 10:57:31 10.88.89.252 kernel: [ 3140.014244] CR2: 00007fa8309ba2c8 CR3: 000000010b620000 CR4: 0000000000752ee0 Feb 10 10:57:31 10.88.89.252 kernel: [ 3140.021396] PKRU: 55555554

Feb 9 17:37:31 10.88.89.252 kernel: [77589.722381] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Feb 9 17:37:31 10.88.89.252 kernel: [77589.728523] rcu: 0-...0: (18 ticks this GP) idle=04f4/1/0x4000000000000000 softirq=1170560/1170560 fqs=3021
Feb 9 17:37:31 10.88.89.252 kernel: [77589.738370] rcu: hardirqs softirqs csw/system
Feb 9 17:37:31 10.88.89.252 kernel: [77589.743964] rcu: number: 0 0 0
Feb 9 17:37:31 10.88.89.252 kernel: [77589.749562] rcu: cputime: 0 0 0 ==> 30020(ms)
Feb 9 17:37:31 10.88.89.252 kernel: [77589.756544] rcu: 2-...0: (25 ticks this GP) idle=9984/1/0x4000000000000000 softirq=1163170/1163171 fqs=3022
Feb 9 17:37:31 10.88.89.252 kernel: [77589.766385] rcu: hardirqs softirqs csw/system
Feb 9 17:37:31 10.88.89.252 kernel: [77589.771982] rcu: number: 0 0 0
Feb 9 17:37:31 10.88.89.252 kernel: [77589.777575] rcu: cputime: 0 0 0 ==> 30020(ms)
Feb 9 17:37:31 10.88.89.252 kernel: [77589.784557] rcu: (detected by 3, t=15009 jiffies, g=1935437, q=12163 ncpus=4)
Feb 9 17:37:31 10.88.89.252 kernel: [77589.791801] Sending NMI from CPU 3 to CPUs 0:
Feb 9 17:37:31 10.88.89.252 kernel: [77598.441949] watchdog: Watchdog detected hard LOCKUP on cpu 1
Feb 9 17:37:31 10.88.89.252 kernel: [77598.441951] Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables 8021q garp mrp softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink intel_rapl_msr soundwire_cadence intel_rapl_common snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_hda_codec_hdmi snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core x86_pkg_temp_thermal intel_powerclamp snd_soc_acpi_intel_match coretemp snd_soc_acpi soundwire_generic_allocation kvm_intel soundwire_bus snd_soc_core kvm snd_compress ac97_bus snd_pcm_dmaengine irqbypass crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_intel ghash_clmulni_intel snd_intel_dspcfg snd_intel_sdw_acpi aesni_intel snd_hda_codec i915 crypto_simd snd_hda_core cryptd snd_hwdep cmdlinepart snd_pcm drm_buddy ttm mei_pxp mei_hdcp spi_nor drm_display_helper snd_timer cp210x ch341 cec rapl rc_core pcspkr
Feb 9 17:37:31 10.88.89.252 kernel: [77598.441991] intel_cstate snd mtd wmi_bmof mei_me drm_kms_helper soundcore usbserial i2c_algo_bit mei acpi_tad acpi_pad mac_hid vhost_net vhost vhost_iotlb tap efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb nvme i2c_i801 crc32_pclmul spi_intel_pci nvme_core spi_intel igc nvme_common i2c_smbus xhci_pci ahci xhci_pci_renesas libahci xhci_hcd video wmi
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442014] CPU: 1 PID: 16 Comm: rcu_preempt Tainted: P U O 6.5.11-8-pve #1
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442016] Hardware name: Default string Default string/Default string, BIOS 5.27 09/28/2023
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442017] RIP: 0010:native_queued_spin_lock_slowpath+0x7f/0x2d0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442025] Code: 00 00 f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 5f 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442026] RSP: 0018:ffffb910c0163da8 EFLAGS: 00000002
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442027] RAX: 0000000000000001 RBX: ffffffff84d647c0 RCX: 0000000000000000
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442028] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff84d647c0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442029] RBP: ffffb910c0163dc8 R08: 0000000000000000 R09: 0000000000000000
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442030] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442031] R13: ffff99f5c0c499c0 R14: 0000000000000000 R15: 0000000000000000
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442032] FS: 0000000000000000(0000) GS:ffff99fd1fa80000(0000) knlGS:0000000000000000
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442034] CR2: 000055b4c8ec4d9c CR3: 00000004f9c34000 CR4: 0000000000752ee0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442035] PKRU: 55555554
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442036] Call Trace:
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442037]
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442040] ? show_regs+0x6d/0x80
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442044] ? watchdog_hardlockup_check+0x10c/0x1e0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442048] ? watchdog_overflow_callback+0x6b/0x80
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442050] ? perf_event_overflow+0x119/0x380
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442053] ? perf_event_overflow+0x19/0x30
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442054] ? handle_pmi_common+0x175/0x3f0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442059] ? intel_pmu_handle_irq+0x11f/0x480
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442061] ? perf_event_nmi_handler+0x2b/0x50
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442063] ? nmi_handle+0x5d/0x160
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442066] ? default_do_nmi+0x47/0x130
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442068] ? exc_nmi+0x1d8/0x2c0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442069] ? end_repeat_nmi+0x16/0x67
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442074] ? native_queued_spin_lock_slowpath+0x7f/0x2d0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442076] ? native_queued_spin_lock_slowpath+0x7f/0x2d0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442078] ? native_queued_spin_lock_slowpath+0x7f/0x2d0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442080]
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442081]
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442081] _raw_spin_lock_irqsave+0x5c/0x80
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442083] force_qs_rnp+0xfe/0x250
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442086] ? __pfx_rcu_implicit_dynticks_qs+0x10/0x10
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442087] rcu_gp_fqs_loop+0x38f/0x4c0
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442089] ? __pfx_rcu_gp_kthread+0x10/0x10
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442090] rcu_gp_kthread+0xce/0x170
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442092] kthread+0xef/0x120
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442094] ? __pfx_kthread+0x10/0x10
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442097] ret_from_fork+0x44/0x70
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442099] ? __pfx_kthread+0x10/0x10
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442102] ret_from_fork_asm+0x1b/0x30
Feb 9 17:37:31 10.88.89.252 kernel: [77598.442105]
Feb 9 17:37:31 10.88.89.252 kernel: [77599.716806] nmi_backtrace_stall_check: CPU 0: NMIs are not reaching exc_nmi() handler (CPU currently in NMI handler function), last activity: 22350 jiffies ago.
Feb 9 17:37:31 10.88.89.252 kernel: [77600.091817] Sending NMI from CPU 3 to CPUs 2:
Feb 9 17:37:31 10.88.89.252 kernel: [77610.016824] nmi_backtrace_stall_check: CPU 2: NMIs are not reaching exc_nmi() handler (CPU currently in NMI handler function), last activity: 24014 jiffies ago.

`

habitats-tech commented 4 months ago

Seems like an N100 issue or some kind of RAM issue. Try reverting to kernel 5.x, or install Proxmox 7.x.

danhusan commented 4 months ago

Thanks for the response, but as memtest ran successfully and the system is 100% stable without Frigate, I will just migrate to something else. Just leaving my experiences here for future reference.

Onyx1640 commented 4 months ago

I may be having an issue similar to the ones reported in this thread.

I had been running Frigate on Proxmox on an older Dell XPS Desktop with an Intel i7-7700 and an Nvidia P2000 GPU passed through. This setup had been running fine for months.

I just built a new Proxmox host with a Ryzen 3900X (left over from upgrading my main desktop) on an ASRock B550M Pro4 board with the same P2000 GPU.

Ever since putting the new system together and setting up my Frigate VM it's been crashing/freezing at random intervals with nothing incriminating in the Proxmox log. Sometimes it runs for days and other times just hours. The VM running Frigate is Rocky Linux 9 with the Docker CE repo added and the Nvidia drivers installed. I brought the same VM over from the old setup.

Anyway I'm going to do some additional troubleshooting and see if I can get a stable setup. Unfortunately I can't run the old system in parallel as I had to gut most of the good parts from it to make the new one.

mochoandre commented 3 months ago

Has anybody found a solution? I think it's some sort of bug with video streaming in Proxmox on AMD processors. I am not using Frigate, but I have an Ubuntu VM running a Tvheadend server on Proxmox 8.1 with an AMD FX-8320E. Whenever I stream from that VM, the system freezes after 10 to 20 minutes and I lose access to both the VM and Proxmox.