Corsinvest / cv4pve-autosnap

Automatic snapshot tool for Proxmox VE
https://www.corsinvest.it/cv4pve
GNU General Public License v3.0

VM sometimes remain locked "snapshot-delete" status #46

Closed paolo-pf closed 3 years ago

paolo-pf commented 3 years ago

Hi there, I'm using your utility, running this command from crontab every night:

cv4pve-autosnap --host=localhost --username=xxxxxxx --password=yyyyyyyy --timeout=8600 --vmid="all" snap --label='daily' --keep=7

Everything has worked for many months, except that sometimes, on VMs with large disks (for example a VM with a 2 TB disk), purging old snapshots fails and the VM remains locked in "snapshot-delete" status. The old snapshot also stays in "delete" status; it doesn't get deleted, and I have to remove it manually with the force switch.

In this state the VM keeps running, but the next night's snapshot, or even a VM backup, cannot run. So we have to check the VM status every morning and fix it manually when this happens.
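The morning check described above could be scripted. The following is only a sketch: it assumes the VM configs live in /etc/pve/qemu-server and that a locked VM carries a `lock:` line in the top section of its config file. The function name `list_locked_vms` is made up for illustration.

```shell
#!/bin/sh
# Print "<vmid> <lock>" for every VM config in a directory that carries a lock.
# Only the top section of each file is read; sections that start with "["
# (snapshots, pending changes) are ignored.
list_locked_vms() {
    conf_dir=${1:-/etc/pve/qemu-server}   # assumed default location
    for conf in "$conf_dir"/*.conf; do
        [ -f "$conf" ] || continue
        lock=$(awk '/^\[/ { exit } /^lock:/ { print $2; exit }' "$conf")
        if [ -n "$lock" ]; then
            echo "$(basename "$conf" .conf) $lock"
        fi
    done
    return 0
}
```

Calling `list_locked_vms` from a morning cron job and mailing any output would replace the manual check; it only reports and changes nothing.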

To fix this problem I use these commands:

qm unlock <vmid>
qm delsnapshot <vmid> "snapshot-name" --force
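The manual recovery above can be wrapped in a guarded helper so that it only acts when the VM is really stuck. This is a sketch, not the tool's own behaviour: the function names and the `QM` override variable are invented for illustration, and it assumes that `qm config <vmid>` prints a `lock:` line for a locked VM.

```shell
#!/bin/sh
# Print the value of the "lock:" line from `qm config` output read on stdin.
config_lock() {
    sed -n 's/^lock:[[:space:]]*//p' | head -n 1
}

# Unlock a VM and force-remove one snapshot, but only when the VM lock is
# "snapshot-delete". QM is a hypothetical override for testing or dry runs.
recover_vm() {
    vmid=$1
    snap=$2
    lock=$("${QM:-qm}" config "$vmid" | config_lock)
    if [ "$lock" != "snapshot-delete" ]; then
        echo "VM $vmid: lock is '${lock:-none}', nothing to do"
        return 0
    fi
    "${QM:-qm}" unlock "$vmid" &&
        "${QM:-qm}" delsnapshot "$vmid" "$snap" --force
}
```

For example, `recover_vm 300 autodaily201225031308` would unlock VM 300 and force-delete that snapshot. Forcing removes the snapshot from the config even if removing the disk snapshot fails, so it is worth confirming first that no snapshot task is still running.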

Our Proxmox storage is the old, but still reliable, directory storage on physical hard disks, with qcow2 as the VM disk format.

franklupo commented 3 years ago

Hi, run with --debug and send me the logs. The problem is most likely the timeout. Try using --timeout with a long value.

Best regards

paolo-pf commented 3 years ago

Hi there, I'm already using --timeout=8600, which should be enough (more than 2 hours!). I've even tried --timeout=86400, but nothing changed.
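As a quick check of those two values, assuming (as the poster does) that --timeout is given in seconds, a small helper can convert them to hours, minutes and seconds. The function name `fmt_seconds` is made up for illustration.

```shell
#!/bin/sh
# Convert a number of seconds to "Hh MMm SSs".
fmt_seconds() {
    s=$1
    printf '%dh %02dm %02ds\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

fmt_seconds 8600    # prints "2h 23m 20s"
fmt_seconds 86400   # prints "24h 00m 00s"
```

Both values are far longer than the 20 to 30 seconds after which the tool is later reported to stop, which suggests the timeout value itself is not the limiting factor.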

Attached is a detailed debug log from a snapshot attempt on a large VM that fails very often. (I've masked personal information.)

When I get that error, cv4pve-autosnap terminates immediately, but the snapshot task in the Proxmox console keeps running to the end. The VM then remains locked in "snapshot-delete" status and I have to run qm unlock.

Then, if I run a clean command to purge old snapshots, I get this error because the snapshot is in "delete" status:

VM 300 qmp command 'blockdev-snapshot-delete-internal-sync' failed - Snapshot with id 'null' and name 'autodaily201225031308' does not exist on device 'drive-scsi0'

log-snap.txt

Thanks for helping... Best regards

franklupo commented 3 years ago

Hi, how long does the operation take before it stops? I looked at the error, but it points to a cluster/host configuration error. See "proxmox too many redirects 599".

Best regards

paolo-pf commented 3 years ago

Hi there, when it snapshots some of the heavy VMs, it stops after 20-30 seconds, but the snapshot creation still goes ahead and completes. I've read about those errors and I'm trying to get them fixed, but we've never had cluster errors and cluster health is fine.

Anyway, I've added all the host names to the /etc/hosts file on each node. We'll see tomorrow night how it goes.

Regards

franklupo commented 3 years ago

Hi, if you use Proxmox VE 6.2 or higher, consider using --api-token instead of username/password. The session does not expire if the process runs for a long time.
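A sketch of the nightly command from the original report, switched to token authentication. The token is read from `PVE_API_TOKEN`, a hypothetical environment variable, and is assumed to use the Proxmox token form `USER@REALM!TOKENID=SECRET`; whether cv4pve-autosnap expects exactly this shape should be checked against its help output. `AUTOSNAP` is an invented override for dry runs.

```shell
#!/bin/sh
# Run the nightly snapshot with an API token instead of username/password.
# PVE_API_TOKEN and AUTOSNAP are hypothetical names used only in this sketch.
run_autosnap() {
    token=${PVE_API_TOKEN:?set PVE_API_TOKEN first}
    "${AUTOSNAP:-cv4pve-autosnap}" --host=localhost --api-token="$token" \
        --timeout=8600 --vmid=all snap --label=daily --keep=7
}
```

Running it with `AUTOSNAP=echo` prints the arguments instead of contacting the host, which is a cheap way to check quoting of the `!` in the token before putting the job in crontab.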

Best regards

franklupo commented 3 years ago

News?

rootbdfy commented 2 years ago

Hello, I have the same issue with Proxmox 7.1-6 (cv4pve-autosnap 1.12.0). Using an API token instead of a username did not help. It reproduces on VMs with many snapshots.

[image]

franklupo commented 2 years ago

Hi, do snapshot creation and deletion work when done manually from the Proxmox VE web GUI?

best regards

rootbdfy commented 2 years ago

> Hi, do snapshot creation and deletion work when done manually from the Proxmox VE web GUI?
>
> Best regards

No, it gets stuck in the delete state. [image]

franklupo commented 2 years ago

OK, if it doesn't work from the GUI, it's a ZFS problem.

rootbdfy commented 2 years ago

This snapshot does not exist on ZFS. I manually deleted the snapshot's section from the VM config. We'll see what happens next. Thanks!
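A mismatch like this one, where a snapshot is listed in the VM config but missing on ZFS, can be detected before editing anything by hand. The sketch below assumes that snapshots appear in the config as `[name]` sections, that the VM's datasets follow the default `vm-<vmid>-...` naming, and that the listing comes from `zfs list -H -t snapshot -o name`. The function name `missing_on_zfs` is made up for illustration.

```shell
#!/bin/sh
# Usage: zfs list -H -t snapshot -o name | missing_on_zfs <vmid> <conf-file>
# Prints snapshot names that have a section in the VM config but no ZFS
# snapshot on any dataset named vm-<vmid>-...
missing_on_zfs() {
    vmid=$1
    conf=$2
    # Snapshot names (the part after "@") that exist on ZFS for this VM.
    names=$(grep "vm-$vmid-" | sed -n 's/^.*@//p' | sort -u)
    # Section headers in the config, skipping non-snapshot sections.
    sed -n 's/^\[\(.*\)\]$/\1/p' "$conf" |
        grep -v -e '^PENDING$' -e '^special:' |
        while read -r snap; do
            printf '%s\n' "$names" | grep -qxF -- "$snap" || echo "$snap"
        done
}
```

The helper only reports names. Removing a section by hand, as described above, may also leave `parent:` lines that still refer to the deleted snapshot, so those are worth checking too.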

wstraszak commented 2 years ago

@rootbdfy how did it end up?

rootbdfy commented 2 years ago

> @rootbdfy how did it end up?

Hi! All fine.

duven87 commented 1 month ago

I still often have this problem with PVE 7, hardware RAID, ext4. Any solution?

franklupo commented 1 month ago

> I still often have this problem with PVE 7, hardware RAID, ext4. Any solution?

ext4 is not a good choice for snapshots.