Closed (GoogleCodeExporter closed this issue 8 years ago)
Thanks for reporting this, and good catch. Sorry about the hassle.
I'll route this to the AppEngine team.
Original comment by z...@google.com
on 5 Oct 2015 at 1:08
Hey,
We delete VMs asynchronously, so it's possible they're still running when the
initial deletion returns a success. In the case of failures, we have a periodic
maintenance job that runs frequently looking for leaked VMs like this, and
deletes them.
How long did you wait before deploying a new version and deleting it?
Original comment by dlor...@google.com
on 5 Oct 2015 at 4:50
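The behavior described in this comment can be illustrated with a toy model. This is only a sketch under assumptions, not App Engine's actual implementation: the `Fleet` class and its `delete_version` and `sweep_leaked_vms` methods are hypothetical names invented for illustration. It shows a delete call that reports success before the VMs are gone, and a periodic sweep that removes VMs whose version no longer exists.

```python
from dataclasses import dataclass, field


@dataclass
class Fleet:
    """Toy in-memory model; every name here is hypothetical."""
    live_versions: set = field(default_factory=set)
    vms: dict = field(default_factory=dict)  # vm_id -> version name

    def delete_version(self, version, vm_delete_fails=False):
        # The version record is removed and success is reported right away.
        # VM teardown is a separate step and may fail, leaving VMs behind.
        self.live_versions.discard(version)
        if not vm_delete_fails:
            self.vms = {vm: v for vm, v in self.vms.items() if v != version}
        return True

    def sweep_leaked_vms(self):
        # Periodic maintenance pass: delete VMs whose version is gone.
        leaked = sorted(vm for vm, v in self.vms.items()
                        if v not in self.live_versions)
        for vm in leaked:
            del self.vms[vm]
        return leaked


fleet = Fleet(live_versions={"v702", "v703"},
              vms={"vm-a": "v702", "vm-b": "v702", "vm-c": "v703"})
ok = fleet.delete_version("v702", vm_delete_fails=True)  # reports success
leaked_before_sweep = sorted(fleet.vms)                  # v702 VMs still present
cleaned = fleet.sweep_leaked_vms()                       # sweep removes them
```

In this model the caller sees a successful delete while the v702 VMs keep running until the next sweep, which matches the symptom reported in this issue.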
The interval between my failed attempt to delete v702 and the redeployment and
successful delete was a bit over 1 hour.
Original comment by pi...@ideanest.com
on 5 Oct 2015 at 5:03
Thanks for the report. I think this is basically the expected behavior of our
system. I see some errors deleting the VMs in our logs, and then you manually
fixed this before our next cleanup run could come along. If you see this again,
please let me know.
Original comment by dlor...@google.com
on 6 Oct 2015 at 5:02
For reference, what's the interval between cleanup attempts in your system? I
assume I continue to be billed for the phantom VMs, so if I happened to be
doing a bunch of deploys in a row (as sometimes happens when figuring out a
build breakage) I could end up on the hook for a not-entirely-trivial amount of
money.
Original comment by pi...@ideanest.com
on 6 Oct 2015 at 7:11
We currently run this every 2 hours. In most cases this job should do nothing
though, since deleting the version should delete the VMs. That job is only
relied on if deleting the VMs failed for some reason. You're correct that you
do continue to get billed here, but this should be a very rare case.
Original comment by dlor...@google.com
on 9 Oct 2015 at 5:46
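As a rough way to bound the billing exposure discussed above, the worst case is each leaked VM running for one full cleanup interval. The sketch below assumes billing is linear in VM-hours. The function name, the VM counts, and the hourly rate are hypothetical placeholders, not real App Engine prices. Only the 2-hour interval comes from this thread.

```python
def worst_case_leak_cost(leaked_vms, hourly_rate, sweep_interval_hours=2.0):
    """Upper bound on the extra charge if every leaked VM survives
    one full cleanup interval (assumes linear per-hour billing)."""
    return leaked_vms * hourly_rate * sweep_interval_hours


# Hypothetical scenario: 5 failed deletes in a row, 2 VMs per version,
# at a made-up rate of 0.05 currency units per VM-hour.
cost = worst_case_leak_cost(leaked_vms=5 * 2, hourly_rate=0.05)
```

Substituting your own instance count and rate gives an upper bound. The delays observed later in this thread (20 to 30 minutes) would cost proportionally less.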
Looks like this happened to me again on a deploy ~10 minutes ago. I deployed
version 712 and deleted version 706, but it's still running. Either I'm very
unlucky or there's a reproducible issue with the initial VM deletion... I'm
going to let the v706 VMs keep running this time to see if the backup process
cleans them up.
Original comment by pi...@ideanest.com
on 11 Oct 2015 at 7:27
Looks like the v706 instances got cleaned up ~30 minutes after the deploy.
I'll keep an eye out on future deploys to see if the initial deletion failure
is common for me.
Original comment by pi...@ideanest.com
on 11 Oct 2015 at 7:59
Happened again: deployed v714 at 10:29pm PDT, and the v712 instances didn't
get shut down until 10:49pm. A 20 minute delay isn't terrible, I guess, but it
still looks like the initial deletion is failing consistently. It also means
I have to run that much longer with version skew...
Original comment by pi...@ideanest.com
on 12 Oct 2015 at 5:54
Original issue reported on code.google.com by
pi...@ideanest.com
on 5 Oct 2015 at 10:06