cloudfoundry-incubator / pat


Leaking droplets #110

Open drnic opened 10 years ago

drnic commented 10 years ago

I'm going to raise this issue here as I can't reproduce it deploying apps normally.

I am running the cf dea-ads command from https://github.com/cloudfoundry/tools-cf-plugin/ so I can see the number of droplets that the runners think they are hosting.

With a fresh CF it starts at 0 (obviously).

I ran pat for 4 iterations (/pat -workload=gcf:push -iterations=4 -concurrency=2), after which there were 7 droplets. I deleted the 4 apps and the count dropped to 3 droplets.

It should be 0; not 3.

I ran pat again for 2 iterations (/pat -workload=gcf:push -iterations=2 -concurrency=2), and after deleting the created apps I am now at 5 droplets.

image

Can anyone think of why pat might be causing this? Or how CF could be allowing droplets to be created in excess of the apps being pushed?

To be clear, I have 5 droplets and 0 apps:

image

/cc @jbayer

drnic commented 10 years ago

A few hours later, the rogue droplets have disappeared from the count. I'm not sure exactly how long it took between creating this ticket and now.

jbayer commented 10 years ago

@drnic it's likely that the droplet deletion job only kicks in asynchronously via the clock every so often [1]. I don't know where the config is that says how often the jobs are run, but I assume it's only every once in a while. /cc @ematpl @dieucao @MarkKropf

[1] https://github.com/cloudfoundry/cloud_controller_ng/blob/master/app/jobs/runtime/droplet_deletion.rb

emalm commented 10 years ago

Hi, @drnic,

As @jbayer pointed out, the app droplet blobs are deleted asynchronously via Delayed::Job, but they're not triggered by the CC's clock mode. When an app is deleted, the deletion cascades to its droplets, each of which then enqueues a DropletDeletion job on the 'cc-generic' queue. The generic-queue workers then work those off, but they can handle only 1 at a time per worker, so it could take some time for all the droplet blobs to be deleted.
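To get a feel for why the backlog can take a while to clear, here is a toy estimate of the drain time when each worker handles one job at a time. It is only a back-of-the-envelope sketch, not Cloud Controller code: the function name `estimated_drain_seconds` and the idea of a fixed `seconds_per_job` are assumptions made for illustration, and real job durations will vary.

```python
import math


def estimated_drain_seconds(pending_jobs: int, workers: int, seconds_per_job: float) -> float:
    """Rough time to clear a queue when each worker handles one job at a time.

    Hypothetical helper: assumes every job takes the same `seconds_per_job`.
    """
    if workers <= 0:
        raise ValueError("need at least one worker")
    if pending_jobs <= 0:
        return 0.0
    # Jobs are worked off in "rounds" of at most `workers` jobs each.
    rounds = math.ceil(pending_jobs / workers)
    return rounds * seconds_per_job
```

For example, with a single worker and an assumed 5 seconds per job, 6 queued deletions would take about 30 seconds; adding a second worker halves that. Anything else sharing the same queue would push the real number higher.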

On the other hand, that cf plugin does seem to be analyzing the state of the DEAs via their advertisements, not the CC and its blobstore. Can you get any more information about which instances the DEA thinks it has? Its varz endpoint exposes more detailed per-app data about instances in the instance_registry value, so that might be the easiest thing to query first.
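A rough sketch of that check, assuming you have already fetched the DEA's varz output as JSON: the snippet below counts instances per app from the instance_registry value. The nesting used here (app id, then instance id, then instance attributes) is an assumption about the payload shape, the sample data is made up, and `count_registered_instances` is a hypothetical helper name. Fetching the payload is left out because the host, port, and credentials depend on your deployment.

```python
import json


def count_registered_instances(varz_payload: dict) -> dict:
    """Return {app_id: instance_count} from a DEA varz payload.

    Assumes instance_registry maps app id -> {instance id -> attributes}.
    """
    registry = varz_payload.get("instance_registry") or {}
    return {app_id: len(instances) for app_id, instances in registry.items()}


# Made-up sample payload, for illustration only.
sample = json.loads("""
{
  "instance_registry": {
    "app-guid-1": {"inst-a": {"state": "RUNNING"}, "inst-b": {"state": "CRASHED"}},
    "app-guid-2": {"inst-c": {"state": "RUNNING"}}
  }
}
""")

counts = count_registered_instances(sample)
total = sum(counts.values())
print(counts, total)
```

If the total here matches the droplet count the plugin reports while the CC shows 0 apps, the app ids in the registry would be the ones to chase down.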

Thanks, Eric

drnic commented 10 years ago

Thanks for the info. I'll try to learn more about the wayward droplets.

On Sun, Aug 3, 2014 at 10:53 PM, Eric Malm notifications@github.com wrote:

