cloudfoundry / diego-release

BOSH Release for Diego
Apache License 2.0

cell VM keep running out of resource #129

Closed StanleyShen closed 8 years ago

StanleyShen commented 8 years ago

Hello,

I have a cf/diego environment deployed on AWS. I have several Java applications that I want to push to it, each asking for 2-3G of memory.

app1: 512M
app2: 2g
app3: 3g
app4: 3g
app5: 2g

When the cell VM had 7.5G of memory (c3.xlarge), pushing the 4th application failed with the error message "Insufficient Resource". After I changed to a 15G VM (m3.xlarge), the 4th application could be pushed, but once it was pushed, almost all of the memory was allocated again.

Is this normal? The four apps ask for 8.5G in total, but 15G of memory is used.

Could someone help with this? How can we find out exactly how much memory each application uses, and why is 15G of memory used? Where should I start investigating this issue?

cf-gitbot commented 8 years ago

We have created an issue in Pivotal Tracker to manage this. You can view the current status of your issue at: https://www.pivotaltracker.com/story/show/113216327.

StanleyShen commented 8 years ago

Attached is a snapshot of the top command output.

emalm commented 8 years ago

Hi, @StanleyShen,

To determine the total and remaining resources that a Diego cell advertises to the auction, bosh ssh to its VM and run curl -s http://localhost:1800/state. The response will be a JSON-encoded payload. Here's one example from a cell in one of the Diego team's environments, after re-formatting via jq .:

{
  "Evacuating": false,
  "Zone": "z1",
  "StartingContainerCount": 0,
  "Tasks": [],
  "LRPs": [
    {
      "RootFs": "",
      "DiskMB": 1024,
      "MemoryMB": 1024,
      "domain": "cf-apps",
      "index": 0,
      "process_guid": "157c18f6-e266-404e-b175-537b44a082c0-8b3c1f02-8ffa-44c5-87c7-5cfd679083b6"
    }
  ],
  "TotalResources": {
    "Containers": 250,
    "DiskMB": 41889,
    "MemoryMB": 7479
  },
  "AvailableResources": {
    "Containers": 249,
    "DiskMB": 40865,
    "MemoryMB": 6455
  },
  "RootFSProviders": {
    "preloaded": {
      "type": "fixed_set",
      "set": {
        "cflinuxfs2": {}
      }
    },
    "docker": {
      "type": "arbitrary"
    }
  }
}

In this case, the cell has one LRP instance using 1024 MB of the 7479 MB of memory it can allocate in total, leaving 6455 MB available. The cell is an m3.large instance on AWS, which also has 7.5 GB of memory. The 7479 MB figure comes from garden-linux automatically detecting how much memory is available from the system configuration.
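As a rough illustration of how to read that payload, the sketch below subtracts AvailableResources from TotalResources and compares the result with what the listed LRPs reserve. The function name summarize_cell_state is made up for this example, and the payload is trimmed to the fields used here. It assumes the reported numbers are reservations based on each app's requested quotas, not measurements of actual usage.

```python
import json

# Trimmed copy of the example response above (only the fields used here).
STATE_JSON = """
{
  "Tasks": [],
  "LRPs": [
    {"DiskMB": 1024, "MemoryMB": 1024, "domain": "cf-apps", "index": 0}
  ],
  "TotalResources": {"Containers": 250, "DiskMB": 41889, "MemoryMB": 7479},
  "AvailableResources": {"Containers": 249, "DiskMB": 40865, "MemoryMB": 6455}
}
"""


def summarize_cell_state(state):
    """Return (allocated, reserved_by_lrps) for a parsed /state payload.

    summarize_cell_state is a hypothetical helper, not part of Diego.
    """
    total = state["TotalResources"]
    available = state["AvailableResources"]
    # Whatever is not available has been reserved by LRPs or tasks.
    allocated = {key: total[key] - available[key] for key in total}
    lrps = state.get("LRPs") or []
    reserved_by_lrps = {
        "MemoryMB": sum(lrp["MemoryMB"] for lrp in lrps),
        "DiskMB": sum(lrp["DiskMB"] for lrp in lrps),
    }
    return allocated, reserved_by_lrps


allocated, reserved_by_lrps = summarize_cell_state(json.loads(STATE_JSON))
print("allocated:", allocated)
print("reserved by LRPs:", reserved_by_lrps)
```

If the two sets of numbers differ, the gap may be held by running tasks, which this sketch does not add up.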

Thanks, Eric, CF Runtime Diego PM

StanleyShen commented 8 years ago

Thanks, Eric, for the information.

StanleyShen commented 8 years ago

Hello Eric, after deploying several apps, here is the resource usage. I am trying to deploy another app, APPx, which asks for disk_quota: 1024M and memory: 2048M.

It looks like the remaining resources are enough for APPx, but the push fails with "insufficient resources":

2016-02-19T13:38:04.93+0800 [API/0] ERR Failed to stage application: insufficient resources

What could be the reason?

{
  "RootFSProviders": {
    "docker": {
      "type": "arbitrary"
    },
    "preloaded": {
      "set": {
        "cflinuxfs2": {}
      },
      "type": "fixed_set"
    }
  },
  "AvailableResources": {
    "MemoryMB": 3775,
    "DiskMB": 1345,
    "Containers": 245
  },
  "TotalResources": {
    "MemoryMB": 15039,
    "DiskMB": 22849,
    "Containers": 250
  },
  "LRPs": [
    {
      "process_guid": "2d75a62a-4a25-4bb7-896c-5441353ac803-604e5b96-8a76-4797-af2e-55659822e44f",
      "index": 0,
      "domain": "cf-apps",
      "MemoryMB": 4096,
      "DiskMB": 5120,
      "RootFs": ""
    },
    {
      "process_guid": "1b35affe-f4e8-40e1-a1c9-0c71cabe4812-54cc77a9-94ce-41a2-a8e9-06393ffbf7b7",
      "index": 0,
      "domain": "cf-apps",
      "MemoryMB": 512,
      "DiskMB": 5120,
      "RootFs": ""
    },
    {
      "process_guid": "26f38609-6a10-45ec-ae59-902f0225499c-8c5532f5-335d-4d99-929b-4b1d63b6cf92",
      "index": 0,
      "domain": "cf-apps",
      "MemoryMB": 4096,
      "DiskMB": 5120,
      "RootFs": ""
    },
    {
      "process_guid": "f0a2b3ad-14e9-47b9-af18-cd24a14b41f7-0dc8031a-7f58-46b7-95b0-d6aa3f754a77",
      "index": 0,
      "domain": "cf-apps",
      "MemoryMB": 2048,
      "DiskMB": 5120,
      "RootFs": ""
    },
    {
      "process_guid": "08a5defb-0ef5-41c2-b645-ed819627fd20-14a6e946-9cd9-461b-b8fb-b947ced0bd8e",
      "index": 0,
      "domain": "cf-apps",
      "MemoryMB": 512,
      "DiskMB": 1024,
      "RootFs": ""
    }
  ],
  "Tasks": [],
  "Zone": "z1",
  "Evacuating": false
}

cf-gitbot commented 8 years ago

We have created an issue in Pivotal Tracker to manage this. You can view the current status of your issue at: https://www.pivotaltracker.com/story/show/114067663.

emalm commented 8 years ago

Hi, @StanleyShen,

The failure is in scheduling the staging task, which by default allocates 6 GB = 6144 MB of disk space, as specified in https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L486-L488. The Diego cell in your deployment is reporting only 1345 MB of disk free. You can change the amount of disk that staging tasks allocate by adjusting the dea_next.staging_disk_limit_mb BOSH property in the CF deployment manifest.
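To make the comparison concrete, here is a minimal sketch that checks a request against the AvailableResources you posted. It is not the auction's actual placement logic. The helper name shortfalls is invented for this example, and the 1024 MB memory figure for the staging task is a placeholder assumption; only the 6144 MB disk figure comes from the spec linked above.

```python
STAGING_DISK_MB = 6144    # default staging disk allocation cited above
STAGING_MEMORY_MB = 1024  # placeholder assumption, not taken from this thread


def shortfalls(available, required):
    """Return {resource: missing amount} for every resource that falls short.

    An empty dict means the request fits. Hypothetical helper, not Diego code.
    """
    return {
        name: need - available.get(name, 0)
        for name, need in required.items()
        if need > available.get(name, 0)
    }


# AvailableResources as reported by the cell in the previous comment.
available = {"MemoryMB": 3775, "DiskMB": 1345, "Containers": 245}

# The app instance itself, using the quotas requested for APPx.
app_instance = {"MemoryMB": 2048, "DiskMB": 1024, "Containers": 1}

# The staging task that has to be scheduled before the app can start.
staging_task = {
    "MemoryMB": STAGING_MEMORY_MB,
    "DiskMB": STAGING_DISK_MB,
    "Containers": 1,
}

print("app instance short on:", shortfalls(available, app_instance))
print("staging task short on:", shortfalls(available, staging_task))
```

Under these assumptions the app instance fits on the cell, while the staging task is short on disk, which matches the "insufficient resources" error at staging time.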

Thanks, Eric

StanleyShen commented 8 years ago

Thanks for the information.

For staging_disk_limit_mb, is there any constraint on how much disk needs to be allocated?

For example, I have 3 apps:
1st asks for a 3G disk quota
2nd asks for a 5G disk quota
3rd asks for a 1G disk quota

In this case, I must set staging_disk_limit_mb to 5G, right?

emalm commented 8 years ago

It's my understanding that the staging_disk_limit_mb value is a minimum guaranteed value for the staging disk allocation, and that if your app requests more disk than that value, its staging task also gets that larger amount of disk. So you could safely set it to a lower value and still be able to stage your other apps.
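If that understanding is right, the staging disk allocation behaves like the larger of the two values. A small sketch for the three apps in your example, with staging_disk_limit_mb lowered to 1024 purely as an illustration (the function name staging_disk_mb is invented here, and the max rule is my reading of the behavior, not confirmed from the Cloud Controller code):

```python
def staging_disk_mb(staging_disk_limit_mb, app_disk_quota_mb):
    """Disk given to a staging task, assuming the limit acts as a floor.

    Hypothetical helper that encodes the understanding described above.
    """
    return max(staging_disk_limit_mb, app_disk_quota_mb)


# Disk quotas of the three example apps, in MB.
app_disk_quotas_mb = {"1st": 3072, "2nd": 5120, "3rd": 1024}

# Example value only: a limit lowered from the 6144 MB default.
limit_mb = 1024

staging_allocations = {
    name: staging_disk_mb(limit_mb, quota)
    for name, quota in app_disk_quotas_mb.items()
}
print(staging_allocations)
```

With this rule, the limit does not need to match the largest quota; it only sets the smallest amount any staging task receives.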

Thanks, Eric

StanleyShen commented 8 years ago

Thanks Eric, I changed it to 1024 and it works for me for now.