lxc-top for one or more lxd hosts

jochumdev commented 9 years ago

I would love to have both the API and an lxc top tool, working against a single LXD instance or several.

I need everything that "lxc-top" gave me:

Container               CPU      CPU      CPU      BlkIO        Mem
Name                   Used      Sys     User      Total       Used

stgraber commented 9 years ago

Shouldn't be too difficult to do once we have the status API, though it'll be quite taxing on the remote server when pulling those stats for all containers every few seconds.

myu1d157h0u54nd commented 8 years ago

Hello, what is the status of this feature? Is there any other way to analyze resource usage on the host, i.e. to evaluate the RAM and CPU consumption of each container?

davidfavor commented 7 years ago

Let me know the best way to track + potentially contribute to the Status API project.

Thanks.

stgraber commented 7 years ago

So we do have what was referred to as the status API, in that we return a bunch of stats in /1.0/containers/NAME/state. But as originally described, this is very taxing, so I'd expect a "top" command based on it to take well over a second to retrieve data if you have more than 20 containers or so, and it would obviously eat a whole lot of CPU/kernel time on the target host while doing so.
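To make the cost discussion concrete, here is a minimal Python sketch of what a "top" client would do with two successive responses from that state endpoint. It assumes the response carries a cumulative CPU counter in nanoseconds at metadata.cpu.usage and memory in bytes at metadata.memory.usage; the sample payloads are hand-written for illustration, not captured from a server, and the helper names are hypothetical.

```python
import json


def extract_usage(state_json: str) -> dict:
    """Pull cumulative CPU time and current memory use out of one state response.

    Assumed layout: {"metadata": {"cpu": {"usage": ns}, "memory": {"usage": bytes}}}.
    """
    meta = json.loads(state_json)["metadata"]
    return {
        "cpu_ns": meta["cpu"]["usage"],
        "mem_bytes": meta["memory"]["usage"],
    }


def cpu_percent(prev_ns: int, curr_ns: int, interval_s: float) -> float:
    """Turn two cumulative CPU readings into a percentage of one core."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (curr_ns - prev_ns) / (interval_s * 1e9) * 100.0


# Hand-written sample payloads, taken 2 seconds apart (illustrative only).
first = '{"metadata": {"cpu": {"usage": 1000000000}, "memory": {"usage": 52428800}}}'
second = '{"metadata": {"cpu": {"usage": 1500000000}, "memory": {"usage": 53477376}}}'

a = extract_usage(first)
b = extract_usage(second)
print(f"cpu={cpu_percent(a['cpu_ns'], b['cpu_ns'], 2.0):.1f}% "
      f"mem={b['mem_bytes'] / 2**20:.1f}MiB")
```

Note that a CPU percentage needs two samples per container per refresh, so the request count grows with both the number of containers and the refresh rate, which is where the load described above comes from.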

davidfavor commented 7 years ago

Yep. Sounds resource intensive.

I'll take a look at /proc + see if I can come up with a lightweight lxd-top.

Thanks for the info.

tchwpkgorg commented 5 years ago

You can check systemd-cgtop for something somewhat similar.

stgraber commented 5 years ago

The problem isn't making the UI/UX for it; the problem is making it not slow your system down to a crawl while retrieving the data. cgtop runs locally and only looks at cgroups. An LXD top would need to work remotely over a REST API and also look at things like network counters, interface details, ... which are very costly to retrieve.

Right now we haven't found a good way to make a tool which refreshes more often than every couple of minutes or so without a 10-20% performance hit while it runs.

ghost commented 5 years ago

Would using websockets for this be more efficient?

stgraber commented 5 years ago

Yeah, if we can formulate a clear query as to the set of containers and resources we want to check, then updates could indeed be sent over websocket to avoid the client having to poll for the new values.

Some of those values will be very costly to fetch, so at a high frequency you may still find this ends up taking more CPU than the container would on its own, which would be a bit of an issue.

Clustering also adds complexity. One solution would be to only target a single cluster member, but I suspect we'd still get quite a bit of demand for having this work cluster wide, adding the need for LXD to internally do the same request on all members and then aggregate the results before sending them to the client.

With the addition of projects since this was first requested, we also would likely have to add that as a filter, allowing you to monitor:

whole cluster
specific project in cluster
single member
specific project in member

That's assuming we don't also want to do name based filtering to allow the user to only request the specific containers they care about.
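The scoping options above can be sketched as a single query object. This is only an illustration of how the four scopes plus an optional name filter could combine; TopQuery, Instance, and select are hypothetical names, not part of any LXD API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Instance:
    name: str
    project: str
    member: str  # cluster member hosting the instance


@dataclass(frozen=True)
class TopQuery:
    project: Optional[str] = None            # None: all projects
    member: Optional[str] = None             # None: whole cluster
    names: Optional[FrozenSet[str]] = None   # None: no name-based filter

    def matches(self, inst: Instance) -> bool:
        """True if the instance falls inside the requested scope."""
        if self.project is not None and inst.project != self.project:
            return False
        if self.member is not None and inst.member != self.member:
            return False
        if self.names is not None and inst.name not in self.names:
            return False
        return True


def select(instances: Iterable[Instance], query: TopQuery) -> List[Instance]:
    """Return only the instances the query asks to monitor."""
    return [inst for inst in instances if query.matches(inst)]


# Made-up inventory for illustration.
inventory = [
    Instance("web1", "default", "node1"),
    Instance("web2", "default", "node2"),
    Instance("db1", "prod", "node1"),
]

print(select(inventory, TopQuery()))                                   # whole cluster
print(select(inventory, TopQuery(project="default")))                  # project in cluster
print(select(inventory, TopQuery(member="node1")))                     # single member
print(select(inventory, TopQuery(project="default", member="node1")))  # project in member
```

Resolving the scope before fetching anything would let the server skip the costly counters for every instance the client did not ask about.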

zrav commented 5 years ago

If certain counters are too costly I'd consider excluding them from the default view.

The mentioned cgtop is interesting. Also, htop in tree display mode can already show a lot of the stats (including block IO) grouped by container; it is just missing a way to sum the per-process stats into totals for each container. htop manages this with a pretty light load, so a /proc-based approach looks workable.
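A rough Python sketch of the summing step that is missing: it parses utime and stime (fields 14 and 15, in clock ticks) from /proc/PID/stat lines and totals them per container. How a PID is mapped to its container is left out and assumed to be known already (in practice it would likely come from /proc/PID/cgroup, whose path layout varies between setups). The sample lines are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def parse_stat(line: str) -> Tuple[int, int, int]:
    """Return (pid, utime_ticks, stime_ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid = int(head.split(" ", 1)[0])
    fields = tail.split()
    # tail starts at field 3 (state), so field N sits at index N - 3.
    return pid, int(fields[11]), int(fields[12])


def sum_by_container(samples: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Total CPU ticks (user + system) per container.

    samples: (container_name, stat_line) pairs; the pairing is assumed given.
    """
    totals: Dict[str, int] = defaultdict(int)
    for container, line in samples:
        _, utime, stime = parse_stat(line)
        totals[container] += utime + stime
    return dict(totals)


# Made-up stat lines, truncated after the fields used here.
samples = [
    ("c1", "1234 (my (weird) proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 50 0 0 20 0 1 0"),
    ("c1", "1300 (nginx) S 1 1300 1300 0 -1 4194304 10 0 0 0 30 20 0 0 20 0 1 0"),
    ("c2", "2000 (sh) R 1 2000 2000 0 -1 0 0 0 0 0 5 5 0 0 20 0 1 0"),
]

print(sum_by_container(samples))
```

Dividing the tick totals by the system clock tick rate (os.sysconf("SC_CLK_TCK") on Linux) converts them to seconds.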

stgraber commented 3 years ago

Our current approach to tackle much of this (and quite a bit more because of scale) is to introduce a new metrics endpoint which returns data in a Prometheus-compatible format. The specification can be found here: https://discuss.linuxcontainers.org/t/lxd-metric-exporter-for-instances/11735

This isn't exactly what was first suggested in this issue, as it doesn't really provide you with an lxc top command, though that endpoint should make the implementation of such a command quite a bit easier. It's still going to be an expensive endpoint to hit. Initially we don't expect to do any caching, though if we see the load becoming problematic, we're likely to do basic caching for a period of 10s or so. The fastest polling rate we're likely to recommend is 15s.
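As a hedged illustration of how a top-style client could build on such an endpoint, the Python sketch below parses Prometheus text exposition output and sums one metric per instance. The metric name, the "name" label, and the sample payload are assumptions made for the example; the linked specification is the authority on what the endpoint actually returns. The parser is deliberately minimal: it skips comment lines and does not handle escaped quotes in label values or trailing timestamps.

```python
import re
from collections import defaultdict
from typing import Dict, Iterator, Tuple

# metric_name{label="value",...} number   (the label block is optional)
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric, labels, value) for each sample line in the payload."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = LINE_RE.match(line)
        if match is None:
            continue  # unsupported syntax, e.g. a trailing timestamp
        name, labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(labels or "")), float(value)


def total_by_instance(text: str, metric: str, label: str = "name") -> Dict[str, float]:
    """Sum one metric across all its label combinations, grouped by instance."""
    totals: Dict[str, float] = defaultdict(float)
    for name, labels, value in parse_metrics(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Made-up payload for illustration only.
sample = """\
# TYPE lxd_cpu_seconds_total counter
lxd_cpu_seconds_total{cpu="0",mode="user",name="c1"} 12.5
lxd_cpu_seconds_total{cpu="0",mode="system",name="c1"} 2.5
lxd_cpu_seconds_total{cpu="0",mode="user",name="c2"} 4
other_metric 1
"""

print(total_by_instance(sample, "lxd_cpu_seconds_total"))
```

A client built this way would poll no faster than the 15s mentioned above and diff successive counter totals to get rates.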

canonical / lxd

lxc-top for one or more lxd hosts #822