Open jochumdev opened 9 years ago
Shouldn't be too difficult to do once we have the status API, though it'll be quite taxing on the remote server when pulling those stats for all containers every few seconds.
Hello, what is the status of this feature? Is there any other way to analyze host resources, specifically to evaluate the resource consumption (RAM, CPU) of each container?
Let me know the best way to track, and potentially contribute to, the Status API project.
Thanks.
So we do have what was referred to as the status API, in that we return a bunch of stats in /1.0/containers/NAME/state. But as originally described, this is very taxing. I'd expect a "top" command based on it to take well over a second to retrieve data if you have more than 20 containers or so, and it would obviously eat a whole lot of CPU/kernel time on the target host while doing so.
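To make the cost concrete: a "top" row needs two state samples per container, because CPU is reported as a cumulative counter. Here is a minimal sketch of that calculation. It assumes the state response carries `metadata.cpu.usage` in nanoseconds and `metadata.memory.usage` in bytes; the helper names `summarize_state` and `cpu_percent` are made up for illustration and are not part of LXD.

```python
import json


def summarize_state(body: str) -> dict:
    """Pull the few fields a 'top' row needs out of one state response.

    Assumed layout: {"metadata": {"cpu": {"usage": <ns>}, "memory": {"usage": <bytes>}}}
    """
    meta = json.loads(body)["metadata"]
    return {
        "cpu_usage_ns": meta["cpu"]["usage"],
        "memory_bytes": meta["memory"]["usage"],
    }


def cpu_percent(prev_usage_ns: int, curr_usage_ns: int, interval_s: float) -> float:
    """CPU time used between two samples, as a percentage of one core."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    used_s = (curr_usage_ns - prev_usage_ns) / 1e9
    return 100.0 * used_s / interval_s
```

Every refresh means one state request per container, which is exactly where the cost described above comes from.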
Yep. Sounds resource intensive.
I'll take a look at /proc + see if I can come up with a lightweight lxd-top.
Thanks for the info.
You can check systemd-cgtop for something somewhat similar.
The problem isn't with making the UI/UX for it, the problem is making it not slow down your system to a crawl while retrieving the data. cgtop runs locally and only looks at cgroups, a LXD top would need to work remotely over a REST API and also look at things like network counters, interface details, ... which are very costly to retrieve.
Right now we haven't found a good way to make a tool which refreshes more than every couple of minutes or so and which wouldn't come with a 10-20% performance hit when running it.
Would using websockets for this be more efficient?
Yeah, if we can formulate a clear query as to the set of containers and resources we want to check, then updates could indeed be sent over websocket to avoid the client having to poll for the new values.
Some of those values will be very costly to fetch, so at a high frequency you may still find this ends up taking more CPU than the container would on its own, which would be a bit of an issue.
Clustering also adds complexity. One solution would be to only target a single cluster member, but I suspect we'd still get quite a bit of demand for having this work cluster wide, adding the need for LXD to internally do the same request on all members and then aggregate the result before sending it to the client.
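The cluster-wide case boils down to a fan-out followed by a merge. A rough sketch of that shape, not of how LXD does or would do it internally: `fetch_member` is a hypothetical callable that returns per-instance stats for one member, and the `location` key is an illustrative way to record where each row came from.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

Stats = Dict[str, dict]  # instance name -> stats for that instance


def aggregate_cluster(members: Iterable[str],
                      fetch_member: Callable[[str], Stats]) -> Stats:
    """Query every member in parallel and merge the results into one view."""
    members = list(members)
    merged: Stats = {}
    if not members:
        return merged
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        # pool.map yields results in the same order as `members`.
        for member, stats in zip(members, pool.map(fetch_member, members)):
            for name, row in stats.items():
                # Tag each row with the member it was collected from.
                merged[name] = {**row, "location": member}
    return merged
```

The response time is bounded by the slowest member, so one overloaded node slows down the whole view.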
With the addition of projects since this was first requested, we also would likely have to add that as a filter, allowing you to monitor:
That's assuming we don't also want to do name based filtering to allow the user to only request the specific containers they care about.
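For illustration, a project filter plus name filter could be modelled like this. It is only a sketch: `MonitorFilter` is a hypothetical type, and matching names with shell-style globs is my assumption, not something decided in this thread.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Tuple


@dataclass(frozen=True)
class MonitorFilter:
    project: Optional[str] = None        # None means "any project"
    name_patterns: Tuple[str, ...] = ()  # empty means "any name"

    def matches(self, project: str, name: str) -> bool:
        """Return True if the instance should be included in the view."""
        if self.project is not None and project != self.project:
            return False
        if not self.name_patterns:
            return True
        # Assumption: patterns are case-sensitive shell-style globs.
        return any(fnmatchcase(name, p) for p in self.name_patterns)
```

Applying the filter before collecting any stats matters more than the matching itself, since the expensive part is fetching counters for instances nobody asked about.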
If certain counters are too costly I'd consider excluding them from the default view.
The mentioned cgtop is interesting, and htop in tree display mode can already show a lot of the stats (including block IO) grouped by container. It is just missing a way to sum all the process stats into totals for each container. htop manages to do this with pretty light load, so a /proc based approach looks workable.
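A minimal sketch of the missing "sum per container" step, run locally on the host. It assumes the container name can be read from each process's cgroup path after an `lxc.payload.`, `lxc.payload/` or `lxc/` prefix; that layout varies between LXD and kernel versions, so treat the regex as a placeholder. All function names are made up for illustration.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Optional

# Assumption: the container name follows one of these prefixes in the cgroup path.
CGROUP_RE = re.compile(r"(?:lxc\.payload\.|lxc\.payload/|lxc/)([^/\n]+)")


def container_of(cgroup_text: str) -> Optional[str]:
    """Extract the container name from the contents of /proc/<pid>/cgroup."""
    m = CGROUP_RE.search(cgroup_text)
    return m.group(1) if m else None


def rss_kib(status_text: str) -> int:
    """Read VmRSS (in kiB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def rss_by_container(proc_root: str = "/proc") -> Dict[str, int]:
    """Sum resident memory of all processes, grouped by container."""
    totals: Dict[str, int] = defaultdict(int)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cgroup")) as f:
                name = container_of(f.read())
            if name is None:
                continue  # host process, not in a container
            with open(os.path.join(proc_root, entry, "status")) as f:
                totals[name] += rss_kib(f.read())
        except OSError:
            continue  # process exited mid-scan or is not readable
    return dict(totals)
```

Summing RSS counts shared pages once per process, so the totals overstate real usage; the cgroup memory counters that cgtop reads do not have that problem.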
Our current approach to tackle much of this (and quite a bit more because of scale) is to introduce a new metrics endpoint which returns data in a Prometheus compatible format. The specification can be found here: https://discuss.linuxcontainers.org/t/lxd-metric-exporter-for-instances/11735
This isn't exactly what was first suggested in this issue as it doesn't really provide you with a lxc top command.
Though that endpoint should make the implementation of such a command quite a bit easier.
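As a sketch of how a "top"-style command could sit on a Prometheus-format endpoint: parse the text output and sum one metric per instance. The metric name and the `name` label used in the usage note are assumptions for illustration; check the linked specification for the real ones. The parser is deliberately minimal and does not cover the full exposition format.

```python
import re
from collections import defaultdict
from typing import Dict

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_instance(text: str, metric: str, label: str = "name") -> Dict[str, float]:
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or m.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group(2) or ""))
        if label in labels:
            totals[labels[label]] += float(m.group(3))
    return dict(totals)
```

For example, `sum_by_instance(body, "lxd_cpu_seconds_total")` would give cumulative CPU seconds per instance, assuming that is what the endpoint calls the metric. Two such readings taken an interval apart give the usage rate.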
It's still going to be an expensive endpoint to hit. Initially we don't expect to do any caching, though if we see the load becoming problematic, we're likely to do basic caching for a period of 10s or so. The fastest polling rate we're likely to recommend is 15s.
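The "basic caching for a period of 10s or so" idea can be sketched as a small time-to-live cache around the expensive collection step. This is an illustration only, not how LXD implements it; `TTLCache` and its parameters are hypothetical, and the class is not thread-safe as written.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Recompute a value at most once per `ttl_s` seconds."""

    def __init__(self, compute: Callable[[], T], ttl_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # First call or cache expired: pay the collection cost again.
            self._value = self._compute()
            self._expires_at = now + self._ttl_s
        return self._value
```

With a 10s TTL and a 15s polling interval, a single scraper always gets fresh data; the cache only helps when several clients hit the endpoint at once.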
I would love to have the API and the lxc top tool, for a single LXD instance or for multiple ones.
Need all "lxc-top" gave me: