canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Add a project limits consumption API and CLI #7946

Closed (stgraber closed this 3 years ago)

stgraber commented 3 years ago

Projects have a set of limits that can be applied to them, but users can't easily tell how far along they are in consuming those resources.

We should add a basic API under /1.0/projects/NAME to query the current usage for a project and add a lxc project info CLI option to display it.

komish commented 3 years ago

Am I understanding that the information for the limits already exists, and this feature would simply allow querying and presenting that data? I could work on this, just need to get more familiar with things.

stgraber commented 3 years ago

That's right. The limits are enforced right now so there is existing logic to query the limit and its current usage.

stgraber commented 3 years ago

Assigned it to you.

stgraber commented 3 years ago

The limits currently available are:

The API endpoint to return the limits for a project should:

For networks, just load the list of networks in the project and that's going to be your count.

The tricky one of those is limits.disk, which is the sum of:

The existing query logic isn't exactly ideal for this, but we do have existing logic to do everything I described in this comment. So best may be to just add an API endpoint and just have a function which pulls all the information needed above.

As for the API, GET /1.0/projects/NAME/state would be consistent with what we've done in other places.

I'd expect roughly this set of commits:

komish commented 3 years ago

Thanks for the info. Very much appreciated. It'll likely take me some time and I'm sure to have questions. For starters:

I read this in the documentation:

Note that to be able to set one of the limits.* config keys, all instances in the project must have that same config key defined, either directly or via a profile.

Instances created before a project has a limit defined in config don't seem to have this restriction. That is - they continue to run without the config key. API requests to create instances were met with an error if they didn't contain the config key, so I would guess that the check to make sure an instance in a project is within the upper bound only really takes place at creation time. Does that sound correct?

It makes sense (stopping a workload because it doesn't have a config key setting a limit after the fact could be dangerous, etc). This could impact the accuracy of what's represented by the project limits consumption API, and I want to make sure there's nothing special I need to take into consideration as a result.

stgraber commented 3 years ago

Yeah, that's correct. It's a pretty unusual situation as we normally expect the admin to set the limit prior to anything getting created, but it's certainly a possible situation.

Similarly, it's possible for an admin to lower the project limits after things got created, effectively giving you a situation where the reported usage is higher than the limit. In such a situation, nothing new can get created until the user has done some cleanup.
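To make the "usage higher than the limit" case concrete, here is a minimal Go sketch. The resourceState type, the canAllocate method and the use of -1 for "no limit set" are all assumptions made up for this illustration, not LXD's actual enforcement code.

```go
package main

// resourceState is a hypothetical pair of values for a single limits.* key.
// In this sketch a negative Limit means that no limit is set.
type resourceState struct {
	Limit int64
	Usage int64
}

// canAllocate reports whether `extra` more units still fit under the limit.
// If an admin lowered the limit below the current usage, this returns false
// for any request until enough cleanup brings Usage back under Limit.
func (r resourceState) canAllocate(extra int64) bool {
	if r.Limit < 0 {
		return true
	}

	return r.Usage+extra <= r.Limit
}
```

With Limit 5 and Usage 7 (limit lowered after the fact), canAllocate(1) is false; once cleanup brings Usage down to 3, canAllocate(2) is true again.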

komish commented 3 years ago

Sorry for the delay in any kind of update, I've been a bit short on time. I know this is listed as a Hacktoberfest issue but if it's alright with the project, I'll likely find more time to make progress on this after the event ends.

stgraber commented 3 years ago

Yep, that's fine with us :)

stgraber commented 3 years ago

@komish still looking into this?

komish commented 3 years ago

@stgraber I haven't had a chance to come back to it yet :(. I got as far as provisioning a fresh machine in the home lab. My apologies, may be best to unassign me to free it up for others and/or if this is moving up in the priority.

stgraber commented 3 years ago

We're planning to have this in a release by mid April so there's still some time for you to take a stab at it :)

komish commented 3 years ago

Just dropping a quick note that I'll be focused on this in the coming weeks!

stgraber commented 3 years ago

Excellent!

komish commented 3 years ago

Hello again! Looking for some guidance here whenever you have a moment. @stgraber

EDIT: apologies for the length

I've been following the skeleton you provided earlier in this thread as closely as possible to help wrap my head around things.

At this point, I'm looking at the API implementation, which seems to be the bulk of the work so far. Specifically, this bullet from the skeleton you provided above.

At the moment, I'm working with the assumption that the endpoint to get Project State will effectively mirror lxd/api_project.go:ProjectGet(...), which gets a project using a Name as a filter. I've copied the associated type api.Project to an api.ProjectState for the time being (nothing else changed re: the type for the moment).

Following the logic, I see a call to d.cluster.GetProject(name) https://github.com/lxc/lxd/blob/master/lxd/api_project.go#L199. I intended on adding another method for the Project State on the Cluster type to be called by the upstream ProjectStateGet in lxd/api_project.go

If I look at the methods on the Cluster type, (lxd/db/projects.go), I see another method, this time on the ClusterTx type https://github.com/lxc/lxd/blob/master/lxd/db/projects.go#L199 which handles the logic necessary to perform a query to the database for the information that is needed to build a response.

The methods bound to ClusterTx, however, seem to be generated (or, perhaps more accurately, the lxd/db/projects.mapper.go contains comments indicating it's generated by lxd-generate).

With this in mind, I don't intend to edit this file, but if I need to add another method to ClusterTx it seems like I need to identify how lxd-generate makes that file. Based on the context of your earlier guidance, I gathered that the function calls already exist in other places to gather the resources needed to count existing states of limits, and I can query applied profiles for the max values. If so, it makes sense that I wouldn't have to add any query strings to this file to get what I need.

With that in mind - does that all sound like it's on the right track? Do I need to get familiar with lxd-generate to implement the logic necessary to return []api.ProjectState from a method bound to *ClusterTx?

Thanks again. The guidance has been invaluable!

P.S. at some point I may ping for information on getting the project to build nicely. Specifically, I'm seeing compiler errors around the Package C, but since that's not in scope for what I'm trying to get accomplished at the moment, we can leave those questions for a future message.

P.P.S. for folks using Go 1.16 - Module mode is enabled by default regardless of the presence of a go.mod file in your source. It took me longer than it should have to identify that my dependency installation wasn't working as a result of this. User error on my part.

stgraber commented 3 years ago

The goal is to implement a new /1.0/projects/NAME/state endpoint.

Doing so requires the addition of a new ProjectState struct in shared/api/project.go followed by a new APIEndpoint struct in lxd/api_project.go which will implement Path "projects/{name}/state" and have only a Get function associated with it. This new APIEndpoint then needs to be added to the list in api_1_0.go.
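As a rough illustration of that shape, here is a self-contained Go sketch using only net/http. It does not use LXD's APIEndpoint type; the field names on ProjectState and ProjectStateResource, the usageFunc callback, and the parseProjectStatePath and projectStateGet helpers are all assumptions invented for this example.

```go
package main

import (
	"encoding/json"
	"net/http"
	"strings"
)

// ProjectStateResource is an assumed shape: one limit and its current usage.
type ProjectStateResource struct {
	Limit int64 `json:"limit"`
	Usage int64 `json:"usage"`
}

// ProjectState is an assumed shape for the response body, keyed by resource
// name (for example "instances" or "networks").
type ProjectState struct {
	Resources map[string]ProjectStateResource `json:"resources"`
}

// usageFunc is a hypothetical stand-in for whatever gathers the usage data.
type usageFunc func(project string) (*ProjectState, error)

// parseProjectStatePath extracts NAME from /1.0/projects/NAME/state.
func parseProjectStatePath(path string) (string, bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) != 4 || parts[0] != "1.0" || parts[1] != "projects" ||
		parts[2] == "" || parts[3] != "state" {
		return "", false
	}

	return parts[2], true
}

// projectStateGet returns a GET-only handler for the state endpoint.
func projectStateGet(lookup usageFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}

		name, ok := parseProjectStatePath(r.URL.Path)
		if !ok {
			http.NotFound(w, r)
			return
		}

		state, err := lookup(name)
		if err != nil {
			// This sketch treats every lookup failure as "project not found".
			http.Error(w, err.Error(), http.StatusNotFound)
			return
		}

		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(state)
	}
}
```

In LXD itself the routing and response wrapping would come from the existing API plumbing rather than a hand-written path parser; the sketch only shows the "one path, Get only, returns a state struct" idea.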

That Get function then needs to pull usage data on:

I suspect the best there will be to put the bulk of the logic in a new lxd/project/state.go file so you can reuse the same functions from lxd/project/permissions.go which currently do the limit enforcement.

I don't believe you'll need to add any database function for this. We already have the limits in place and so must already have a way to know how much is used.
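For a sense of what "how much is used" could look like for an integer-valued key, here is a minimal Go sketch. The instance type and the aggregateUsage function are hypothetical and are not the existing functions in lxd/project/permissions.go; the sketch also assumes plain integer values, so keys with size suffixes would need real parsing.

```go
package main

import (
	"fmt"
	"strconv"
)

// instance is a hypothetical, simplified stand-in for an instance together
// with its expanded config (its own keys merged with its profiles).
type instance struct {
	Name   string
	Config map[string]string
}

// aggregateUsage sums an integer-valued config key across instances.
// Instances without the key are skipped and returned by name, since the
// thread notes that instances created before a limit was set may lack it.
func aggregateUsage(instances []instance, key string) (int64, []string, error) {
	var total int64
	var missing []string

	for _, inst := range instances {
		raw, ok := inst.Config[key]
		if !ok {
			missing = append(missing, inst.Name)
			continue
		}

		n, err := strconv.ParseInt(raw, 10, 64)
		if err != nil {
			return 0, nil, fmt.Errorf("instance %q: invalid %s value %q: %w", inst.Name, key, raw, err)
		}

		total += n
	}

	return total, missing, nil
}
```

Returning the names of instances that lack the key lets the caller decide whether to report them or ignore them, rather than failing the whole request.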

komish commented 3 years ago

Just a quick update that I'm making progress. I've got the instance counts done and the API responding (via curl), and am searching and piecing together functions that give me current aggregate values and the remaining states to include in the response. Haven't forgotten this issue, just slowly making progress 👍. Trying to get it submitted before your release goal!

stgraber commented 3 years ago

Cool! As mentioned above, there is logic to compute all the totals in lxd/project/ as that's already used for enforcement, so it will likely just need a bit of refactoring in some spots before being reused for this API endpoint.