Limits (quotas) for projects

stgraber commented 5 years ago

Projects as they are today are a great way to segment a LXD host, having its own containers, images and profiles (depending on configuration). Combined with RBAC, it's then possible to have some users/groups only have access to specific projects.

But right now, access to a project means you still get to create as many containers as you want, use as much CPU and memory as you want, potentially running the host system out of resources.

We shouldn't try to fix all of this in one shot, but need to put some infrastructure in place for such limits to be implemented. The easiest would be to use the existing project configuration mechanism, adding more configuration keys like:

limits.containers (total number of containers that can be held by the project)
limits.cpu (maximum number of CPUs that can be assigned to a container OR range of CPUs that may be used)
limits.memory (total amount of memory usable by the project)
limits.processes (total number of processes usable by the project)

I suggest we do this initial set as a way to prove the concept, we can then add more as we see demand.

rcash commented 5 years ago

Hi @stgraber I'm a student in UT Austin's virtualization class. Does this issue still have a deadline for the end of October? I'm also curious about this for #6170

stgraber commented 5 years ago

@rcash Hi, in theory, yes, but we're obviously not going to make it, so those two issues will be pushed to the next development cycle for us and so we'd likely want them done by February at this point.

stgraber commented 5 years ago

I've now updated the forum to reflect that we won't deliver those two at the planned time and moving the deadline on them to end of January instead.

rcash commented 5 years ago

Awesome, my group is pretty interested in this one but we're pending instructor approval currently.

stgraber commented 5 years ago

Excellent, just let me know when you have the go ahead and get the rest of your group to comment so I can assign it to all of you.

rcash commented 5 years ago

Great, we have the go ahead! Could you assign @mparfan @Lay-ton and I? Thank you!

stgraber commented 5 years ago

Assigned it to you, I'll need @mparfan and @Lay-ton to comment before I can assign it to them too.

Lay-ton commented 5 years ago

Hello, could you go ahead and assign me

mparfan commented 5 years ago

Hello, sorry for the late response but could you also go ahead and assign me

stgraber commented 5 years ago

done!

rcash commented 4 years ago

Hi again @stgraber, are lxc projects enabled by default or is there a setting we need to tweak to get them to work? I've been building LXD from source on the master branch and project never appears as a valid option when I type lxc when it is running. Some other LXD groups in the virtualization class here have also not been able to use lxc projects either. I've attached the output of lxc, lxc project and lxc info below.

root@cs378-opensourceproject-testserverv2 ~ $ lxc Description: Command line client for LXD

All of LXD's features can be driven through the various commands below. For help with any of those, simply call them with --help.

Usage: lxc [command]

Available Commands: alias Manage command aliases cluster Manage cluster members config Manage container and server configuration options console Attach to container consoles copy Copy containers within or in between LXD instances delete Delete containers and snapshots exec Execute commands in containers file Manage files in containers help Help about any command image Manage images info Show container or server information launch Create and start containers from images list List containers move Move containers within or in between LXD instances network Manage and attach containers to networks operation List, show and delete background operations profile Manage profiles publish Publish containers as images remote Manage the list of remote servers rename Rename containers and snapshots restart Restart containers restore Restore containers from snapshots snapshot Create container snapshots start Start containers stop Stop containers storage Manage storage pools and volumes version Show local and remote versions

Flags: --all Show less common commands --debug Show all debug messages --force-local Force using the local unix socket -h, --help Print help -v, --verbose Show all information messages --version Print version number

Use "lxc [command] --help" for more information about a command. root@cs378-opensourceproject-testserverv2 ~ $

root@cs378-opensourceproject-testserverv2 ~ $ lxc project Error: unknown command "project" for "lxc" Run 'lxc --help' for usage. root@cs378-opensourceproject-testserverv2 ~ $

root@cs378-opensourceproject-testserverv2 ~ $ lxc info config: {} api_extensions:

storage_zfs_remove_snapshots
container_host_shutdown_timeout
container_stop_priority
container_syscall_filtering
auth_pki
container_last_used_at
etag
patch
usb_devices
https_allowed_credentials
image_compression_algorithm
directory_manipulation
container_cpu_time
storage_zfs_use_refquota
storage_lvm_mount_options
network
profile_usedby
container_push
container_exec_recording
certificate_update
container_exec_signal_handling
gpu_devices
container_image_properties
migration_progress
id_map
network_firewall_filtering
network_routes
storage
file_delete
file_append
network_dhcp_expiry
storage_lvm_vg_rename
storage_lvm_thinpool_rename
network_vlan
image_create_aliases
container_stateless_copy
container_only_migration
storage_zfs_clone_copy
unix_device_rename
storage_lvm_use_thinpool
storage_rsync_bwlimit
network_vxlan_interface
storage_btrfs_mount_options
entity_description
image_force_refresh
storage_lvm_lv_resizing
id_map_base
file_symlinks
container_push_target
network_vlan_physical
storage_images_delete
container_edit_metadata
container_snapshot_stateful_migration
storage_driver_ceph
storage_ceph_user_name
resource_limits
storage_volatile_initial_source
storage_ceph_force_osd_reuse
storage_block_filesystem_btrfs
resources
kernel_limits
storage_api_volume_rename
macaroon_authentication
network_sriov
console
restrict_devlxd
migration_pre_copy
infiniband
maas_network
devlxd_events
proxy
network_dhcp_gateway
file_get_symlink
network_leases
unix_device_hotplug
storage_api_local_volume_handling
operation_description
clustering
event_lifecycle
storage_api_remote_volume_handling
nvidia_runtime
container_mount_propagation
container_backup
devlxd_images
container_local_cross_pool_handling
proxy_unix
proxy_udp
clustering_join
proxy_tcp_udp_multi_port_handling
network_state
proxy_unix_dac_properties
container_protection_delete
unix_priv_drop
pprof_http
proxy_haproxy_protocol
network_hwaddr
proxy_nat
network_nat_order
container_full
candid_authentication
backup_compression
candid_config
nvidia_runtime_config
storage_api_volume_snapshots
storage_unmapped
projects
candid_config_key
network_vxlan_ttl
container_incremental_copy
usb_optional_vendorid
snapshot_scheduling
container_copy_project
clustering_server_address
clustering_image_replication
container_protection_shift
snapshot_expiry
container_backup_override_pool
snapshot_expiry_creation
network_leases_location
resources_cpu_socket
resources_gpu
resources_numa
kernel_features
id_map_current
event_location
storage_api_remote_volume_snapshots
network_nat_address
container_nic_routes
rbac
cluster_internal_copy
seccomp_notify
lxc_features
container_nic_ipvlan
network_vlan_sriov
storage_cephfs
container_nic_ipfilter
resources_v2
container_exec_user_group_cwd
container_syscall_intercept
container_disk_shift
storage_shifted
resources_infiniband
daemon_storage
instances
image_types
resources_disk_sata
clustering_roles
images_expiry
resources_network_firmware
backup_compression_algorithm
ceph_data_pool_name
container_syscall_intercept_mount
compression_squashfs
container_raw_mount
container_nic_routed
container_syscall_intercept_mount_fuse
container_disk_ceph api_status: stable api_version: "1.0" auth: trusted public: false auth_methods:
tls environment: addresses: [] architectures:
- x86_64
- i686 certificate: | -----BEGIN CERTIFICATE----- MIICbTCCAfKgAwIBAgIRAN0AnpQ07SsH7A2JiPZgYwQwCgYIKoZIzj0EAwMwUjEc MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEyMDAGA1UEAwwpcm9vdEBjczM3 OC1vcGVuc291cmNlcHJvamVjdC10ZXN0c2VydmVydjIwHhcNMTkxMTIxMjM0NjA2 WhcNMjkxMTE4MjM0NjA2WjBSMRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3Jn MTIwMAYDVQQDDClyb290QGNzMzc4LW9wZW5zb3VyY2Vwcm9qZWN0LXRlc3RzZXJ2 ZXJ2MjB2MBAGByqGSM49AgEGBSuBBAAiA2IABL836CL6ZCtVL0If0CY4E6aOdipm uqe54zrU1bJfSL3RR4xV4UnivqR1RZBC7WO01uzzRhSMqC73KgUf+CmGl5KR+KRa hCrAaf60h41YIVJnirCBESrAuhghmdcILt65UqOBizCBiDAOBgNVHQ8BAf8EBAMC BaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADBTBgNVHREETDBK giRjczM3OC1vcGVuc291cmNlcHJvamVjdC10ZXN0c2VydmVydjKHBES3OOqHBAoR AAWHBApVIQGHEP1CLSW7UMgnAAAAAAAAAAEwCgYIKoZIzj0EAwMDaQAwZgIxAMyY wMtZMt6GsQhMdhueECHD4zUQApzVvjB1uEta3METDPwHkYS0R4Anh1+/lKq46gIx ALhrDbbJLyKp2yi4MuW9FkE4GXdd0zh2lfF92mizOf8DAHl1AX9LwmzVbd8y7+h/ HA== -----END CERTIFICATE----- certificate_fingerprint: 938e4f668d7114cceb8ae06ff93e1f5253463e25afa6ba32ac05bef23d762341 driver: lxc driver_version: 3.0.3 kernel: Linux kernel_architecture: x86_64 kernel_version: 4.15.0-70-generic server: lxd server_pid: 1646 server_version: "3.18" storage: lvm storage_version: 2.02.176(2) (2017-11-03) / 1.02.145 (2017-11-03) / 4.37.0 server_clustered: false server_name: cs378-opensourceproject-testserverv2 project: default root@cs378-opensourceproject-testserverv2 ~ $

stgraber commented 4 years ago

Make sure you don't have the lxd and lxd-client packages installed on your system. The behavior you're describing would be consistent with you having the 3.0.3 version of lxd-client installed.

mparfan commented 4 years ago

Hi @stgraber, are project limits intended to change when a project is not empty? We ask because as of right now, we think that the functionality of limits is very similar to how features.images or features.profiles are used in lxd/api_project.go, but that a user might want to increase the max number of containers at some point.

stgraber commented 4 years ago

Limits will be modifiable at any time. The most typical change will indeed be an increase, though a decrease should also be possible, but should fail if the current usage exceeds the new limit.

rcash commented 4 years ago

Hi @stgraber , we've implemented container limits and plan to put up a PR for them tomorrow. We're currently working on cpu limits but would like to run a design by you. We're basing our logic off of some of the methods in lxd/profile_utils.go that grab all the currently running containers, update LXD's internal DB and then update each of the containers with the new profile config. From here there is support already for limits (at the container level) so we do not think we need to do more. We currently check to see if the CPU limit has been changed in projectChange(lxd/api_project.go), then use the projectUpdate DB transaction here to update the limit and then change the container limits individually using helpers at the end of projectChange(lxd/api_project.go). From a 10,000 foot view is this a design that would be acceptable, and is there anything we should keep in mind?

freeekanayaka commented 4 years ago

@rcash from your description it's unclear to me what would be the actual logic to, say, make sure that the total memory used by all containers belonging to a project with limits.memory set to 10G does not exceed 10G.

Note that projects should be orthogonal to profiles. In terms of database records, you shouldn't need to change the limits.* keys of any profile when you change the limits.* keys of a project. However, when changing the limits of a profile or a project, you'll have to take into account both of them when calculating the actual limits to apply to a container. That could be done for example in the containerLXC.expandConfig() method of container_lxc.go, which currently takes into account only profiles, but that should be modified to take into account the container's project as well.

Hope what I'm saying makes sense and it's how @stgraber envisioned the feature :) So take all this with a grain of salt, I'll leave to him the final word.

stgraber commented 4 years ago

Correct, so there are two types of limits really. One is purely a DB entry limit, that would apply to:

limits.containers
limits.virtual-machines

Those would be checked for at creation time, if they're exceeded, creation fails.

Then you have the resource limits like (ignore limits.cpu for now, it's a weird one):

limits.memory
limits.processes

Those apply to the running usage so shouldn't be checked against DB records, instead, we'll want them applied through cgroups by changing the way containers are placed into cgroups through lxc.cgroup.dir, effectively changing the default path from /lxc/NAME to /lxd/PROJECT/NAME, letting us apply restrictions to /lxd/PROJECT which will then apply to all containers and vms placed inside it.

Given your deadline around this work and the fact that there is currently no draft PR open for this. I'd suggest you focus on just implementing limits.containers and limits.virtual-machines for now. This will still make you put all the infrastructure in place for limits but will keep you away from the more complex cgroup handling part (which could lead to conflict/rebase as another group is currently working on cgroup abstractions).

rcash commented 4 years ago

We'll have a PR open for those two limits later today (we have implemented limits.containers already) and I plan to follow through with this issue over winter break and get anything unresolved done.

What makes CPU limits weird? I ask because it seemed that it may have been less complex to do than the memory & process limits and we've written a substantial amount of code for cpus already (under the assumption that changing a project's cpu limit would pin whatever cpu or range of cpus to all of the containers running the project).

stgraber commented 4 years ago

So we were chatting about that with @freeekanayaka earlier and our original plan to use cgroups to restrict the number of cpus and memory for a project won't work.

The reason for that is that LXD supports clustering and projects can span multiple cluster nodes, you therefore can't enforce a shared amount of cpu/memory across all containers/vms in a project.

Instead it looks like we'll need to take the same approach that openstack did and effectively make limits.cpu and limits.memory map to the maximum which can be defined rather than used.

This would require all containers and VMs inside a project with such limits in place to themselves define limits.cpu and limits.memory. This would need to have a few more side-effects, specifically:

limits.memory.enforce wouldn't be allowed to be soft
limits.cpu would only allow for a number of CPUs, specific pinning will not be allowed

limits.processes would work in a similar way too. That is, if it's set, then all containers must have a limits.processes too and the total for the project cannot exceed the project limit.

This behavior will need proper documentation as it will lead to some counter-intuitive behavior. For example, say you're using project limits on your laptop, you have 4 physical CPU cores and run 50 containers and want to give access to two cores to each container. You'll have to set limits.cpu on the project to 100 which feels odd as it far exceeds what the hardware has and so doesn't line up with the behavior we've had for the similarly named key on containers.

stgraber commented 4 years ago

@freeekanayaka sounds good?

freeekanayaka commented 4 years ago

@stgraber that sounds good in principle, however it feels indeed odd for thinks like limits.cpu.

An alternative solution we might want to consider is to have project configuration keys like limits.memory and limits.cpu have virtually the same semantics as their profile-level or instance-level equivalents.

If you set the project limits.memory to 1G, that will be the default effective/expanded instance limits.memory config that instances started in the project will have. If a particular instance has also profiles associated to it, and/or has its own limits.memory config key, and the effective/expanded instance limits.memory derived from them exceeds the project's limits.memory, we would fail the start operation (or trim the expanded limits.memory down to the project one).

If an admin wants to limit the total amount of memory that can be ever used by the containers in a project to, say, 10G, he can for example set limits.containers to 10, limits.virtual-machines to 0 and limits.memory to 1G.

Purely as convenience for the user, we might add "read-only" project configuration keys such as limits.total-memory, where we would do the math for you.

It might be a bit less "magic" and involve a tiny bit of arithmetic on the admin side, however the upside could be that it's easier to understand and closer to what the implementation actually does.

stgraber commented 4 years ago

Hmm, I see limits.* as resource limits to apply to the entire project, effectively how you would put an upper bound on what the project can consume on your cluster.

There is value in being able to put an upper bound on a per instance basis, effectively preventing the creation of any instance with more than X GB of memory, but this would count as a restriction and so be more under the scope of #6170

stgraber commented 4 years ago

It's very common for deployments to use varying instance sizes, so if we were to treat limits.memory as an upper restriction per instance, then the following deployment:

c1 with memory=256MB
c2 with memory=256MB
c3 with memory=2GB
c4 with memory=512MB

Would require a limits.memory of 2GB and a limits.containers of 4. Allowing that user to consume up to 8GB of RAM when only 3GB are actually needed.

freeekanayaka commented 4 years ago

It's very common for deployments to use varying instance sizes, so if we were to treat limits.memory as an upper restriction per instance, then the following deployment:
* c1 with memory=256MB

* c2 with memory=256MB

* c3 with memory=2GB

* c4 with memory=512MB
Would require a limits.memory of 2GB and a limits.containers of 4. Allowing that user to consume up to 8GB of RAM when only 3GB are actually needed.

That's true, but what effective limits.memory instance-level value would you assign to a newly created container that does not explicitly specify itself a value for limits.memory (via config key on the container or on a profile)? Or would you just fail the creation of the container in that case?

stgraber commented 4 years ago

I would fail the container startup with an error indicating that the project has specified a memory limit and that therefore all containers must set their own limit.

stgraber commented 4 years ago

So very similar to what openstack does (though I believe it may be doing it at creation time since re-sizing isn't something as commonly done online as it is on LXD).

freeekanayaka commented 4 years ago

Okay, if you fail the container creation, it makes sense. We're mainly left with the odd limits.cpu.

stgraber commented 4 years ago

Yeah, limits.cpu I think would work the same way, we don't allow CPU pinning, we make the project limit mean the total number of virtual CPUs and we require each container and VM to have limits.cpu set to some integer value.

It will feel somewhat weird because of what looks like huge overcommit on the project limit front, but it's not really any different than what we do for the memory. For the memory you may end up setting the project limit to a value higher than is free on the system, knowing that not all containers will use 100% of their memory. The same is true with CPUs, though overcommit here is less dangerous than on the memory front (just makes the system slower rather than trigger OOM/swap).

canonical / lxd

Limits (quotas) for projects #6169