SELinuxProject / selinux-kernel

GitHub mirror of the SELinux kernel repository
https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git

RFE: enable changing the number of AVC hash buckets at runtime #34

Open pcmoore opened 7 years ago

pcmoore commented 7 years ago

At present the number of AVC hash buckets is hard-coded to 512; we should look into making this tunable at runtime. While 512 buckets tends to work well for most workloads, it is proving to be too small for systems with a large number of unique labels, such as container hosts using MCS/sVirt.

pcmoore commented 7 years ago

I'm not sure this calls for something like the kernel's lib/rhashtable.c implementation. Since the AVC is a cache, and I expect size adjustments to be rare, we can probably get away with throwing out the old table and replacing it with a new, empty table.
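A minimal sketch of that replace-rather-than-rehash idea, written in Python rather than kernel C. The `SimpleCache` class and its method names are hypothetical, and the sketch ignores the locking/RCU concerns a real AVC change would have to handle.

```python
class SimpleCache:
    """Toy chained hash cache; an illustration, not the kernel AVC."""

    def __init__(self, nbuckets=512):
        self.buckets = [[] for _ in range(nbuckets)]

    def _chain(self, key):
        # Pick the bucket (chain) this key hashes to.
        return self.buckets[hash(key) % len(self.buckets)]

    def lookup(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None  # cache miss: the caller recomputes the decision

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def resize(self, nbuckets):
        # Drop every cached entry instead of rehashing into the new table.
        self.buckets = [[] for _ in range(nbuckets)]
```

Because every entry can be recomputed on a miss, the only cost of `resize()` in this sketch is a temporarily cold cache.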

stephensmalley commented 7 years ago

Do you really need to change the number of buckets, or just the threshold/max number of cache entries? The latter can already be tuned via /sys/fs/selinux/avc/cache_threshold. Do we have some data, e.g. cat /sys/fs/selinux/avc/hash_stats, from these systems?
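For anyone collecting that data, here is a rough sketch of turning the `hash_stats` output into numbers. It assumes the file consists of `entries: N`, `buckets used: U/T`, and `longest chain: L` lines; `parse_hash_stats` is a hypothetical helper, and the sample text is made up rather than taken from a real system.

```python
import re

def parse_hash_stats(text):
    """Parse the text of /sys/fs/selinux/avc/hash_stats into a dict."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        nums = [int(n) for n in re.findall(r"\d+", value)]
        if not nums:
            continue  # skip blank or unrecognised lines
        name = name.strip()
        if name == "buckets used" and len(nums) == 2:
            stats["buckets_used"], stats["buckets_total"] = nums
        else:
            stats[name.replace(" ", "_")] = nums[0]
    return stats

# Made-up sample input, not data from a real system.
sample = "entries: 510\nbuckets used: 300/512\nlongest chain: 5\n"
stats = parse_hash_stats(sample)
```

On a live system the same function could be fed `open("/sys/fs/selinux/avc/hash_stats").read()`.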

pcmoore commented 7 years ago

I'm hearing of systems that have bumped the threshold up to ~65k and are hitting that limit; the resulting lengthy per-bucket chains are causing spikes in CPU usage in avc_has_perm().
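A back-of-the-envelope check of why the chains get long, assuming "~65k" means 65536 entries and a perfectly uniform hash (real chains can only be worse than this average). The function name is hypothetical.

```python
def average_chain_length(entries, buckets):
    """Mean entries per bucket if the hash spreads keys evenly."""
    return entries / buckets

BUCKETS = 512      # hard-coded bucket count discussed in this issue
ENTRIES = 65536    # assumption: "~65k" taken as 2**16

avg = average_chain_length(ENTRIES, BUCKETS)
```

Under those assumptions a full cache averages 128 entries per bucket, so a lookup that misses walks roughly that many nodes before giving up.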

stephensmalley commented 7 years ago

Why would we end up with that many unique AVC entries? Most container accesses would be within the same category set (i.e. intra-container) and to a handful of types (mostly container or svirt types). So they shouldn't yield that many unique (source context, target context, target class) triples.

pcmoore commented 7 years ago

Imagine thousands of containers on a single system.

stephensmalley commented 6 years ago

Even with thousands of containers, most accesses should be intra-container, so I wouldn't expect that many unique AVC entries; AVC entries are only ever created for actual permission checks, not potential ones. That said, given the number of unique security classes, I could see a definite multiplying factor to just represent a container's access to all file classes, many socket classes, etc. That's another area for possible improvement, i.e. allowing a single AVC entry and security server computation to represent multiple classes so that if the same permissions are allowed to e.g. all file classes, we can store that once in the AVC.
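To illustrate the multiplying factor and the possible saving, here is a toy count of cache entries keyed per class versus entries shared across classes with identical permissions. The labels, classes, and permission sets are invented for the example, and `merged_entries` is only one hypothetical way such sharing might be keyed, not an existing AVC feature.

```python
from collections import defaultdict

def per_class_entries(checks):
    """Entries needed when each (source, target, class) triple is cached."""
    return len({(s, t, c) for s, t, c, _ in checks})

def merged_entries(checks):
    """Entries needed if classes with identical permissions share one entry."""
    groups = defaultdict(set)
    for s, t, c, perms in checks:
        groups[(s, t, perms)].add(c)
    return len(groups)

# Invented checks: two containers touching their own files via three classes.
RW = frozenset({"read", "write"})
checks = [
    (f"ctr{i}_proc", f"ctr{i}_file", cls, RW)
    for i in (1, 2)
    for cls in ("file", "dir", "lnk_file")
]
# One class with a different permission set still needs its own entry.
checks.append(("ctr1_proc", "ctr1_file", "sock_file", frozenset({"write"})))

per_class = per_class_entries(checks)
merged = merged_entries(checks)
```

In this made-up case the per-class scheme needs 7 entries and the merged scheme needs 3; the gap would grow with the number of classes each container touches.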

rhatdan commented 6 years ago

Most people will never run more than 100 containers. Eventually we might scale beyond 100, but I think on OpenShift right now we are only handling ~50 containers. So this would be 50 process types and 50 object types (maybe a few more).

jeremyeder commented 6 years ago

Agreed with the "most people" comment. OpenShift supports up to 250 per node right now, and we are going to try to double that by fall of this year. We are currently closing in on 100 per node in a variety of environments, though; 100 is pretty common.

stephensmalley commented 6 years ago

Then I don't see why we'd be increasing the AVC cache threshold to 64k; that's just making the cache slow for no benefit.