OpenCHAMI / roadmap

Public Roadmap Project for Ochami
MIT License

[RFD] LANL Security Pillars #45

Open stradling opened 2 weeks ago

stradling commented 2 weeks ago

LANL HPC Security Pillars

LANL systems face significant security threats from both insiders and outsiders. I'm hoping to lay out the basic security considerations for a shared HPC resource and invite others to revise and extend them.

Management/Compute isolation

Compute nodes, ideally, will never need to initiate communication with the management plane (the image servers, service nodes, fabric managers, orchestration services, etc., that allow the HPC system to work as an ensemble). If such communication is unavoidable, it must be strictly separated from any endpoint that could allow an attacker to escalate privileges and act as a domain administrator.
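One way to make the isolation rule concrete is to express it as a deny-by-default policy check that a firewall generator or audit script could apply to proposed network flows. This is a minimal sketch under stated assumptions: the role names, the `Flow` type, the `is_allowed` function, and the `boot-images:read` endpoint are all hypothetical illustrations, not part of OpenCHAMI or any existing LANL tooling.

```python
from dataclasses import dataclass

# Hypothetical management-plane roles, mirroring the examples in the text.
MANAGEMENT_ROLES = {"image_server", "service_node", "fabric_manager", "orchestration"}

# Hypothetical allowlist of management endpoints that compute nodes may reach
# because the communication was judged unavoidable. Each entry records whether
# the endpoint can confer administrative rights; only non-admin ones are usable.
UNAVOIDABLE_ENDPOINTS = {
    "boot-images:read": {"grants_admin": False},
}


@dataclass(frozen=True)
class Flow:
    initiator_role: str   # role of the node opening the connection
    target_role: str      # role of the node accepting the connection
    target_endpoint: str  # service/endpoint name (hypothetical naming scheme)


def is_allowed(flow: Flow) -> bool:
    """Return True if the flow complies with the compute/management isolation rule."""
    if flow.initiator_role != "compute":
        # This sketch only constrains traffic that compute nodes initiate.
        return True
    if flow.target_role not in MANAGEMENT_ROLES:
        # Traffic to non-management targets is outside this sketch's scope.
        return True
    endpoint = UNAVOIDABLE_ENDPOINTS.get(flow.target_endpoint)
    # Deny by default: permit only explicitly listed endpoints that cannot
    # be used to gain administrative privileges.
    return endpoint is not None and not endpoint["grants_admin"]


# Example checks (hypothetical flows).
print(is_allowed(Flow("compute", "orchestration", "api:admin")))       # False
print(is_allowed(Flow("compute", "image_server", "boot-images:read")))  # True
print(is_allowed(Flow("service_node", "compute", "ssh")))               # True
```

The deny-by-default choice means a newly added management service is unreachable from compute nodes until someone explicitly decides that the communication is unavoidable and that the endpoint exposes no administrative capability.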

Rapid Updates

Updating regularly for security hygiene, and rapidly when vulnerabilities are detected, is central to good security in general and to LANL operations in particular. Any aspect of the HPC environment that prevents rapid testing and rollout of patches to packages or kernel components should be noted, minimized, and eliminated as soon as possible.

Verification

Specialized (small-community) management software, networks, filesystem projection software, and other components that don't get attention from large vendors or communities should be regularly red-teamed by knowledgeable internal developers, who should assume that an attacker already has root access on user-facing nodes.


What else do we need in here?

stradling commented 2 weeks ago

This was per Alex's request last week.