CHPC 2024 Student Cluster Competition

Welcome to the Center for High Performance Computing (CHPC)'s Student Cluster Competition (SCC) - Team Selection Round. This round requires each team to build a prototype multi-node compute cluster within the National Integrated Cyber Infrastructure Systems (NICIS) virtual compute cloud (described below).

The goal of this document is to introduce you to the competition platform and familiarise you with some Linux and systems administration concepts. This competition provides you with a fixed set of virtual resources that you will use to initialize a set of virtual machine instances based on your choice of Linux flavor.

Structure of the Competition
Deliverables
Lecture Recordings
Contributing to the Project
1. Steps to follow when editing existing content
2. Syntax and Style

Structure of the Competition

The CHPC invites applications from suitably qualified candidates to enter the CHPC Student Cluster Competition. The CHPC Student Cluster Competition gives undergraduate students at South African universities exposure to the High Performance Computing (HPC) Industry. The winning team will be entered into the ISC Student Cluster Competition hosted at the 2025 International Supercomputing Conference held in Hamburg, Germany.

You will be accessing all of the course work and material through this GitHub repository, which you and your team must check regularly to receive updates.

Getting Help

You are strongly encouraged to get help and even assist others by Opening and Participating in Discussions.

[!TIP] Active participation in the student discussions is an easy way to separate yourselves from the rest of the competition and make it easy for the instructors to notice you!

Timetable

Every day will comprise four lectures in the morning, with tutorials taking place in the afternoon. A PDF version of the timetable is available for you to download.

Scoring

Teams will be evaluated according to the following breakdown, with your progress in the tutorials and your final presentation carrying the most weight.

Component	Weight

Technical Knowledge Assessment	0.1
Tutorials	0.4
Cluster Design Assignment (Part 1)	0.1
Cluster Design Presentation	0.4
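
As a worked example of how these weights combine, the sketch below computes a weighted total. It assumes that each component is marked out of 100 and that the final score is a plain weighted sum; neither assumption is confirmed by the organizers, and the component scores are made-up values.

```shell
# Combine four component scores using the weights from the table above.
# Assumption: every component is marked out of 100 and the total is a
# simple weighted sum.
weighted_total() {
    # Arguments: assessment tutorials design presentation
    awk -v a="$1" -v t="$2" -v d="$3" -v p="$4" \
        'BEGIN { printf "%.1f\n", 0.1*a + 0.4*t + 0.1*d + 0.4*p }'
}

# Hypothetical scores, for illustration only.
total=$(weighted_total 80 70 60 90)
echo "Weighted total: $total"
```

Under these assumptions, the tutorials and the presentation together account for 80 % of the total, so a weak result in either one is hard to recover elsewhere.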

Instructions for Mentors

The role of mentors, instructors and volunteers is to provide leadership and guidance for the student competitors participating in this year's Center for High Performance Computing 2024 Student Cluster Competition.

In preparing your teams for the competition, your main goal is to teach and impart knowledge to the student participants in such a way that they are empowered and enabled to tackle the problems and benchmarking tasks themselves.

Hands-Off Rule (You may not touch the keyboard)

Under no circumstances may mentors touch any competition hardware, whether it belongs to their own team or to another team. Mentors are encouraged to provide guidance and leadership to their own team as well as to other teams.

If a mentor is found to have contravened this rule, their team may incur a penalty. Repeated infringements may result in disqualification of their team.

Cheat Sheet

Below is a table with a number of Linux system commands and utilities that you may find useful in assisting you to debug problems that you may encounter with your clusters. Note that some of these utilities do not ship with the base deployment of a number of Linux flavors, and you may be required to install the associated packages, prior to making use of them.

Command	Description
ssh	Used for logging into a remote machine and for executing commands on it.
scp	SCP copies files between hosts on a network. It uses ssh for data transfer, and uses the same authentication and provides the same security as ssh.
wget / curl	Utilities for non-interactive download of files from the Web. They support the HTTP, HTTPS, and FTP protocols.
top / htop / btop	Provides a dynamic real-time view of a running system. It can display system summary information as well as a list of processes or threads.
screen / tmux	Full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells).
ip a	Display IP Addresses and property information
dmesg	Prints the message buffer of the kernel. The output of this command typically contains the messages produced by the device drivers
watch	Execute a program periodically, showing output fullscreen.
df -h	Report file system disk space usage.
ping	PING command is used to verify that a device can communicate with another on a network.
lynx	Command-line based web browser (more useful than you think)
ctrl+alt+[F1...F6]	Open another shell session (multiple ‘desktops’)
ctrl+z	Suspend the foreground command (resume it in the background with ‘bg’)
du -h	Summarize disk usage of each FILE, recursively for directories.
lscpu	Command line utility that provides system CPU related information.
lstopo	View the topology of a Linux system.
inxi	Lists information related to your systems' sensors, partitions, drives, networking, audio, graphics, CPU, system, etc...
hwinfo	Hardware probing utility that provides detailed info about various components.
lshw	Hardware probing utility that provides detailed info about various components.
proc	Information and control center of the kernel, providing a communications channel between kernel space and user space. Many of the preceding commands query information provided by proc, e.g. `cat /proc/cpuinfo`.
uname	Useful for determining information about your current flavor and distribution of your operating system and its version.
lsblk	Provides information about block devices (disks, hard drives, flash drives, etc) connected to your system and their partitioning schemes.
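
Since some of the utilities above are not part of a base installation, it can help to check which ones are missing before you need them. The following is a minimal sketch; the function name `missing_tools` is our own, and the list of utilities passed to it is only an example.

```shell
# Print the name of every listed utility that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Check a few utilities from the table above that are often not preinstalled.
missing_tools htop tmux lynx inxi hwinfo lshw
```

Any names printed can then be installed with your distribution's package manager. Note that a package name does not always match the command name.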

Deliverables

You will need to submit the following for scoring and evaluation by the judges:

Cluster Design Assignment (Part 1) [10 %]
Cluster Design Assignment (Part 2) [40 %]
- One PDF presentation slide with team profiles. This slide must clearly indicate your team name and institution. Below each team member's photograph, indicate their:
  - Name and surname,
  - Degree and year of study.
- Presentation slides
- Short technical brief with cluster design specifications
Technical Knowledge Assessment [10 %]
Tutorials [40 %]

Cluster Design Assignment

You are tasked with designing a small cluster, with at least three nodes, to the value of R 400 000.00 (ZAR), and presenting your design to the judging panel. In your design you must specify the hardware and software for an operational cluster and describe how it functions. The design must be based on servers and interconnects from either HPE or Dell, and accessories from NVIDIA, AMD or Intel. You must use the prices you find in the Parts List Spreadsheet.

The primary purpose of your HPC cluster is to run one of the following codes as efficiently as possible:

You are not given a choice regarding the application selection. Your team will be told which application to optimize for on Wednesday. For now, you should investigate the codes above to understand their unique hardware and software requirements. You are required to submit a brief (half page) report on your findings to the competition organizers by 23:00 on Tuesday.

In addition, your choice of design must take into consideration:

Base Platform (Server),
Target Processing Unit (CPU / GPU),
Memory, Networking and Storage Requirements,
System and Application Dependency Software Requirements,
Ease of Use (Build, Assembly, Deployment),
Efficiency, Performance, Power Consumption and Reliability and
Team Management, Coordination and Planning.

[!IMPORTANT] You may submit an additional design that extends your small R 400 000.00 cluster, up to the value of R 1 000 000.00. You may use any of the above links for this exercise, using a Dollar to Rand conversion rate of 1:20. You may use GPUs from either AMD or NVIDIA. You may use CPUs from either AMD or Intel. You may use either Dell or HPE as a vendor.
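
The budget check can be sketched as follows. The function names and all prices are hypothetical and serve only to illustrate the arithmetic; use the real figures from the parts list for your design.

```shell
# Convert a US Dollar price to Rand at the 1:20 rate stated above.
usd_to_zar() {
    awk -v usd="$1" 'BEGIN { printf "%.2f\n", usd * 20 }'
}

# Report whether a total (in Rand) fits within a budget (in Rand).
within_budget() {
    awk -v total="$1" -v budget="$2" \
        'BEGIN { if (total + 0 <= budget + 0) print "within budget"; else print "over budget" }'
}

# Hypothetical parts list: three nodes at $5500 each and one switch at $2000.
total_usd=$(awk 'BEGIN { print 3 * 5500 + 2000 }')
total_zar=$(usd_to_zar "$total_usd")
echo "Total: R $total_zar ($(within_budget "$total_zar" 400000))"
```

With these made-up prices the design costs $18 500, which is R 370 000.00 at the stated rate and therefore fits the R 400 000.00 limit. For the extended design, pass 1000000 as the budget instead.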

The 10-minute slide presentation by the whole team must cover your design decisions and the features of your cluster, including cost, hardware, software, configuration and operation. Each member of the team is required to present, even though you will be assessed as a team.

After the presentation the judging panel will have an opportunity to ask questions to each member of your team. All members of your team can be questioned about any part of the cluster, so make sure you are fully familiar with the design.

Technical Knowledge Assessment

Each team must work together to complete the Technical Knowledge Assessment to the best of their ability. Team Captains must email their team's answers to the organizers no later than 23:00 on 13th July. You are required to demonstrate your understanding of the concepts in YOUR OWN WORDS. Keep your answers succinct and to the point. Your answer to each question should not exceed 2-3 lines.

Tutorials

You will be evaluated on your overall progress in the tutorials. Below you will find an overview, glossary and high-level breakdown of the tutorials. You must progress through four tutorials, which will be released daily. Your overall progress through the tutorials forms a large component of your score. By the end of the week you will have covered a considerable amount of content. Use the links provided if you need to refer to a specific section and are having trouble remembering where it is.

Tutorial 1 deals with introducing concepts to users and getting them started with using the virtual lab, standing up the first virtual machine instance and connecting to it remotely. The content is as follows:

Tutorial 2 will demonstrate how to configure and stand up a compute node, and access it using a transparently created, port-forwarding SSH tunnel between your workstation and your head node. You will then install a number of critical services across your cluster.
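
One common way to reach a compute node through a head node is OpenSSH's `ProxyJump` option. The sketch below prints a client configuration that uses it. This is an assumption about the approach and may differ from what the tutorial actually uses. The host aliases, IP addresses, user name and the function name `print_ssh_config` are all placeholders.

```shell
# Print an OpenSSH client configuration that reaches the compute node
# through the head node. All addresses and the user name are placeholders.
print_ssh_config() {
    head_ip=$1
    compute_ip=$2
    user=$3
    cat <<EOF
Host headnode
    HostName $head_ip
    User $user

Host computenode
    HostName $compute_ip
    User $user
    ProxyJump headnode
EOF
}

# Example with documentation-only addresses and a hypothetical user.
print_ssh_config 203.0.113.10 10.0.0.2 student
```

If the printed block is added to `~/.ssh/config` on your workstation, `ssh computenode` connects via the head node without a separate manual hop.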

Tutorial 3 will demonstrate how to configure, build, compile and install a range of system software and applications. You will also build these applications with different tools. Finally, you will learn how to run applications across your cluster.

Tutorial 4 demonstrates how to configure Docker containers to deploy a monitoring stack comprising a metrics database service, an exporting / scraping service and a metric visualization service. You will then learn the very basics of how to visualize and interpret data, and how to automate the deployment of your Sebowa OpenStack infrastructure. Lastly, you'll deploy a scheduler and submit a job to it.

Lecture Recordings

In this section you will find links to all of the livestreams of the lectures (Teams Meetings) and the subsequent recordings for you to refer back to.

Welcome, Introduction and Getting Started
HPC Hardware, HPC Networking and Systems Administration
Benchmarking, Compilation and Parallel Computing
- HPC Benchmarking
- Code Compilation
- Parallel Computing and Intro to QC
- Applications: LAMMPS & Qiskit
Administration and Application Visualization
- Cluster Admin, Ansible & Containers
- Monitoring
- Schedulers
- Data Visualization & Jupyter Lab
Career Guidance
- HPC Career Panel

Contributing to the Project

[!IMPORTANT] While we value your feedback, the following sections are primarily targeted at contributors to the project. As a student participating in the competition, do NOT spend your time working through any of the material below. However, we would love to have your contributions to the project after the competition.

You are strongly encouraged to contribute and improve the project by Opening and Participating in Discussions, Raising, Addressing and Resolving Issues. The following guide describes How to clone, push, and pull with git (beginners GitHub tutorial).

Steps to follow when editing existing content

To manage the various workflows and stages of development, testing and deployment effectively, the project comprises three primary branches:

main: Stable and production-ready deployment branch of the project.
stag: Staging branch which mirrors production and is used for integration testing of new features.
dev: Development branch for incorporating new features and bug fixes.

Editing the content directly requires Git. Use a terminal application, Git for Windows PowerShell, or Git for MobaXTerm, and follow these steps:

Generate an SSH Key (or use an existing one).
Add your SSH key to your Git profile.
- Navigate to your 'Profile' and go to 'Settings'.
- Under 'Access', navigate to 'SSH and GPG Keys'
Use `git clone` to create a local copy of the repository in your personal workspace.

You can copy the command from GitHub itself.
```
git clone git@github.com:chpc-tech-eval/chpc24-scc-nmu.git
```
When starting work on a new feature or bug fix, create a feature branch off of the development branch and regularly get updates from dev to ensure that you remain consistent with any changes to dev:
```
git checkout dev
git pull origin dev
```
Create a new branch to work on, e.g. `git branch tutX/bugfix-or-new-feature` followed by `git checkout tutX/bugfix-or-new-feature`, or use the single command `git checkout -b tutX/bugfix-or-new-feature`.
- Give the branch a sensible name.
- You are encouraged to push the branch back to the remote so that collaborators can see what you are working on as you make the changes.

Make the appropriate changes and commit them locally:

git add <relative_path_to_changed_file(s)>
git commit -m "some_message_pertaining_to_changes_made"

When you have completed editing your feature, merge any remote changes from dev and then push your local changes, back upstream to the remote repository:

git pull origin dev # (optional) it is good practice to incorporate changes from dev into your branch early and often
git pull origin tutX/bugfix-or-new-feature # (optional) if you are collaborating on a feature with someone, incorporate their changes early and often
git push origin tutX/bugfix-or-new-feature

Once you are satisfied with your changes, resolve all merge conflicts by pulling remote changes into your local working copy with `git pull`.
- If you are confident that your feature has not deviated from the remote dev branch, use `git pull` to automatically fetch and merge remote changes from dev into your feature branch.
- Alternatively, if your branch is old or depends on changes from the remote, use `git fetch` to fetch remote changes and preview them before merging.
- Resolve your local conflicts and merge all remote changes with `git merge`.
- Once all the conflicts have been resolved, and you've successfully merged all remote changes, push your branch upstream.
Create a pull request to the remote dev branch on GitHub, to incorporate your feature.
- Or another branch, if your feature branch was adding functionality to an existing feature branch.
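
The branch workflow described above can be rehearsed safely in a throwaway local repository. The sketch below does not touch GitHub. It stands in for the pull request with a local merge, which is a simplification, and the file name, branch name, commit messages and identity are all placeholders.

```shell
# Work in a throwaway repository so nothing touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Contributor"        # placeholder identity
git config user.email "contributor@example.com"   # placeholder identity

# Stand-in for the shared dev branch, with one initial commit.
git checkout --quiet -b dev
echo "Tutorial text with a tpyo" > tutorial.md
git add tutorial.md
git commit --quiet -m "Add tutorial"

# Create a feature branch off dev and commit a fix on it.
git checkout --quiet -b tutX/fix-typo
echo "Tutorial text with a typo fixed" > tutorial.md
git add tutorial.md
git commit --quiet -m "Fix typo in tutorial"

# Locally simulate what merging the pull request into dev would do.
git checkout --quiet dev
git merge --quiet --no-ff -m "Merge tutX/fix-typo into dev" tutX/fix-typo

branch=$(git rev-parse --abbrev-ref HEAD)
commits=$(git rev-list --count HEAD)
echo "On $branch with $commits commits"
```

In the real project the final merge happens through a pull request on GitHub rather than a local `git merge`, so that collaborators can review the change first.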

Syntax and Style

Use the following guide on GitHub Markdown Syntax Editing.