Distinguishes debugging (working within a codebase that the person debugging can fully inspect) from troubleshooting (investigating and resolving problems in complex systems where the troubleshooter may lack full visibility or knowledge).
2. General Troubleshooting Methods
Outlines general approaches to troubleshooting, including defining the problem, understanding the request path, bisecting the problem space, and generating and proving/disproving hypotheses.
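To make the bisection idea concrete, here is a minimal sketch in Python. It is not taken from the primer: the hop names, the `first_failing_hop` function, and the `fake_probe` stand-in are all hypothetical, and the approach assumes that once the request path is broken at one hop, every probe further downstream also fails.

```python
from typing import Callable, Optional, Sequence


def first_failing_hop(
    hops: Sequence[str],
    healthy_up_to: Callable[[str], bool],
) -> Optional[str]:
    """Return the first hop at which the probe fails, or None if all pass.

    Assumes failures are monotonic: if the probe fails at one hop, it
    also fails at every hop after it.
    """
    lo, hi = 0, len(hops)  # the first failing hop, if any, is in hops[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if healthy_up_to(hops[mid]):
            lo = mid + 1  # path is fine through mid, so look downstream
        else:
            hi = mid  # fault is at mid or somewhere upstream of it
    return hops[lo] if lo < len(hops) else None


# Hypothetical request path and a stand-in probe, for illustration only.
REQUEST_PATH = ["client", "dns", "load_balancer", "web_server", "database"]
BROKEN_FROM = "load_balancer"


def fake_probe(hop: str) -> bool:
    """Pretend every hop before BROKEN_FROM responds correctly."""
    return REQUEST_PATH.index(hop) < REQUEST_PATH.index(BROKEN_FROM)


print(first_failing_hop(REQUEST_PATH, fake_probe))  # prints "load_balancer"
```

In a real investigation the probe would be a manual or scripted check (for example, sending a test request directly to a given hop), and each check halves the part of the request path still under suspicion.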
3. Scenarios
Presents two real-world troubleshooting scenarios (Slack Client Crashes and Broken Load Balancers) to illustrate the troubleshooting process and methods.
4. Tools
Lists various system-specific and general-purpose tools that can be used for troubleshooting, such as Linux OS-level tooling, TCP packet dumps, application-specific counters, and logging.
5. Related Reading
Provides links to additional resources and tutorials related to troubleshooting and the tools mentioned in the document.
Key Things To Learn
The difference between troubleshooting and debugging
General troubleshooting methods, including:
    Defining the problem
    Understanding the request path
    Bisecting the problem space
    Generating and proving/disproving hypotheses
How to examine possible causes and test solutions
The concept of bottlenecks and how they affect system performance
The USE (Utilisation, Saturation, Errors) method for troubleshooting performance problems
How to analyze real-world troubleshooting scenarios to understand the troubleshooting process and methods
Familiarity with basic Linux tooling for troubleshooting, such as:
Troubleshooting Primer
https://systems.codeyourfuture.io/primers/troubleshooting/
Sections:
1. Troubleshooting Versus Debugging
2. General Troubleshooting Methods
3. Scenarios
4. Tools
5. Related Reading
Key Things To Learn