We propose HandyPose, a single-stage network for hand pose estimation that is end-to-end trainable and produces state-of-the-art results. HandyPose combines contextual segmentation and joint localization to estimate the hand pose in a single stage, with high accuracy and without relying on statistical post-processing. To handle the context and resolution challenges of hand pose estimation, our architecture generates improved multi-scale and multi-level representations by combining features from multiple levels of the backbone network through our Multi-Level Waterfall module.
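The abstract only names the Multi-Level Waterfall module, so the following is a hypothetical illustration of the general idea, not the HandyPose implementation. It assumes three backbone feature maps (`low`, `mid`, `high`) whose resolutions differ by integer factors, fuses them by nearest-neighbour upsampling and channel concatenation, and then passes the result through a "waterfall" cascade of 3×3 dilated convolutions in which each branch consumes the previous branch's output and all branch outputs are concatenated. The function names, dilation rates, channel counts, and random weights are illustrative assumptions.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def dilated_conv3x3(x, w, dilation):
    """3x3 dilated convolution with zero 'same' padding.

    x: (C_in, H, W) input, w: (C_out, C_in, 3, 3) weights.
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (dilation, dilation), (dilation, dilation)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Shifted view of the padded input for kernel tap (i, j).
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def waterfall(x, weights, dilations):
    """Cascade of dilated convs: each branch feeds the next; outputs are concatenated."""
    branch_outputs = []
    h = x
    for w, d in zip(weights, dilations):
        h = np.maximum(dilated_conv3x3(h, w, d), 0.0)  # conv + ReLU
        branch_outputs.append(h)
    return np.concatenate(branch_outputs, axis=0)


def multi_level_waterfall(low, mid, high, weights, dilations):
    """Hypothetical fusion: upsample deeper levels to the low-level resolution,
    concatenate along channels, then apply the waterfall cascade."""
    target = low.shape[1]
    fused = np.concatenate(
        [low,
         upsample_nearest(mid, target // mid.shape[1]),
         upsample_nearest(high, target // high.shape[1])],
        axis=0,
    )
    return waterfall(fused, weights, dilations)


# Example with random features from three hypothetical backbone levels.
rng = np.random.default_rng(0)
low = rng.standard_normal((4, 16, 16))
mid = rng.standard_normal((8, 8, 8))
high = rng.standard_normal((16, 4, 4))
dilations = (1, 2, 4, 6)
branch_ch = 8
in_ch = low.shape[0] + mid.shape[0] + high.shape[0]  # 28 fused channels
weights = [rng.standard_normal((branch_ch, in_ch if k == 0 else branch_ch, 3, 3)) * 0.1
           for k in range(len(dilations))]
out = multi_level_waterfall(low, mid, high, weights, dilations)
print(out.shape)  # (32, 16, 16)
```

In this sketch the cascade is what separates a waterfall from parallel atrous pooling: later branches build on earlier ones, so their receptive fields grow progressively, which is one plausible way to gather multi-scale context at full resolution without further downsampling.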