What exactly is limiting grab() in terms of CPU usage?

As always, when asking about library flaws / anything of that nature, I like to start by pointing out how much I appreciate the work that has gone into making this toolkit. Working with this library has made the project I'm working on much easier and I've very much enjoyed looking through the work done here.

General information:

OS name: Windows 10
OS version: ---
OS architecture: 64 bits
Resolutions:
- Monitor 1: 1920x1080
- Monitor 2: 1920x1080
- Monitor 3: 1920x1080
- Monitor 4: 1920x1080
Python version: 3.8
MSS version: 3.2.0

Full message

The goal and problem:

I'd like to capture only the borders of all of my monitors: say, the edges plus a margin of 5 or so pixels. I'm struggling to understand why taking a single screenshot of the entire virtual monitor is faster than grabbing each monitor and cropping it, which is in turn even faster than calling grab() on each individual sub-region.

So to this I ask: How might I maximize speed and minimize CPU usage if my goal is only to get the borders of all of my monitors?
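To make the goal concrete, here is a minimal sketch of how the four border regions of one monitor could be described in the dict form that grab() accepts ("left", "top", "width", "height"). The helper name `border_regions`, the default margin, and the example monitor geometry are assumptions for illustration only. Unlike the constants in the MWE, which start at (0, 0), this version offsets each region by the monitor's own position on the virtual desktop.

```python
def border_regions(monitor, margin=5):
    """Return four grab()-style regions covering the edges of one monitor.

    `monitor` is a dict with "left", "top", "width" and "height" keys,
    the same shape as the entries of sct.monitors.
    """
    left, top = monitor["left"], monitor["top"]
    width, height = monitor["width"], monitor["height"]
    return {
        "top": {"left": left, "top": top, "width": width, "height": margin},
        "bottom": {"left": left, "top": top + height - margin,
                   "width": width, "height": margin},
        "left": {"left": left, "top": top, "width": margin, "height": height},
        "right": {"left": left + width - margin, "top": top,
                  "width": margin, "height": height},
    }


# Hypothetical second monitor placed to the right of a 1920-pixel-wide primary.
regions = border_regions({"left": 1920, "top": 0, "width": 1920, "height": 1080})
```

Each value in `regions` could then be passed to sct.grab() directly, which corresponds to method four in the list of methods tested.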

Testing methodology

Forgive the poor method order, but the following methods achieve these timings on my machine. Method 3 is a control to test whether the downstream processing or grab() itself is causing the problem, since the other methods vary both the number of calls to grab() and the cropping each one requires. However, for the (very crude) benchmark/MWE below, I omit the cropping done by numpy and PIL, both for simplicity and because it is several orders of magnitude faster than any of the timing differences observed here.
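The four timing functions in the MWE all follow the same pattern, so they could be folded into one helper. The sketch below is an assumption about how that might look, not part of the original benchmark: the name `average_seconds` and the warm-up count are made up, and it swaps time.time() for time.perf_counter(), which is intended for measuring short intervals.

```python
import time


def average_seconds(func, repeats=50, warmup=3):
    """Average wall-clock seconds per call of func() over `repeats` runs."""
    for _ in range(warmup):
        func()  # untimed calls, so one-off setup cost is not counted
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


# Stand-in workload; in the MWE this would be e.g. get_master_screenshot.
calls = []
mean = average_seconds(lambda: calls.append(1), repeats=10, warmup=2)
```

With the MWE's functions defined, a call such as `average_seconds(get_master_screenshot)` would replace `time_for_all_screenshots_1()`.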

- Method one: Get the entire virtual monitor and crop 16 borders out of it: 166ms/4 monitors
- Method two: Get four individual monitor screenshots, then crop four borders out of them: 196ms/4 monitors
- Method three: Copy method one, but instead of frombytes use frombuffer to test how much allocating changes time: 169ms/4 monitors
- Method four: Get 16 individual screenshots of the borders, which means cropping is entirely unnecessary: 267ms/4 monitors
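Method one needs each monitor's borders expressed in the pixel coordinates of the full virtual-screen image rather than in desktop coordinates. A minimal sketch of that translation follows. The helper name `border_crop_boxes` and the example layout are hypothetical, and it assumes the virtual screen and each monitor are described by dicts shaped like the entries of sct.monitors.

```python
def border_crop_boxes(virtual, monitor, margin=5):
    """Return PIL-style (left, upper, right, lower) boxes for one monitor's
    borders, in the pixel coordinates of the full virtual-screen image."""
    # Shift from desktop coordinates to image coordinates.
    x0 = monitor["left"] - virtual["left"]
    y0 = monitor["top"] - virtual["top"]
    x1 = x0 + monitor["width"]
    y1 = y0 + monitor["height"]
    return [
        (x0, y0, x1, y0 + margin),  # top
        (x0, y1 - margin, x1, y1),  # bottom
        (x0, y0, x0 + margin, y1),  # left
        (x1 - margin, y0, x1, y1),  # right
    ]


# Hypothetical layout: one monitor sits left of the primary,
# so the virtual screen starts at x = -1920.
virtual = {"left": -1920, "top": 0, "width": 3840, "height": 1080}
primary = {"left": 0, "top": 0, "width": 1920, "height": 1080}
boxes = border_crop_boxes(virtual, primary)
```

Each tuple in `boxes` can be passed to PIL's Image.crop() on the image built from the full-screen grab, so four monitors yield the 16 crops mentioned for method one.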

Other details

MWE for testing (you may need to remove / adjust the benchmarks if you don't have 4 monitors)

from PIL import Image
import time

import mss
import mss.tools
import mss.windows
mss.windows.CAPTUREBLT = 0
sct = mss.mss()

def get_master_screenshot():
    """Get a screenshot of the overall virtual screen."""
    complete_screengrab = sct.grab(sct.monitors[0])
    complete_screenshot = Image.frombytes("RGB", complete_screengrab.size, complete_screengrab.bgra, "raw", "BGRX")
    return complete_screenshot

def get_master_screenshot_buffer():
    """Get a screenshot of the overall virtual screen, but improve things slightly by not allocating."""
    complete_screengrab = sct.grab(sct.monitors[0])
    complete_screenshot = Image.frombuffer("RGB", complete_screengrab.size, complete_screengrab.bgra, "raw", "BGRX")
    return complete_screenshot

def time_for_all_screenshots_1():
    times = []
    for _ in range(50):
        starttime = time.time()
        get_master_screenshot()
        t = time.time() - starttime
        times.append(t)
    return sum(times) / len(times)

get_master_screenshot().show()

def get_individual_screenshot(monitor_id):
    complete_screengrab = sct.grab(sct.monitors[monitor_id])
    complete_screenshot = Image.frombytes("RGB", complete_screengrab.size, complete_screengrab.bgra, "raw", "BGRX")
    return complete_screenshot

# used to access only individual borders
LOCAL_TOP = (
    0, 
    0, 
    sct.monitors[1]["width"], 
    50
)
LOCAL_BOTTOM = (
    0, 
    sct.monitors[1]["height"] - 50, 
    sct.monitors[1]["width"], 
    sct.monitors[1]["height"]
)
LOCAL_LEFT = (
    0, 
    0, 
    50, 
    sct.monitors[1]["height"]
)
LOCAL_RIGHT = (
    sct.monitors[1]["width"] - 50, 
    0, 
    sct.monitors[1]["width"], 
    sct.monitors[1]["height"]
)

def get_four_boundary_screensots():
    im1 = sct.grab(LOCAL_TOP)
    im2 = sct.grab(LOCAL_BOTTOM)
    im3 = sct.grab(LOCAL_LEFT)
    im4 = sct.grab(LOCAL_RIGHT)
    scrsht1 = Image.frombuffer("RGB", im1.size, im1.bgra, "raw", "BGRX")
    scrsht2 = Image.frombuffer("RGB", im2.size, im2.bgra, "raw", "BGRX")
    scrsht3 = Image.frombuffer("RGB", im3.size, im3.bgra, "raw", "BGRX")
    scrsht4 = Image.frombuffer("RGB", im4.size, im4.bgra, "raw", "BGRX")
    return scrsht1, scrsht2, scrsht3, scrsht4

def time_for_all_screenshots_2():
    times = []
    for _ in range(50):
        starttime = time.time()
        get_individual_screenshot(1)
        get_individual_screenshot(2)
        get_individual_screenshot(3)
        get_individual_screenshot(4)
        t = time.time() - starttime
        times.append(t)
    return sum(times) / len(times)

def time_for_all_screenshots_3():
    times = []
    for _ in range(50):
        starttime = time.time()
        get_master_screenshot_buffer()
        t = time.time() - starttime
        times.append(t)
    return sum(times) / len(times)

def time_for_all_screenshots_4():
    times = []
    for _ in range(50):
        starttime = time.time()
        get_four_boundary_screensots()
        get_four_boundary_screensots()
        get_four_boundary_screensots()
        get_four_boundary_screensots()
        t = time.time() - starttime
        times.append(t)
    return sum(times) / len(times)

print("time for method 1: ", time_for_all_screenshots_1())
print("time for method 2: ", time_for_all_screenshots_2())
print("time for method 3: ", time_for_all_screenshots_3())
print("time for method 4: ", time_for_all_screenshots_4())

BoboTiG / python-mss