bkaradzic / bgfx

Cross-platform, graphics API agnostic, "Bring Your Own Engine/Framework" style rendering library.
https://bkaradzic.github.io/bgfx/overview.html
BSD 2-Clause "Simplified" License

Incorrect VRAM reporting on Metal and DX12-UWP #2693

Open Ravbug opened 2 years ago

Ravbug commented 2 years ago

Describe the bug

bgfx::Stats::gpuMemoryMax and bgfx::Stats::gpuMemoryUsed are always -9223372036854775807 on Metal and DX12-UWP.

To Reproduce

Steps to reproduce the behavior:

  1. Initialize the Metal backend
  2. Call bgfx::frame() at least once (may not be necessary)
  3. Call bgfx::getStats() then look at gpuMemoryMax and gpuMemoryUsed
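The reported value, -9223372036854775807, is numerically equal to -INT64_MAX, which suggests a "not reported" sentinel rather than a real measurement. That interpretation is an assumption, not something confirmed in this thread. Under that assumption, a minimal sketch of a guard that callers could apply to the two fields follows; `kGpuMemoryUnknown` and `isGpuMemoryReported` are hypothetical names, not part of bgfx.

```cpp
#include <cstdint>

// Assumption: backends that cannot report GPU memory leave the field at
// -INT64_MAX, which is the value seen in this report.
constexpr int64_t kGpuMemoryUnknown = -INT64_MAX;

// Hypothetical helper (not part of bgfx): true only when the raw stats field
// holds a usable byte count. The sentinel, like any negative value, is
// treated as "not reported".
constexpr bool isGpuMemoryReported(int64_t raw)
{
    return raw >= 0;
}

static_assert(kGpuMemoryUnknown == -9223372036854775807LL,
              "sentinel matches the value in the report");
```

With bgfx, the check would be applied to `bgfx::getStats()->gpuMemoryMax` and `bgfx::getStats()->gpuMemoryUsed` before displaying them.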

Expected behavior

It should return the max VRAM of the GPU, and the VRAM in use.

Screenshots

[Three screenshots omitted: Metal, DX12-UWP, DX12 non-UWP]

Additional context

OS: macOS 11.6.2
Xcode 13.2.1
GPU: AMD Radeon M395X

OS: Windows 10 21H1
Visual Studio 2022
GPU: NVIDIA RTX 2070 Super
Driver: 456.71

bkaradzic commented 2 years ago

UWP is dead, so this is a non-issue.

Do you have a suggestion for how to query / find out the amount of GPU memory on Metal?

Ravbug commented 2 years ago

For getting the current allocation size, it looks like currentAllocatedSize in MTLDevice will work.

For the total amount of GPU memory, this note in the MoltenVK commit history seems to indicate that Metal does not expose total memory and instead exposes a "recommended maximum":

if (getHasUnifiedMemory()) { return mvkGetSystemMemorySize(); }
// There's actually no way to query the total physical VRAM on the device in Metal.
// Just default to using the recommended max working set size (i.e. the budget).
return getRecommendedMaxWorkingSetSize();

recommendedMaxWorkingSetSize in MTLDevice is described as:

An approximation of how much memory, in bytes, this device can use with good performance. Performance may be improved by keeping the total size of all resources and heaps associated with this device object less than this threshold. Going above the threshold may incur a performance penalty.
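Taken together, the two comments above suggest one possible mapping onto the bgfx stats fields: report currentAllocatedSize as the used amount and, following the MoltenVK logic, report either system memory (unified memory) or the recommended maximum working set (discrete GPU) as the maximum. The sketch below only illustrates that decision in portable C++. `DeviceMemoryInfo`, `estimateGpuMemoryMax`, and `estimateGpuMemoryUsed` are hypothetical names; a real implementation would fill the struct from MTLDevice in the Metal renderer.

```cpp
#include <cstdint>

// Hypothetical plain-data snapshot of the values discussed above. The field
// names mirror MTLDevice properties, but this struct is not a Metal type.
struct DeviceMemoryInfo
{
    bool     hasUnifiedMemory;             // MTLDevice hasUnifiedMemory
    uint64_t systemMemorySize;             // total system RAM in bytes, queried elsewhere
    uint64_t recommendedMaxWorkingSetSize; // MTLDevice recommendedMaxWorkingSetSize
    uint64_t currentAllocatedSize;         // MTLDevice currentAllocatedSize
};

// Same decision as the MoltenVK snippet: with unified memory the GPU shares
// system RAM; otherwise fall back to the recommended working-set budget,
// because Metal does not expose the physical VRAM total.
constexpr uint64_t estimateGpuMemoryMax(const DeviceMemoryInfo& info)
{
    return info.hasUnifiedMemory ? info.systemMemorySize
                                 : info.recommendedMaxWorkingSetSize;
}

// Bytes currently allocated by the device.
constexpr uint64_t estimateGpuMemoryUsed(const DeviceMemoryInfo& info)
{
    return info.currentAllocatedSize;
}
```

Note that on a discrete GPU the "max" produced this way is a budget, not the physical VRAM size, so it may differ from the figure the operating system shows.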

Ravbug commented 2 years ago

This code I adapted from Maya seems to work for macOS for getting the total VRAM:

#include <cstdint>
#include <string>
#include <vector>
#include <CoreGraphics/CoreGraphics.h>
#include <IOKit/IOKitLib.h>

void queryVRAMandModelMac(uint64_t& vram, std::string& manufacturer, std::string& model)
{
    vram = 0;
    CGError res = CGDisplayNoErr;
    // query active displays
    CGDisplayCount dspCount = 0;
    res = CGGetActiveDisplayList(0, NULL, &dspCount);
    if (res || dspCount == 0) {
        return;
    }
    // std::vector owns the display list, so it is released on every return path
    std::vector<CGDirectDisplayID> displays(dspCount);
    res = CGGetActiveDisplayList(dspCount, displays.data(), &dspCount);
    if (res || dspCount == 0) {
        return;
    }
    SInt64 maxVramTotal = 0;
    for (CGDisplayCount i = 0; i < dspCount; i++) {
        // get the service port for the display
        // (CGDisplayIOServicePort is deprecated since macOS 10.9)
        io_service_t dspPort = CGDisplayIOServicePort(displays[i]);
        // ask IOKit for the VRAM size property
        /* HD 2600: IOFBMemorySize = 256MB. VRAM,totalsize = 256MB
         HD 5770: IOFBMemorySize = 512MB. VRAM,totalsize = 1024MB
         Apple's QA page is not correct. We should search for IOPCIDevice's VRAM,totalsize property.
         CFTypeRef typeCode = IORegistryEntryCreateCFProperty(dspPort,
         CFSTR(kIOFBMemorySizeKey),
         kCFAllocatorDefault,
         kNilOptions);
         */
        SInt64 vramScale = 1;
        CFTypeRef typeCode = IORegistryEntrySearchCFProperty(dspPort,
                                                             kIOServicePlane,
                                                             CFSTR("VRAM,totalsize"),
                                                             kCFAllocatorDefault,
                                                             kIORegistryIterateRecursively | kIORegistryIterateParents);
        if (!typeCode) {
            // On the new Mac Pro, we have VRAM,totalMB instead.
            typeCode = IORegistryEntrySearchCFProperty(dspPort,
                                                       kIOServicePlane,
                                                       CFSTR("VRAM,totalMB"),
                                                       kCFAllocatorDefault,
                                                       kIORegistryIterateRecursively | kIORegistryIterateParents);
            if (typeCode) {
                vramScale = 1024 * 1024;
            }
        }
        // ensure we have valid data from IOKit
        if (typeCode) {
            SInt64 vramTotal = 0;
            if (CFGetTypeID(typeCode) == CFNumberGetTypeID()) {
                // AMD, VRAM,totalsize is CFNumber
                CFNumberGetValue((const __CFNumber*)typeCode, kCFNumberSInt64Type, &vramTotal);
            }
            else if (CFGetTypeID(typeCode) == CFDataGetTypeID()) {
                // NVIDIA, VRAM,totalsize is CFData
                CFIndex      length = CFDataGetLength((const __CFData*)typeCode);
                const UInt8* data   = CFDataGetBytePtr((const __CFData*)typeCode);
                if (length == 4) {
                    vramTotal = *(const unsigned int*)data;
                }
                else if (length == 8) {
                    vramTotal = *(const SInt64*)data;
                }
            }
            vramTotal *= vramScale;
            CFRelease(typeCode);

            if (vramTotal > maxVramTotal) {
                maxVramTotal = vramTotal;
                typeCode = IORegistryEntrySearchCFProperty(dspPort,
                                                           kIOServicePlane,
                                                           CFSTR("NVDA,Features"),
                                                           kCFAllocatorDefault,
                                                           kIORegistryIterateRecursively | kIORegistryIterateParents);
                if (typeCode) {
                    manufacturer = "NVIDIA";
                    CFRelease(typeCode);
                }
                typeCode = IORegistryEntrySearchCFProperty(dspPort,
                                                           kIOServicePlane,
                                                           CFSTR("ATY,Copyright"),
                                                           kCFAllocatorDefault,
                                                           kIORegistryIterateRecursively | kIORegistryIterateParents);
                if (typeCode) {
                    manufacturer = "Advanced Micro Devices, Inc.";
                    CFRelease(typeCode);
                }
                // GPU model
                typeCode = IORegistryEntrySearchCFProperty(dspPort,
                                                           kIOServicePlane,
                                                           CFSTR("model"),
                                                           kCFAllocatorDefault,
                                                           kIORegistryIterateRecursively | kIORegistryIterateParents);
                if (typeCode) {
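                    if (CFGetTypeID(typeCode) == CFDataGetTypeID()) {
                        // CFData is not guaranteed to be NUL-terminated, so copy by length
                        model.assign((const char*)CFDataGetBytePtr((const __CFData*)typeCode),
                                     (size_t)CFDataGetLength((const __CFData*)typeCode));
                        // drop any trailing NUL bytes stored with the property
                        while (!model.empty() && model.back() == '\0') {
                            model.pop_back();
                        }
                    }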
                    CFRelease(typeCode);
                }
            }
        }
    }
    vram = maxVramTotal;
}

When run on my iMac I get the following, which is correct:

std::string manufacturer, model;
uint64_t vram;
queryVRAMandModelMac(vram, manufacturer, model);

// manufacturer = Advanced Micro Devices, Inc.
// model = AMD Radeon R9 M395X
// vram = 4294967296 (which is 4096 MB)
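The CFData branch of the function above reads the 4-byte or 8-byte property through a pointer cast and then applies the unit scale (1 for "VRAM,totalsize", 1024 * 1024 for "VRAM,totalMB"). As a portable illustration of just that decoding step, here is a minimal sketch that uses memcpy instead of the cast, which avoids unaligned reads. `decodeVramBytes` and `bytesToMiB` are hypothetical helper names, and the sketch assumes the bytes are in host byte order, as the original code does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode a VRAM size stored as 4 or 8 raw bytes in host
// byte order, then apply the unit scale. Returns 0 for any other length,
// matching the original code, which leaves vramTotal at 0 in that case.
inline int64_t decodeVramBytes(const unsigned char* data, std::size_t length, int64_t scale)
{
    assert(data != nullptr || length == 0);
    int64_t value = 0;
    if (length == 4) {
        uint32_t v32 = 0;
        std::memcpy(&v32, data, sizeof(v32)); // safe for unaligned input
        value = v32;
    }
    else if (length == 8) {
        std::memcpy(&value, data, sizeof(value));
    }
    return value * scale;
}

// Convert a byte count to whole mebibytes for display.
inline int64_t bytesToMiB(int64_t bytes)
{
    return bytes / (1024 * 1024);
}
```

For the iMac result above, `bytesToMiB(4294967296)` gives 4096, which matches the 4096 MB noted in the comment.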

Don't know what this will do on an Apple Silicon device, I don't have one so I can't test it.

bkaradzic commented 2 years ago

Don't know what this will do on an Apple Silicon device, I don't have one so I can't test it.

Cool, thanks for the research!

I can test this.