tylerjereddy opened this issue 1 year ago
I ran the following PyTorch example inside a Tart VM, and indeed it seems MPS is not supported by the underlying Virtualization.framework yet. Hopefully there will be some news at WWDC in two weeks. 🤞
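For reference, a minimal check along these lines (a sketch, assuming the standard `torch.backends.mps` API):

```python
import torch

# Is the MPS backend compiled into this torch build, and is it usable here?
print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    x = torch.ones(3, device="mps")  # allocation fails if no Metal device is exposed
    print(x * 2)
```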
Thanks, this would be pretty cool!
With a little bit more investigation, it seems Virtualization.framework should support Metal. It's mentioned in last year's WWDC video at 10:53. There is even a ParavirtualizedGraphics.framework that predates Virtualization.framework and which allegedly should use it.
But in my testing I don't see any graphics devices inside the VM, compared to what I see on an M1 Mac Mini.
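A quick way to enumerate the Metal devices a process can see (a sketch, assuming the PyObjC Metal bindings are installed):

```python
# Requires the PyObjC Metal bindings: pip install pyobjc-framework-Metal
import Metal

devices = Metal.MTLCopyAllDevices()
print(f"{len(devices)} Metal device(s) visible")
for device in devices:
    print(" -", device.name())
```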
@edigaryev I know you dug into private APIs of Virtualization.framework. Have you maybe seen any mentions of Metal?
@fkorotkov the paravirtualization actually seems to be used. You can also check this by running ioreg inside the VM:
% ioreg -n AppleParavirtGPU -r
+-o AppleParavirtGPU <class AppleParavirtGPU, id 0x100000191, registered, matched, active, busy 0 (1 ms), retain 13>
| {
| "IOClass" = "AppleParavirtGPU"
| "KDebugVersion" = 4294967296
| "IOPersonalityPublisher" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
| "IOMatchedAtBoot" = Yes
| "IOReportLegendPublic" = Yes
| "AGCInfo" = {"fLastSubmissionPID"=134,"fSubmissionsSinceLastCheck"=0,"fBusyCount"=0}
| "IOProviderClass" = "AppleARMIODevice"
| "MetalPluginName" = "AppleParavirtGPUMetalIOGPUFamily"
| "IOProbeScore" = 0
| "SurfaceList" = ()
| "IONameMatch" = "paravirtualizedgraphics,gpu"
| "MetalPluginClassName" = "AppleParavirtDevice"
| "SchedulerState" = {"Stamps"=(),"BusyWorkQueues"=()}
| "CFBundleIdentifierKernel" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
| "IOMatchCategory" = "IOAcceleratorES"
| "CFBundleIdentifier" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
| "IONameMatched" = "paravirtualizedgraphics,gpu"
| "PerformanceStatistics" = {"recoveryCount"=0,"In use system memory"=108962304,"Alloc system memory"=52527104}
| "IOGeneralInterest" = "IOCommand is not serializable"
| "IOReportLegend" = ({"IOReportChannels"=((1,6442450945,"Alloc system memory"),(2,6442450945,"In use system memory"),(3,6442450945,"GPU Restart Count")),"IOReportGroupName"="Internal Statistics","IOReportChan$
| "DisplayPortCount" = 1
| }
|
+-o AppleParavirtDisplay <class AppleParavirtDisplay, id 0x1000001df, registered, matched, active, busy 0 (0 ms), retain 9>
| +-o IOMobileFramebufferUserClient <class IOMobileFramebufferUserClient, id 0x100000285, !registered, !matched, active, busy 0, retain 5>
| +-o IOMobileFramebufferUserClient <class IOMobileFramebufferUserClient, id 0x100000286, !registered, !matched, active, busy 0, retain 5>
+-o AppleParavirtDeviceUserClient <class AppleParavirtDeviceUserClient, id 0x100000294, !registered, !matched, active, busy 0, retain 5>
+-o AppleParavirtDeviceUserClient <class AppleParavirtDeviceUserClient, id 0x100000353, !registered, !matched, active, busy 0, retain 5>
+-o AppleParavirtDeviceUserClient <class AppleParavirtDeviceUserClient, id 0x10000035a, !registered, !matched, active, busy 0, retain 5>
+-o AppleParavirtDeviceUserClient <class AppleParavirtDeviceUserClient, id 0x10000035d, !registered, !matched, active, busy 0, retain 5>
+-o AppleParavirtDeviceUserClient <class AppleParavirtDeviceUserClient, id 0x10000036a, !registered, !matched, active, busy 0, retain 5>
+-o AppleParavirtDeviceUserClient <class AppleParavirtDeviceUserClient, id 0x1000003fa, !registered, !matched, active, busy 0, retain 5>
I'm not sure why Apple's Metal Performance Shaders don't work, though.
Perhaps @Developer-Ecosystem-Engineering might be able to (informally) point us in the right direction? I know they've been quite helpful with NumPy low-level development on M-series chips.
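One way to narrow it down might be to check whether a default Metal device is exposed to user space at all, independent of PyTorch (again a sketch, assuming the PyObjC Metal bindings):

```python
# Requires the PyObjC Metal bindings: pip install pyobjc-framework-Metal
import Metal

device = Metal.MTLCreateSystemDefaultDevice()
if device is None:
    print("No default Metal device is exposed to this process")
else:
    print("Default Metal device:", device.name())
```

If the paravirtualized GPU shows up in ioreg but no device is returned here, the gap would be between the kernel driver and the user-space Metal stack.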
I am running into the same issue as well.
It's currently not supported to run these types of workloads under Virtualization.framework.
We understand the request!
Since Cirrus CI offers some native arm Mac (M-series) services, I was wondering if there might be some documentation/examples/options for using the GPU component (i.e., the Metal Performance Shaders) when testing with, e.g., torch, which has an mps backend: https://pytorch.org/docs/stable/notes/mps.html

I did a little experiment here: https://github.com/tylerjereddy/scipy/pull/71, and found that there may be some restrictions that prevent practical usage in the open-source tier:
RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 1.70 GB). Tried to allocate 0 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
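For reference, the workaround the error message suggests would look something like this (a sketch; I'm assuming the env var just needs to be set before torch initializes MPS):

```python
import os

# Disable the MPS allocator's upper memory limit, per the error message.
# The warning stands: this "may cause system failure".
# Set before importing torch; equivalently, from the shell:
#   PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python script.py
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch

x = torch.ones(1024, device="mps")
print(x.sum())
```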
Do you have any experience/guidance here? Is this expected? Is this disabled and you don't want us trying it? It would be very cool to be able to push work through GPUs in CI like that!