dougallj / applegpu

Apple G13 GPU architecture docs and tools
BSD 3-Clause "New" or "Revised" License

New hwtestbed #59

TellowKrinkle closed this 7 months ago

TellowKrinkle commented 7 months ago

Adds a new hwtestbed based on patching Metal binary archives with custom shader code. This avoids having to update it for every macOS release: it should only need changes if Apple changes how the compiler assigns buffers to uniform registers, or changes the binary archive format.
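
A minimal sketch of the patching idea, assuming the compiled placeholder shader's machine code can be located in the archive by a unique byte sequence and that the custom code is padded out to the same length. `patch_archive`, the marker and the padding bytes are hypothetical; this ignores any headers, offsets or checksums the real binary archive format may contain:

```python
def patch_archive(archive: bytes, marker: bytes, code: bytes, pad: bytes) -> bytes:
    """Return a copy of `archive` with the placeholder `marker` replaced by `code`.

    Hypothetical helper: the archive keeps its size because `code` is padded
    with repeated copies of `pad` (e.g. some no-op encoding) up to len(marker).
    """
    offset = archive.find(marker)
    if offset < 0:
        raise ValueError("placeholder not found in archive")
    if archive.find(marker, offset + 1) >= 0:
        raise ValueError("placeholder occurs more than once")
    if len(code) > len(marker):
        raise ValueError("custom code does not fit in the placeholder")
    gap = len(marker) - len(code)
    if gap and (not pad or gap % len(pad)):
        raise ValueError("padding cannot evenly fill the remaining space")
    filler = pad * (gap // len(pad)) if gap else b""
    # Splice: bytes before the placeholder, new code, filler, bytes after it.
    return archive[:offset] + code + filler + archive[offset + len(marker):]
```

Keeping the size fixed means nothing else in the file has to move, which is what makes the approach independent of most of the container layout.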

It's also pipe-based (except for supplying the archive itself, which has to be a file because of how the Metal API is laid out) and fairly fast. I've never run the original because of macOS version mismatches, so I don't know how fast it was, but the new one completes the tests in 18 s, with ~90% of the CPU time spent in Python. Because of that, I've removed the caching code.
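
A rough sketch of what a pipe-based driver can look like, assuming one long-lived helper process that reads length-prefixed requests on stdin and writes length-prefixed replies on stdout. The 4-byte little-endian framing, `PipeRunner`, and the echo-style helper are hypothetical stand-ins, not the PR's actual protocol; the point is that a single process serves many test cases without being relaunched:

```python
import struct
import subprocess
import sys

# Hypothetical stand-in for the GPU-side helper: reads length-prefixed
# requests on stdin and replies with the payload reversed.
HELPER_SRC = r'''
import struct, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    hdr = inp.read(4)
    if len(hdr) < 4:
        break  # parent closed the pipe
    (n,) = struct.unpack("<I", hdr)
    reply = inp.read(n)[::-1]
    out.write(struct.pack("<I", len(reply)) + reply)
    out.flush()
'''


class PipeRunner:
    """Keeps one helper process alive and talks to it over stdin/stdout."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def _read_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self.proc.stdout.read(n - len(buf))
            if not chunk:
                raise EOFError("helper exited early")
            buf += chunk
        return buf

    def request(self, payload: bytes) -> bytes:
        # Send one framed request, then block until the framed reply arrives.
        self.proc.stdin.write(struct.pack("<I", len(payload)) + payload)
        self.proc.stdin.flush()
        (n,) = struct.unpack("<I", self._read_exact(4))
        return self._read_exact(n)

    def close(self):
        self.proc.stdin.close()  # helper sees EOF and exits its loop
        self.proc.wait()


runner = PipeRunner([sys.executable, "-c", HELPER_SRC])
print(runner.request(b"abc"))  # b'cba'
runner.close()
```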

Also adds a tool to test arbitrary shader code on the command line:

> python3 hwtestbed.py "get_sr r1.cache, sr52; simd_shuffle r1, r1, r0l" -r{0,1,2,0,4,4,8,8}{,,,}
Thread  0: r1: 00000003 (3         ), input: 0
Thread  1: r1: 00000003 (3         ), input: 1
Thread  2: r1: 00000003 (3         ), input: 2
Thread  3: r1: 00000003 (3         ), input: 0
Thread  4: r1: 00000004 (4         ), input: 4
Thread  5: r1: 00000004 (4         ), input: 4
Thread  6: r1: 0000000b (11        ), input: 8
Thread  7: r1: 0000000b (11        ), input: 8
Thread  8: r1: 00000003 (3         ), input: 0
Thread  9: r1: 00000003 (3         ), input: 1
Thread 10: r1: 00000003 (3         ), input: 2
Thread 11: r1: 00000003 (3         ), input: 0
Thread 12: r1: 00000004 (4         ), input: 4
Thread 13: r1: 00000004 (4         ), input: 4
Thread 14: r1: 0000000b (11        ), input: 8
Thread 15: r1: 0000000b (11        ), input: 8
Thread 16: r1: 00000003 (3         ), input: 0
Thread 17: r1: 00000003 (3         ), input: 1
Thread 18: r1: 00000003 (3         ), input: 2
Thread 19: r1: 00000003 (3         ), input: 0
Thread 20: r1: 00000004 (4         ), input: 4
Thread 21: r1: 00000004 (4         ), input: 4
Thread 22: r1: 0000000b (11        ), input: 8
Thread 23: r1: 0000000b (11        ), input: 8
Thread 24: r1: 00000003 (3         ), input: 0
Thread 25: r1: 00000003 (3         ), input: 1
Thread 26: r1: 00000003 (3         ), input: 2
Thread 27: r1: 00000003 (3         ), input: 0
Thread 28: r1: 00000004 (4         ), input: 4
Thread 29: r1: 00000004 (4         ), input: 4
Thread 30: r1: 0000000b (11        ), input: 8
Thread 31: r1: 0000000b (11        ), input: 8
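
A small sketch for regenerating or parsing dumps in the layout above, assuming one register per line, the value printed as 8 hex digits and again as an unsigned decimal left-aligned in a 10-character field. `format_line` and `parse_dump` are hypothetical helpers, not part of hwtestbed.py:

```python
import re

# One dump line: thread index, register name, hex value, decimal value, input.
LINE_RE = re.compile(
    r"Thread\s+(\d+): (r\d+): ([0-9a-f]{8}) \((\d+)\s*\), input: (\d+)"
)


def format_line(thread: int, reg: str, value: int, inp: int) -> str:
    """Render one line in the dump layout (value treated as unsigned 32-bit)."""
    value &= 0xFFFFFFFF
    return f"Thread {thread:2}: {reg}: {value:08x} ({value:<10}), input: {inp}"


def parse_dump(text: str) -> dict[int, tuple[str, int, int]]:
    """Map thread index -> (register, value, input), skipping other lines."""
    result = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        thread, reg, hex_value, _dec, inp = m.groups()
        result[int(thread)] = (reg, int(hex_value, 16), int(inp))
    return result
```

For example, `format_line(6, "r1", 11, 8)` reproduces the `Thread  6` line above.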

(And yes, it seems there is a reason Apple's compiler emits four separate simd_shuffles with differing low bits when you use metal::simd_shuffle.)

dougallj commented 7 months ago

Amazing! tbh I don't remember how long it took either, but I'd guess that's at least a 60x speedup. And, well, it should actually work, so that's nice too :)