New hwtestbed

Adds a new hwtestbed based on patching Metal binary archives with custom shader code. This avoids having to update it for every macOS release: it would only need changes if Apple changes how the compiler assigns buffers to uniform registers, or changes the format of binary archives.
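To illustrate the general idea of patching a compiled blob, here is a minimal sketch in Python. It assumes, hypothetically, that the shader is compiled with a unique placeholder byte sequence which the custom code overwrites in place, padded with a caller-supplied no-op encoding so the blob keeps its size. `patch_blob`, the placeholder, and the no-op bytes are illustrative names, not the PR's actual mechanism, and the sketch ignores everything else a real binary archive may contain.

```python
def patch_blob(blob: bytes, placeholder: bytes, code: bytes, nop: bytes) -> bytes:
    """Return a copy of `blob` with `placeholder` overwritten by `code`.

    Hypothetical scheme: the placeholder must occur exactly once, and any
    space left over after `code` is filled with copies of `nop`, so the
    blob's total size (and every offset after the patch) is unchanged.
    """
    offset = blob.find(placeholder)
    if offset < 0:
        raise ValueError("placeholder not found in blob")
    # Refuse ambiguous patches: a second match would make the target unclear.
    if blob.find(placeholder, offset + 1) >= 0:
        raise ValueError("placeholder is not unique")
    spare = len(placeholder) - len(code)
    if spare < 0 or spare % len(nop) != 0:
        raise ValueError("code does not fit the placeholder")
    patched = code + nop * (spare // len(nop))
    return blob[:offset] + patched + blob[offset + len(placeholder):]


# Example with made-up bytes: an 8-byte placeholder replaced by 4 bytes of
# "code" plus two 2-byte no-ops.
blob = b"HDR" + b"\xAA" * 8 + b"TAIL"
out = patch_blob(blob, b"\xAA" * 8, b"\x01\x02\x03\x04", b"\x00\x00")
```

Keeping the size fixed is the design choice that makes this safe in the sketch: nothing after the patched region has to move.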

It's also pipe-based (aside from supplying the actual file, due to how the Metal API is laid out) and fairly fast. I've never run the original because of macOS version mismatches, so I can't compare, but the new one completes the tests in 18 s, with ~90% of CPU time spent in Python. Because of this, I've removed the caching code.
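As an illustration of what "pipe-based" can mean, here is a minimal sketch of a Python driver exchanging messages with one long-lived helper process over its stdin/stdout. The framing (a 4-byte little-endian length followed by the payload), the `Helper` class, and the echo child used as a stand-in are all hypothetical, not hwtestbed's actual protocol. A design like this avoids paying process start-up cost on every test.

```python
import struct
import subprocess
import sys


def send_msg(stream, payload: bytes) -> None:
    # The length prefix tells the reader exactly how many bytes follow.
    stream.write(struct.pack("<I", len(payload)) + payload)
    stream.flush()


def read_exact(stream, n: int) -> bytes:
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            raise EOFError("pipe closed before the full message arrived")
        data += chunk
    return data


def recv_msg(stream) -> bytes:
    (length,) = struct.unpack("<I", read_exact(stream, 4))
    return read_exact(stream, length)


class Helper:
    """Hypothetical wrapper keeping one child process alive across requests."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def request(self, payload: bytes) -> bytes:
        send_msg(self.proc.stdin, payload)
        return recv_msg(self.proc.stdout)

    def close(self) -> None:
        # Closing stdin signals EOF; the child's loop then exits.
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()


# Stand-in child for demonstration: replies to each message with the
# payload reversed, using the same length-prefixed framing.
ECHO_CHILD = r'''
import struct, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    hdr = inp.read(4)
    if len(hdr) < 4:
        break
    (n,) = struct.unpack("<I", hdr)
    body = inp.read(n)
    out.write(struct.pack("<I", n) + body[::-1])
    out.flush()
'''
```

Usage: `h = Helper([sys.executable, "-c", ECHO_CHILD])`, then `h.request(b"abc")` returns `b"cba"`; call `h.close()` when done.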

Also adds a tool for testing arbitrary shader code from the command line:

> python3 hwtestbed.py "get_sr r1.cache, sr52; simd_shuffle r1, r1, r0l" -r{0,1,2,0,4,4,8,8}{,,,}
Thread  0: r1: 00000003 (3         ), input: 0
Thread  1: r1: 00000003 (3         ), input: 1
Thread  2: r1: 00000003 (3         ), input: 2
Thread  3: r1: 00000003 (3         ), input: 0
Thread  4: r1: 00000004 (4         ), input: 4
Thread  5: r1: 00000004 (4         ), input: 4
Thread  6: r1: 0000000b (11        ), input: 8
Thread  7: r1: 0000000b (11        ), input: 8
Thread  8: r1: 00000003 (3         ), input: 0
Thread  9: r1: 00000003 (3         ), input: 1
Thread 10: r1: 00000003 (3         ), input: 2
Thread 11: r1: 00000003 (3         ), input: 0
Thread 12: r1: 00000004 (4         ), input: 4
Thread 13: r1: 00000004 (4         ), input: 4
Thread 14: r1: 0000000b (11        ), input: 8
Thread 15: r1: 0000000b (11        ), input: 8
Thread 16: r1: 00000003 (3         ), input: 0
Thread 17: r1: 00000003 (3         ), input: 1
Thread 18: r1: 00000003 (3         ), input: 2
Thread 19: r1: 00000003 (3         ), input: 0
Thread 20: r1: 00000004 (4         ), input: 4
Thread 21: r1: 00000004 (4         ), input: 4
Thread 22: r1: 0000000b (11        ), input: 8
Thread 23: r1: 0000000b (11        ), input: 8
Thread 24: r1: 00000003 (3         ), input: 0
Thread 25: r1: 00000003 (3         ), input: 1
Thread 26: r1: 00000003 (3         ), input: 2
Thread 27: r1: 00000003 (3         ), input: 0
Thread 28: r1: 00000004 (4         ), input: 4
Thread 29: r1: 00000004 (4         ), input: 4
Thread 30: r1: 0000000b (11        ), input: 8
Thread 31: r1: 0000000b (11        ), input: 8
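For reference, the per-thread lines above can be reproduced with a small formatter. This is a minimal sketch inferred only from the printed layout (thread index right-aligned in two columns, the register value as 8 hex digits, the same value in decimal left-aligned in a 10-character field, then the thread's input); `format_thread_line` and `format_results` are hypothetical names, not the tool's actual code, and `value` is assumed to be an unsigned 32-bit integer.

```python
def format_thread_line(thread: int, reg: str, value: int, inp: int) -> str:
    # Assumes 0 <= value < 2**32 (an unsigned 32-bit register value).
    return f"Thread {thread:2}: {reg}: {value:08x} ({value:<10}), input: {inp}"


def format_results(reg: str, values, inputs) -> str:
    # One line per thread, numbered from 0.
    return "\n".join(
        format_thread_line(i, reg, v, x)
        for i, (v, x) in enumerate(zip(values, inputs))
    )


# First eight threads of the example output above.
print(format_results("r1", [3, 3, 3, 3, 4, 4, 11, 11], [0, 1, 2, 0, 4, 4, 8, 8]))
```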

(And yes, it seems there is a reason Apple's compiler emits four separate simd_shuffles, with differing low bits, when you use metal::simd_shuffle.)

dougallj/applegpu #59: New hwtestbed