dougallj / applegpu

Apple G13 GPU architecture docs and tools
BSD 3-Clause "New" or "Revised" License

Add optional metal binary archive/lib parsing to disassembler #19

Open dougallj opened 1 year ago

dougallj commented 1 year ago

I have a terrible mockup of this that I've been using on compute shaders. You shouldn't trust this, but being able to find the code and print names is worthwhile.

(Pre-compilation is now supported, so that's nice: https://developer.apple.com/videos/play/wwdc2022/10102/)

Terrible mockup:

import re
import subprocess
import sys

# disassemble() is assumed to be provided by this repo's disassemble.py

with open(sys.argv[1], 'rb') as inf:
    data = inf.read()

# every 64-bit little-endian Mach-O magic is treated as the start of an embedded binary
for macho in re.finditer(b'\xcf\xfa\xed\xfe', data):
    current = data[macho.start():]
    with open('tmp.bin', 'wb') as outf:
        outf.write(current)

    code = None
    for i in re.finditer('Section\n  sectname (.*)\n   segname (.*)\n      addr (.*)\n      size (.*)\n    offset (.*)',
                         subprocess.check_output('otool -l tmp.bin', shell=True).decode('ascii')):
        sectname, segname, addr, size, offset = i.groups()
        addr = int(addr, 16)
        size = int(size, 16)
        offset = int(offset)
        if segname == '__GPU_LD_MD':
            print('-' * 80)
            with open('tmp1.bin', 'wb') as f:
                f.write(current[offset:offset+size])
            print(subprocess.check_output('strings - tmp1.bin', shell=True).decode('ascii').strip()) # prints name (and other junk)
        if sectname == '__text':
            code = current[offset:offset+size]

    if code:
        if code.startswith(b'\xcf\xfa\xed\xfe'):
            continue
        if code.startswith(b'MTLPSBIN'):
            continue
        for line in subprocess.check_output(['nm', 'tmp.bin']).decode('ascii').strip().split('\n'):
            a, b, c = line.split()
            if 'ltmp' not in c:
                print()
                print(c + ':')
                disassemble(code[int(a,16):]) # relies on STOP_ON_STOP
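
The section lookup above shells out to `otool` and scrapes its text output. As a rough sketch of the same idea without temporary files, the load commands can be read directly with `struct`. This assumes the embedded binaries are ordinary little-endian 64-bit Mach-O images (which is what the `\xcf\xfa\xed\xfe` magic suggests); `find_macho_offsets` and `iter_sections` are hypothetical helper names, not part of this repo.

```python
import struct

MH_MAGIC_64 = b"\xcf\xfa\xed\xfe"  # 0xfeedfacf stored little-endian
LC_SEGMENT_64 = 0x19


def _cstr(raw):
    """Decode a fixed-width, NUL-padded name field."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def find_macho_offsets(data):
    """Return every offset at which the 64-bit Mach-O magic occurs.

    The magic can also occur by chance inside unrelated data, so callers
    should treat each hit as a candidate only.
    """
    offsets = []
    pos = data.find(MH_MAGIC_64)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(MH_MAGIC_64, pos + 1)
    return offsets


def iter_sections(macho):
    """Yield (segname, sectname, addr, size, offset) for each section_64.

    `macho` must be a bytes object that starts at a mach_header_64.
    Truncated input raises struct.error.
    """
    # mach_header_64 is 32 bytes; ncmds is the fifth uint32 (byte offset 16).
    ncmds = struct.unpack_from("<I", macho, 16)[0]
    pos = 32
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", macho, pos)
        if cmd == LC_SEGMENT_64:
            # segment_command_64 is 72 bytes; nsects sits at byte offset 64.
            nsects = struct.unpack_from("<I", macho, pos + 64)[0]
            sect = pos + 72
            for _ in range(nsects):
                # section_64 is 80 bytes; only the first five fields are needed.
                sectname, segname, addr, size, offset = struct.unpack_from(
                    "<16s16sQQI", macho, sect)
                yield _cstr(segname), _cstr(sectname), addr, size, offset
                sect += 80
        if cmdsize == 0:
            break  # malformed command; stop rather than re-read it
        pos += cmdsize
```

With these, the `__text` lookup would be a loop over `iter_sections(data[off:])` for each `off` in `find_macho_offsets(data)`, slicing `current[offset:offset+size]` exactly as the mockup does.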

peterwmwong commented 1 year ago

I've gotten some good results using metal-lipo and metal-nm to extract file offsets that can be plugged into disassemble.py.

#!/bin/sh

HARVESTED_METALLIB_FILE=$1

# The harvested metallib contains both AIR/LLVM and native architecture (e.g. "applegpu_g13s") representations.
# List and grab the first architecture that looks native (starts with "applegpu_")
ARCH=$(xcrun metal-lipo -archs "$HARVESTED_METALLIB_FILE" | tr ' ' '\n' | grep "applegpu_" | head -n1)
TMP_ARCH_METALLIB_FILE="/tmp/$ARCH.metallib"

# By extracting the native architecture metallib, metal-nm seems to give accurate file offsets to 
# each shader function.
xcrun metal-lipo -thin "$ARCH" "$HARVESTED_METALLIB_FILE" -o "$TMP_ARCH_METALLIB_FILE"

IFS=$'\n'
for i in $(xcrun metal-nm "$TMP_ARCH_METALLIB_FILE" | grep "_agc.main"); do
  addr=$(echo "$i" | cut -d' ' -f1)
  fn_name=$(echo "$i" | cut -d' ' -f3)
  printf '\nFunction: %s\n' "$fn_name"
  echo "================================================================="
  python3 disassemble.py "$TMP_ARCH_METALLIB_FILE" "0x$addr"
done
Example output on a simple tiled deferred render pipeline:

```txt
Function: __gbuf_fragment_agc.main
=================================================================
0: 4800c200 writeout 512, 3
4: 61014b80 TODO.iter r0_r1, perspective, cf11, cf0, 0, forward, pixel, no_centroid, 0b0
8: 319800008062070107000000 texture_sample 0, 0b00, 0b01100, 0b0, 0b00000, xyz, 0b000, r6l_r6h_r7l, None, 0, ss1, tex_2d, r0_r1, auto_lod, 0
14: 319000008022078105000000 texture_sample 0, 0b00, 0b00100, 0b1, 0b00000, xyz, 0b000, r4l_r4h_r5l, None, 0, ss1, tex_2d, r0_r1, auto_lod, 0
20: 319640008062018106000000 texture_sample 0, 0b00, 0b01100, 0b1, 0b00000, x, 0b000, r5h, None, 0, ss1, tex_2d, r0_r1.discard, auto_lod, 0
2c: 6189034000400000 TODO.iter r2, perspective, cf3, cf0, 0, no_forward, pixel, no_centroid, 0b0
34: 61014180 TODO.iter r0_r1, perspective, cf1, cf0, 0, forward, pixel, no_centroid, 0b0
38: 318ec0008062018024004400 texture_sample 1, 0b00, 0b01100, 0b1, 0b00001, x, 0b001, r3h, None, 0, ss0, tex_2d, r0_r1.discard, auto_lod, 0, r2
44: 618808c000400000 TODO.iter r2l_r2h_r3l, perspective, cf8, cf0, 0, no_forward, pixel, no_centroid, 0b0
4c: 61000400 TODO.iter r0l_r0h_r1l_r1h, perspective, cf4, cf0, 0, forward, pixel, no_centroid, 0b0
50: 3800 wait 0
52: 96a084204800 fmul16 r8l.cache, r2l.cache, r1l.cache.neg
58: 969e85004800 fmul16 r7h.cache, r2h.cache, r0l.cache.neg
5e: b622851008d0 fmadd16 r8h.cache, r2h.cache, r0h.cache, r8l.discard
64: b61e832008cf fmadd16 r7h.cache, r1h.cache, r1l.cache, r7h.discard
6a: 96a483104800 fmul16 r9l.cache, r1h.cache, r0h.cache.neg
70: b6a0cc0000308001 fmadd16 r8l.cache, r6l.discard, 2.0, -1.0
78: b618840008d2 fmadd16 r6l.cache, r2l.cache, r0l.cache, r9l.discard
7e: 968690300c00 fmul16 r1h.cache, r8l.cache, r1h.discard
84: b6a4cd0000308001 fmadd16 r9l.cache, r6h.discard, 2.0, -1.0
8c: b69ace0000308001 fmadd16 r6h.cache, r7l.discard, 2.0, -1.0
94: 968cc6240d00 fmul16 r3l.cache, r3l.discard.neg, r9l.discard
9a: b6008d000cc3 fmadd16 r0l.cache, r6h.cache, r0l.discard, r1h.discard
a0: 968690400c00 fmul16 r1h.cache, r8l.cache, r2l.discard
a6: 3600d16008c0 fmadd16 r0l, r8h.discard, r3l.cache, r0l.discard
ac: b6068d100cc3 fmadd16 r1h.cache, r6h.cache, r0h.discard, r1h.discard
b2: 9682d0500c00 fmul16 r0h.cache, r8l.discard, r2h.discard
b8: b606cf6008c3 fmadd16 r1h.cache, r7h.discard, r3l.cache, r1h.discard
be: b604cd200cc1 fmadd16 r1l.cache, r6h.discard, r1l.discard, r0h.discard
c4: 168243300400 fmul16 r0h, r1h, r1h
ca: 360acc600cc2 fmadd16 r2h, r6l.discard, r3l.discard, r1l.discard
d0: 3801 wait 1
d2: b602800008c1 fmadd16 r0h.cache, r0l.cache, r0l.cache, r0h.discard
d8: b602855008c1 fmadd16 r0h.cache, r2h.cache, r2h.cache, r0h.discard
de: e21900000000 mov_imm r6.cache, 0
e4: 8a08c190 rsqrt r2l.cache, r0h.discard
e8: 7e1c4c088000 mov r7l, r6l
ee: 7e1e4d088000 mov r7h, r6h
f4: 1a8184000c00 fmul32 r0, r2l.cache, r0l.discard
fa: 1a8584300c00 fmul32 r1, r2l.cache, r1h.discard
100: 1a89c4500c00 fmul32 r2, r2l.discard, r2h.discard
106: 2a8dc7000002 fadd32 r3, r3h.discard, -0.0
10c: 61a10d4000400000 TODO.iter r8, perspective, cf13, cf0, 0, no_forward, pixel, no_centroid, 0b0
114: 48300000 writeout 48, 0
118: 09010087f0fc8003 st_tile r0_r1_r2_r3, s16norm, 0, 0, quad, 8, 255, 0, 0
120: 0910004af0fc8003 st_tile r4l_r4h_r5l_r5h, srgba8, 0, 0, quad, 4, 255, 0, 0
128: 0921000611fc0003 st_tile r8, u16norm, 1, 0, single, 0, 255, 0, 0
130: 5830
132: 0000
134: 480c0000 writeout 12, 0
138: 0918000af0fc8003 st_tile r6l_r6h_r7l_r7h, srgba8, 0, 0, quad, 0, 255, 0, 0
140: 8800 stop

Function: __gbuf_fragment_agc.main.constant_program
=================================================================
0: 0501140d00c4f200 device_load 0, i32, quad, r0_r1_r2_r3, u2_u3, 1, signed, lsl 1
8: 0521340d00c43200 device_load 0, i32, pair, r4_r5, u2_u3, 3, signed, lsl 1
10: 3800 wait 0
12: 8e2580c118000000 iadd r9.cache, u0, u6
1a: 8e1d80012c000000 iadd r7.cache, u0, r0.discard
22: 9202520218010130 icmpsel ult, r0h.cache, r9, u0, 1, 0
2a: 92004e0218010130 icmpsel ult, r0l.cache, r7, u0, 1, 0
32: 0e21c12018000000 iadd r8, r0h.discard, u1
3a: 0e19c02018000000 iadd r6, r0l.discard, u1
42: 8e0d80812c000000 iadd r3.cache, u0, r4.discard
4a: 8e0580412c000000 iadd r1.cache, u0, r2.discard
52: 9202460218010130 icmpsel ult, r0h.cache, r3, u0, 1, 0
5a: 9200420218010130 icmpsel ult, r0l.cache, r1, u0, 1, 0
62: 0e09c12018000000 iadd r2, r0h.discard, u1
6a: 0e01c02018000000 iadd r0, r0l.discard, u1
72: c548003d01803000 uniform_store 2, i16, pair, 0, r9l_r9h, 16
7a: c540203d01803000 uniform_store 2, i16, pair, 0, r8l_r8h, 18
82: c538403d01803000 uniform_store 2, i16, pair, 0, r7l_r7h, 20
8a: c530603d01803000 uniform_store 2, i16, pair, 0, r6l_r6h, 22
92: c518803d01803000 uniform_store 2, i16, pair, 0, r3l_r3h, 24
9a: c510a03d01803000 uniform_store 2, i16, pair, 0, r2l_r2h, 26
a2: c508c03d01803000 uniform_store 2, i16, pair, 0, r1l_r1h, 28
aa: c500e03d01803000 uniform_store 2, i16, pair, 0, r0l_r0h, 30
b2: 8800 stop

Function: __gbuf_vertex_agc.main
=================================================================
0: 0549a60e50c01200 device_load 0, i32, single, r9, u43_u44, r5, unsigned
8: 0501a24e60c8f200 device_load 1, i32, quad, r0_r1_r2_r3, u49_u50, r5, unsigned, lsl 2
10: 3800 wait 0
12: 9e1752c200000000 imadd r5_r6.cache, r9, 12, 0
1a: 8e219cc128040000 iadd r8.cache, u46, r6.cache
22: 8e19a0c12c040000 iadd r6.cache, u48, r6.discard
2a: 8e119aa128040000 iadd r4.cache, u45, r5.cache
32: 8e1d9ea12c040000 iadd r7.cache, u47, r5.discard
3a: 929448a2190101300001 icmpsel ult, r5l.cache, r4, u45, 1, 0
44: 92a84ee2190101300001 icmpsel ult, r10l.cache, r7, u47, 1, 0
4e: 0e15ca002d000000 iadd r5, r5l.discard, r8.discard
56: 0e21d4c02c000000 iadd r8, r10l.discard, r6.discard
5e: 0521080500c87200 device_load 0, i32, triple, r4_r5_r6, r4_r5, 0, signed, lsl 2
66: 0551260e61c43200 device_load 0, i32, pair, r10_r11, u51_u52, r9, unsigned, lsl 1
6e: 05390e0500c87200 device_load 0, i32, triple, r7_r8_r9, r7_r8, 0, signed, lsl 2
76: 3801 wait 1
78: aab0c0020002 fadd32 r12l.cache, r0.discard, -0.0
7e: aa82c2020002 fadd32 r0h.cache, r1.discard, -0.0
84: 96842d810900 fmul16 r1l.cache, u22h, r12l.cache
8a: 96802e810900 fmul16 r0l.cache, u23l, r12l.cache
90: 96862c810d00 fmul16 r1h.cache, u22l, r12l.discard
96: aa88c4020002 fadd32 r2l.cache, r2.discard, -0.0
9c: b6062f1108c3 fmadd16 r1h.cache, u23h, r0h.cache, r1h.discard
a2: b604301108c2 fmadd16 r1l.cache, u24l, r0h.cache, r1l.discard
a8: b606324108c3 fmadd16 r1h.cache, u25l, r2l.cache, r1h.discard
ae: b604334108c2 fmadd16 r1l.cache, u25h, r2l.cache, r1l.discard
b4: b60231110cc0 fmadd16 r0h.cache, u24h, r0h.discard, r0l.discard
ba: 968082200800 fmul16 r0l.cache, r1l.cache, r1l.cache
c0: b60234410cc1 fmadd16 r0h.cache, u26l, r2l.discard, r0h.discard
c6: b600833008c0 fmadd16 r0l.cache, r1h.cache, r1h.cache, r0l.discard
cc: b600411004c0 fmadd16 r0l.cache, r0h, r0h, r0l.discard
d2: 8a00c090 rsqrt r0l.cache, r0l.discard
d6: 16b480300c00 fmul16 r13l, r0l.cache, r1h.discard
dc: 16b640200c00 fmul16 r13h, r0l, r1l.discard
e2: 3800 wait 0
e4: 16b0c0100c00 fmul16 r12l, r0l.discard, r0h.discard
ea: ba85b681288e4100 fmadd32 r1.cache, u27, r4.cache, u39
f2: ba81b88124904100 fmadd32 r0.cache, u28, r4, u40
fa: ba85bea128c20200 fmadd32 r1.cache, u31, r5.cache, r1.discard
102: ba8180a124c00204 fmadd32 r0.cache, u32, r5, r0.discard
10a: 3a8586c128c20204 fmadd32 r1, u35, r6.cache, r1.discard
112: 3a8188c124c00204 fmadd32 r0, u36, r6, r0.discard
11a: 2ab2c6020002 fadd32 r12h, r3.discard, -0.0
120: 2a88d4020002 fadd32 r2l, r10.discard, -0.0
126: 2a8ad6020002 fadd32 r2h, r11.discard, -0.0
12c: 11348780 st_var 1, r13, 7
130: 11308880 st_var 1, r12, 8
134: 11088b80 st_var 1, r2, 11
138: 11048080 st_var 1, r1, 0
13c: 11008180 st_var 1, r0, 1
140: ba85bc8128944100 fmadd32 r1.cache, u30, r4.cache, u42
148: ba81ba8128924100 fmadd32 r0.cache, u29, r4.cache, u41
150: ba8584a128c20204 fmadd32 r1.cache, u34, r5.cache, r1.discard
158: ba8982a128c00204 fmadd32 r2.cache, u33, r5.cache, r0.discard
160: ba81928128aa0100 fmadd32 r0.cache, u9, r4.cache, u21
168: 3aa98ac128c40204 fmadd32 r10, u37, r6.cache, r2.discard
170: ba819aa128c00200 fmadd32 r0.cache, u13, r5.cache, r0.discard
178: 3a8d8cc128c20204 fmadd32 r3, u38, r6.cache, r1.discard
180: ba81a2c128c00200 fmadd32 r0.cache, u17, r6.cache, r0.discard
188: ba858c8128a40100 fmadd32 r1.cache, u6, r4.cache, u18
190: 8a01c082 rcp r0.cache, r0.discard
194: ba89908124a80100 fmadd32 r2.cache, u8, r4, u20
19c: ba8594a128c20200 fmadd32 r1.cache, u10, r5.cache, r1.discard
1a4: ba8998a124c40200 fmadd32 r2.cache, u12, r5, r2.discard
1ac: ba859cc128c20200 fmadd32 r1.cache, u14, r6.cache, r1.discard
1b4: ba89a0c124c40200 fmadd32 r2.cache, u16, r6, r2.discard
1bc: 1a85c2022800 fmul32 r1, r1.discard, r0.cache
1c2: 1a89c4022400 fmul32 r2, r2.discard, r0
1c8: 11288280 st_var 1, r10, 2
1cc: 110c8380 st_var 1, r3, 3
1d0: 11088480 st_var 1, r2, 4
1d4: 11048580 st_var 1, r1, 5
1d8: aa88ce020002 fadd32 r2l.cache, r7.discard, -0.0
1de: aa8cd0020002 fadd32 r3l.cache, r8.discard, -0.0
1e4: 96862c410800 fmul16 r1h.cache, u22l, r2l.cache
1ea: 96842d410800 fmul16 r1l.cache, u22h, r2l.cache
1f0: 96882e410c00 fmul16 r2l.cache, u23l, r2l.discard
1f6: aa8ad2020002 fadd32 r2h.cache, r9.discard, -0.0
1fc: b608316108c4 fmadd16 r2l.cache, u24h, r3l.cache, r2l.discard
202: b6062f6108c3 fmadd16 r1h.cache, u23h, r3l.cache, r1h.discard
208: b60430610cc2 fmadd16 r1l.cache, u24l, r3l.discard, r1l.discard
20e: b61e325108c3 fmadd16 r7h.cache, u25l, r2h.cache, r1h.discard
214: b61c335108c2 fmadd16 r7l.cache, u25h, r2h.cache, r1l.discard
21a: ba862e8128350100 fmadd32 r1h.cache, u23l, r4.cache, u26h
222: 96848ee00800 fmul16 r1l.cache, r7l.cache, r7l.cache
228: b60e34510cc4 fmadd16 r3h.cache, u26l, r2h.discard, r2l.discard
22e: b6048ff008c2 fmadd16 r1l.cache, r7h.cache, r7h.cache, r1l.discard
234: ba2031a128c3 fmadd32 r8l.cache, u24h, r5.cache, r1h.discard
23a: b608877008c2 fmadd16 r2l.cache, r3h.cache, r3h.cache, r1l.discard
240: ba858e812ca60100 fmadd32 r1.cache, u7, r4.discard, u19
248: 8a0cc490 rsqrt r3l.cache, r2l.discard
24c: ba8596a12cc20200 fmadd32 r1.cache, u11, r5.discard, r1.discard
254: ba899ec128c20200 fmadd32 r2.cache, u15, r6.cache, r1.discard
25c: 3a0434c12cd0 fmadd32 r1l, u26l, r6.discard, r8l.discard
262: 1a89c4022c00 fmul32 r2, r2.discard, r0.discard
268: 168686f00c00 fmul16 r1h, r3l.cache, r7h.discard
26e: 168086e00c00 fmul16 r0l, r3l.cache, r7l.discard
274: 1682c6700c00 fmul16 r0h, r3l.discard, r3h.discard
27a: 11088680 st_var 1, r2, 6
27e: 11048980 st_var 1, r1, 9
282: 91008a80 st_var_final 1, r0, 10
286: 8800 stop

Function: __gbuf_vertex_agc.main.constant_program
=================================================================
0: 0569080d00c8f200 device_load 0, i32, quad, r13_r14_r15_r16, u4_u5, 0, signed, lsl 2
8: 0591180d00c8f200 device_load 0, i32, quad, r18_r19_r20_r21, u4_u5, 1, signed, lsl 2
10: 0549280d00c8f200 device_load 0, i32, quad, r9_r10_r11_r12, u4_u5, 2, signed, lsl 2
18: 0501380d00c8f200 device_load 0, i32, quad, r0_r1_r2_r3, u4_u5, 3, signed, lsl 2
20: 05f1440d00c8f200 device_load 0, i32, quad, r30_r31_r32_r33, u2_u3, 4, signed, lsl 2
28: 05d1540d00c8f200 device_load 0, i32, quad, r26_r27_r28_r29, u2_u3, 5, signed, lsl 2
30: 05b1640d00c8f200 device_load 0, i32, quad, r22_r23_r24_r25, u2_u3, 6, signed, lsl 2
38: 0521740d00c8f200 device_load 0, i32, quad, r4_r5_r6_r7, u2_u3, 7, signed, lsl 2
40: 0559844d00c97200 device_load 1, i32, triple, r43_r44_r45, u2_u3, 8, signed, lsl 2
48: 0541944d00c97200 device_load 1, i32, triple, r40_r41_r42, u2_u3, 9, signed, lsl 2
50: 0529a44d00c97200 device_load 1, i32, triple, r37_r38_r39, u2_u3, 10, signed, lsl 2
58: 0511b44d00c97200 device_load 1, i32, triple, r34_r35_r36, u2_u3, 11, signed, lsl 2
60: 3800 wait 0
62: 9ac5b6622a00 fmul32 r17.cache, r27.cache, r19.cache
68: 9aa1ae622600 fmul32 r8.cache, r23.cache, r19
6e: bac5b4c229e20200 fmadd32 r17.cache, r26.cache, r14.cache, r17.discard
76: baa1acc225d00200 fmadd32 r8.cache, r22.cache, r14, r8.discard
7e: bac5b84229e20200 fmadd32 r17.cache, r28.cache, r10.cache, r17.discard
86: baa1b04225d00200 fmadd32 r8.cache, r24.cache, r10, r8.discard
8e: 3ac97a2228e20210 fmadd32 r50, r29, r1.cache, r17.discard
96: 3abdb22224d00210 fmadd32 r47, r25.cache, r1, r8.discard
9e: 9ac576a22a00 fmul32 r17.cache, r27, r21.cache
a4: 9aa1aea22600 fmul32 r8.cache, r23.cache, r21
aa: bac574022ae20200 fmadd32 r17.cache, r26, r16.cache, r17.discard
b2: baa1ac0226d00200 fmadd32 r8.cache, r22.cache, r16, r8.discard
ba: bac5b88229e20200 fmadd32 r17.cache, r28.cache, r12.cache, r17.discard
c2: baa1b08225d00200 fmadd32 r8.cache, r24.cache, r12, r8.discard
ca: 3ac5ba6228e20210 fmadd32 r49, r29.cache, r3.cache, r17.discard
d2: 3ab9b26224d00210 fmadd32 r46, r25.cache, r3, r8.discard
da: 9ac576422a00 fmul32 r17.cache, r27, r18.cache
e0: 9aa16e422600 fmul32 r8.cache, r23, r18
e6: bac574a229e20200 fmadd32 r17.cache, r26, r13.cache, r17.discard
ee: baa16ca229d00200 fmadd32 r8.cache, r22, r13.cache, r8.discard
f6: bac5782229e20200 fmadd32 r17.cache, r28, r9.cache, r17.discard
fe: baa1702229d00200 fmadd32 r8.cache, r24, r9.cache, r8.discard
106: 3acd7a0228e20210 fmadd32 r51, r29, r0.cache, r17.discard
10e: 3ac1720228d00210 fmadd32 r48, r25, r0.cache, r8.discard
116: 9ac58a422a00 fmul32 r17.cache, r5.cache, r18.cache
11c: 9aa14a622a00 fmul32 r8.cache, r5, r19.cache
122: bac588a229e20200 fmadd32 r17.cache, r4.cache, r13.cache, r17.discard
12a: baa148c229d00200 fmadd32 r8.cache, r4, r14.cache, r8.discard
132: bac58c2225e20200 fmadd32 r17.cache, r6.cache, r9, r17.discard
13a: baa14c4229d00200 fmadd32 r8.cache, r6, r10.cache, r8.discard
142: 3ac58e0228e20200 fmadd32 r17, r7.cache, r0.cache, r17.discard
14a: 3aa14e2228d00200 fmadd32 r8, r7, r1.cache, r8.discard
152: 9ab5dac22b00 fmul32 r13.cache, r13.discard, r30.cache
158: 9ab9dcc22b00 fmul32 r14.cache, r14.discard, r30.cache
15e: bab5e4e22bda0200 fmadd32 r13.cache, r18.discard, r31.cache, r13.discard
166: bab9e6e22bdc0200 fmadd32 r14.cache, r19.discard, r31.cache, r14.discard
16e: bab5d20228da0201 fmadd32 r13.cache, r9.discard, r32.cache, r13.discard
176: baa5d40228dc0201 fmadd32 r9.cache, r10.discard, r32.cache, r14.discard
17e: 3acdc02228da0201 fmadd32 r19, r0.discard, r33.cache, r13.discard
186: 3ac9c22228d20201 fmadd32 r18, r1.discard, r33.cache, r9.discard
18e: 9a85a0c22b00 fmul32 r1.cache, r16.cache, r30.cache
194: 9a819ec22f00 fmul32 r0.cache, r15.cache, r30.discard
19a: ba85aae22bc20200 fmadd32 r1.cache, r21.cache, r31.cache, r1.discard
1a2: ba81a8e22fc00200 fmadd32 r0.cache, r20.cache, r31.discard, r0.discard
1aa: ba85580228c20201 fmadd32 r1.cache, r12, r32.cache, r1.discard
1b2: ba8196022cc00201 fmadd32 r0.cache, r11.cache, r32.discard, r0.discard
1ba: 3ab5462228c20201 fmadd32 r13, r3, r33.cache, r1.discard
1c2: 3ab984222cc00201 fmadd32 r14, r2.cache, r33.discard, r0.discard
1ca: 9a85f6822a00 fmul32 r1.cache, r27.discard, r20.cache
1d0: 9a81ee822a00 fmul32 r0.cache, r23.discard, r20.cache
1d6: ba85f4e229c20200 fmadd32 r1.cache, r26.discard, r15.cache, r1.discard
1de: ba81ece229c00200 fmadd32 r0.cache, r22.discard, r15.cache, r0.discard
1e6: ba85f86229c20200 fmadd32 r1.cache, r28.discard, r11.cache, r1.discard
1ee: ba81f06229c00200 fmadd32 r0.cache, r24.discard, r11.cache, r0.discard
1f6: 3aa9fa4228c20200 fmadd32 r10, r29.discard, r2.cache, r1.discard
1fe: 3aa5f24228c00200 fmadd32 r9, r25.discard, r2.cache, r0.discard
206: 9a858a822e00 fmul32 r1.cache, r5.cache, r20.discard
20c: 9a81caa22e00 fmul32 r0.cache, r5.discard, r21.discard
212: ba8588e22dc20200 fmadd32 r1.cache, r4.cache, r15.discard, r1.discard
21a: ba81c8022ec00200 fmadd32 r0.cache, r4.discard, r16.discard, r0.discard
222: ba858c622dc20200 fmadd32 r1.cache, r6.cache, r11.discard, r1.discard
22a: ba81cc822dc00200 fmadd32 r0.cache, r6.discard, r12.discard, r0.discard
232: 3a998e422cc20200 fmadd32 r6, r7.cache, r2.discard, r1.discard
23a: 3a95ce622cc00200 fmadd32 r5, r7.discard, r3.discard, r0.discard
242: 3801 wait 1
244: 2a90d6020006 fadd32 r4l, r43.discard, -0.0
24a: 2a92d8020006 fadd32 r4h, r44.discard, -0.0
250: 2a8cda020006 fadd32 r3l, r45.discard, -0.0
256: 2a8ed0020006 fadd32 r3h, r40.discard, -0.0
25c: 2a88d2020006 fadd32 r2l, r41.discard, -0.0
262: 2a8ad4020006 fadd32 r2h, r42.discard, -0.0
268: 2a84ca020006 fadd32 r1l, r37.discard, -0.0
26e: 2a86cc020006 fadd32 r1h, r38.discard, -0.0
274: 2a80ce020006 fadd32 r0l, r39.discard, -0.0
27a: 2a82c8020006 fadd32 r0h, r36.discard, -0.0
280: c598c03d00803000 uniform_store 2, i16, pair, 0, r19l_r19h, 12
288: c590e03d00803000 uniform_store 2, i16, pair, 0, r18l_r18h, 14
290: c570003d01803000 uniform_store 2, i16, pair, 0, r14l_r14h, 16
298: c568203d01803000 uniform_store 2, i16, pair, 0, r13l_r13h, 18
2a0: c598403d01813000 uniform_store 2, i16, pair, 0, r51l_r51h, 20
2a8: c590603d01813000 uniform_store 2, i16, pair, 0, r50l_r50h, 22
2b0: c550803d01803000 uniform_store 2, i16, pair, 0, r10l_r10h, 24
2b8: c588a03d01813000 uniform_store 2, i16, pair, 0, r49l_r49h, 26
2c0: c580c03d01813000 uniform_store 2, i16, pair, 0, r48l_r48h, 28
2c8: c578e03d01813000 uniform_store 2, i16, pair, 0, r47l_r47h, 30
2d0: c548003d02803000 uniform_store 2, i16, pair, 0, r9l_r9h, 32
2d8: c570203d02813000 uniform_store 2, i16, pair, 0, r46l_r46h, 34
2e0: c588403d02803000 uniform_store 2, i16, pair, 0, r17l_r17h, 36
2e8: c540603d02803000 uniform_store 2, i16, pair, 0, r8l_r8h, 38
2f0: c530803d02803000 uniform_store 2, i16, pair, 0, r6l_r6h, 40
2f8: c528a03d02803000 uniform_store 2, i16, pair, 0, r5l_r5h, 42
300: c520c03d02803000 uniform_store 2, i16, pair, 0, r4l_r4h, 44
308: c518e03d02803000 uniform_store 2, i16, pair, 0, r3l_r3h, 46
310: c510003d03803000 uniform_store 2, i16, pair, 0, r2l_r2h, 48
318: c508203d03803000 uniform_store 2, i16, pair, 0, r1l_r1h, 50
320: c500403d03803000 uniform_store 2, i16, pair, 0, r0l_r0h, 52
328: 8800 stop

Function: __lighting_fragment_agc.main
=================================================================
0: 4800c200 writeout 512, 3
4: 48200000 writeout 32, 0
8: 4915004af8fc8003 ld_tile r5_r6_r7_r8, srgba8, 0, 1, quad, 4, 255, 0, 0
10: 49250087f8fc8003 ld_tile r9_r10_r11_r12, s16norm, 0, 1, quad, 8, 255, 0, 0
18: aa8ad8020042 fadd32 r2h.cache, r12.discard, -0.0
1e: 1a824a821000 fmul32 r0h, r5, u4l
24: 1a844c821000 fmul32 r1l, r6, u4l
2a: 62080000 mov_imm r2l, 0
2e: 1a864e821000 fmul32 r1h, r7, u4l
34: e2000000 mov_imm r0l.cache, 0
38: 4229c5000300 if_fcmp r0l, nlt, r2h.discard, 1.0, 1
3e: 20c00c010000 jmp_exec_none 0x14A
44: 6184018000400000 TODO.iter r1l_r1h, perspective, cf1, cf0, 0, no_forward, pixel, no_centroid, 0b0
4c: 490d000619fc8003 ld_tile r3, u16norm, 1, 1, single, 0, 255, 0, 0
54: 6208003c mov_imm r2l, 15360
58: aa8ec6020002 fadd32 r3h.cache, r3.discard, -0.0
5e: 968c83704800 fmul16 r3l.cache, r1h.cache, r3h.cache.neg
64: 969082704800 fmul16 r4l.cache, r1l.cache, r3h.cache.neg
6a: a68686501000 fadd16 r1h.cache, r3l.cache, u2h
70: a68a88401000 fadd16 r2h.cache, r4l.cache, u2l
76: 968483300800 fmul16 r1l.cache, r1h.cache, r1h.cache
7c: a68206714800 fadd16 r0h.cache, u3l, r3h.cache.neg
82: b63085500882 fmadd16 r12l.cache, r2h.cache, r2h.cache, r1l.cache
88: 968487700800 fmul16 r1l.cache, r3h.cache, r3h.cache
8e: b630811008d8 fmadd16 r12l.cache, r0h.cache, r0h.cache, r12l.discard
94: b60488800882 fmadd16 r1l.cache, r4l.cache, r4l.cache, r1l.cache
9a: 8a30d890 rsqrt r12l.cache, r12l.discard
9e: b60486600882 fmadd16 r1l.cache, r3l.cache, r3l.cache, r1l.cache
a4: 8a048290 rsqrt r1l.cache, r1l.cache
a8: 968a98500c00 fmul16 r2h.cache, r12l.cache, r2h.discard
ae: 968698300800 fmul16 r1h.cache, r12l.cache, r1h.cache
b4: 9682d8100800 fmul16 r0h.cache, r12l.discard, r0h.cache
ba: b63082800c85 fmadd16 r12l.cache, r1l.cache, r4l.discard, r2h.cache
c0: b60c82600c83 fmadd16 r3l.cache, r1l.cache, r3l.discard, r1h.cache
c6: b61082704c81 fmadd16 r4l.cache, r1l.cache, r3h.discard.neg, r0h.cache
cc: 968486600800 fmul16 r1l.cache, r3l.cache, r3l.cache
d2: 9a8e98202900 fmul32 r3h.cache, r12l.cache, r9.cache
d8: b604d8800d82 fmadd16 r1l.cache, r12l.discard, r12l.discard, r1l.cache
de: b60488800882 fmadd16 r1l.cache, r4l.cache, r4l.cache, r1l.cache
e4: ba0ec86029c7 fmadd32 r3h.cache, r4l.discard, r11.cache, r3h.discard
ea: 8a048290 rsqrt r1l.cache, r1l.cache
ee: ba0cc64029c7 fmadd32 r3l.cache, r3l.discard, r10.cache, r3h.discard
f4: 968482600c00 fmul16 r1l.cache, r1l.cache, r3l.discard
fa: 9a8c83402d00 fmul32 r3l.cache, r1h.cache, r10.discard
100: 82048200008200f0 fcmpsel gtn, r1l.cache, r1l.cache, 0.0, r1l.cache, 0
108: 8a0582c0 log2 r1.cache, r1l.cache
10c: ba0ac5202dc6 fmadd32 r2h.cache, r2h.discard, r9.discard, r3l.discard
112: 9a8582a21000 fmul32 r1.cache, r1.cache, u5l
118: ba0a81602dc5 fmadd32 r2h.cache, r0h.cache, r11.discard, r2h.discard
11e: 8a0282d2 exp2 r0h.cache, r1.cache
122: 8204c50000c500f0 fcmpsel gtn, r1l.cache, r2h.discard, 0.0, r2h.discard, 0
12a: ba0281002d82 fmadd32 r0h.cache, r0h.cache, r8.discard, r1l.cache
130: b686819010080100 fmadd16 r1h.cache, r0h.cache, u4h, u4l
138: 1a8283a02c00 fmul32 r0h, r1h.cache, r5.discard
13e: 1a8483c02c00 fmul32 r1l, r1h.cache, r6.discard
144: 1a8643e02c00 fmul32 r1h, r1h, r7.discard
14a: 520e00000000 pop_exec r0l, 1
150: 5810
152: 0000
154: 480c0000 writeout 12, 0
158: 0902000af0fc8003 st_tile r0h_r1l_r1h_r2l, srgba8, 0, 0, quad, 0, 255, 0, 0
160: 8800 stop

Function: __lighting_fragment_agc.main.constant_program
=================================================================
0: 0501000d00c87200 device_load 0, i32, triple, r0_r1_r2, u0_u1, 0, signed, lsl 2
8: 3800 wait 0
a: 2a80c0020002 fadd32 r0l, r0.discard, -0.0
10: 2a88c4020002 fadd32 r2l, r2.discard, -0.0
16: 2a82c2020002 fadd32 r0h, r1.discard, -0.0
1c: c500403d00803000 uniform_store 2, i16, pair, 0, r0l_r0h, 4
24: c510603d00801000 uniform_store 2, i16, single, 0, r2l, 6
2c: 8800 stop

Function: __lighting_vertex_agc.main
=================================================================
0: 92048a02008d0190 icmpsel seq, r1l.cache, r5.cache, 0, u6h, 0
8: 92068a22008c0130 icmpsel ult, r1h.cache, r5.cache, 2, u6l, 0
10: 8e018a1a00000000 isub r0.cache, r5.cache, 1
18: 920eca22008d318c icmpsel seq, r3h.cache, r5.discard, 2, u6h, r1h.discard
20: 920cc01200c2c058 icmpsel ugt, r3l.cache, r0.discard, 1, r1l.discard, u6l
28: 2a8987000002 fadd32 r2, r3h.cache, -0.0
2e: 9a8184610800 fmul32 r0.cache, u2, r3l.cache
34: 9a8586610800 fmul32 r1.cache, u3, r3l.cache
3a: 3a80887108c00200 fmadd32 r0l, u4, r3h.cache, r0.discard
42: 3a828a710cc20200 fmadd32 r0h, u5, r3h.discard, r1.discard
4a: 2a85c6000002 fadd32 r1, r3l.discard, -0.0
50: 621100000000 mov_imm r4, 0
56: 620d0000803f mov_imm r3, 1065353216
5c: 11108280 st_var 1, r4, 2
60: 110c8380 st_var 1, r3, 3
64: 11088180 st_var 1, r2, 1
68: 11048080 st_var 1, r1, 0
6c: 91008480 st_var_final 1, r0, 4
70: 8800 stop
```
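
The offset extraction done with `cut` and `grep` in the shell script could also be sketched in Python, which makes it easier to feed the results straight into the disassembler. This is only an illustration: `parse_nm_symbols` is a hypothetical helper, it assumes `metal-nm` prints the usual three-column `address type name` lines, and the sample addresses in the usage note are made up.

```python
def parse_nm_symbols(nm_output, marker="_agc.main"):
    """Return (address, name) pairs for nm-style lines whose name contains marker.

    Lines that do not have exactly three whitespace-separated fields
    (for example undefined symbols, which have no address column) are skipped.
    """
    symbols = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        addr, _kind, name = fields
        if marker in name:
            symbols.append((int(addr, 16), name))
    return symbols
```

For example, given the (invented) text `"0000000000001000 T __gbuf_fragment_agc.main\n0000000000001200 t ltmp0"`, the function returns `[(4096, "__gbuf_fragment_agc.main")]`; each address could then be formatted with `hex()` and passed to `disassemble.py` the way the script does.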