GPUOpen-Drivers / llpc

LLVM-Based Pipeline Compiler

LGC: shader compilation proposal #507

Closed: trenouf closed this 2 months ago

trenouf commented 4 years ago


There are several different efforts to move away from whole-pipeline compilation in LLPC, or that will affect LLPC in the future. This proposal is to unify them in new LGC (LLPC middle-end) functionality. The link stage in particular requires knowledge that belongs in the middle-end, such as the workings of PAL metadata and ELF reading and writing, and it needs to be shared by the potential multiple LLPC front-ends.

Background

Existing whole pipeline compilation

Whole-pipeline compilation in LLPC works like this:

  1. For each shader, run the front-end shader compilation: SPIR-V reader and various "lowering" passes use Builder to construct the IR for a shader. This phase does not use pipeline state.
  2. LGC (the middle-end) is given the pipeline state, and it links the shader IR modules into a pipeline IR module.
  3. LGC runs its middle-end passes and optimizations, then passes the resulting pipeline IR module to the AMDGPU back-end for pipeline ELF generation.
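
As a minimal sketch of this flow (all the names here are illustrative stand-ins, not the real LLPC/LGC API):

```cpp
#include <cstdint>
#include <memory>
#include <vector>
#include "llvm/IR/Module.h"

struct SpirvBlob { std::vector<uint32_t> words; };
struct PipelineState; // descriptor layout, vertex buffer info, color export info, ...

// Assumed helpers, one per phase:
std::unique_ptr<llvm::Module> frontEndCompile(const SpirvBlob &spirv);
std::unique_ptr<llvm::Module> lgcLink(std::vector<std::unique_ptr<llvm::Module>> shaders,
                                      const PipelineState &state);
std::vector<uint8_t> lgcGenerate(std::unique_ptr<llvm::Module> pipelineModule);

std::vector<uint8_t> compileWholePipeline(const PipelineState &state,
                                          const std::vector<SpirvBlob> &shaders) {
  std::vector<std::unique_ptr<llvm::Module>> modules;
  for (const SpirvBlob &spirv : shaders)
    modules.push_back(frontEndCompile(spirv));               // 1: no pipeline state used
  auto pipelineModule = lgcLink(std::move(modules), state);  // 2: link into pipeline module
  return lgcGenerate(std::move(pipelineModule));             // 3: passes + AMDGPU back-end
}
```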

Existing(ish) shader and partial pipeline caching

Existing partial pipeline compilation

There are some changes on top of this to handle a "partial pipeline compilation" mode. Partway through step 2, LGC calls a callback provided by the front-end with, for each shader, a hash of the shader plus the pipeline state and input/output info pertaining to it. The callback in the front-end can ask to omit a shader stage if it finds it already has a cached ELF containing that shader. The front-end then has a post-compilation ELF linking step that reuses the relevant part of the cached ELF for the omitted shader. This only works for VS-FS, and has some other provisos, because of the way it plucks the part of the pipeline it needs out of a whole-pipeline ELF.

This scheme has some disadvantages, especially the way that it lets the middle-end think it is compiling a whole pipeline, but then post-processes the ELF to extract the part it needs. A more holistic approach would be for the middle-end to know that it is not compiling a whole pipeline, and for the link stage to live in the middle-end, where knowledge of (for example) PAL metadata should be confined.

Steven et al's shader caching

Steven's scheme is to compile shaders offline to pre-populate a shader cache. This would involve compiling a shader with most of the pipeline state missing (principally resource descriptor layout, vertex buffer info and color export info), and with some "bounded" items in the pipeline state set to a guessed value. The resulting compiled shader ELF would be cached, keyed on the input SPIR-V and (I assume) the "bounded" parts of the pipeline state that were set.

The proposal

This proposal outlines a shader compilation scheme using relocs, prologs and epilogs, and a pipeline linking stage, all handled in LGC (the LLPC middle-end).

Shader compilation vs pipeline compilation

This proposal does not cover how and when a driver decides to do shader compilation. Of the two compilation modes (quick shader compilation with partial pipeline state, and fully optimized whole-pipeline compilation), there is scope for API and/or driver changes to use shader compilation first, then kick off a background thread to do the optimized compilation and swap the result in at the next opportunity.

Early vs late shader caching

We can divide existing and proposed shader caching schemes into two types: early shader caching, where a shader is cached as compiled, before the full pipeline state is known; and late shader caching, where a shader's part of a compiled pipeline is extracted and cached afterwards, as the existing partial pipeline compilation scheme does.

I propose to focus here on early shader caching. Its pros are that a cache hit avoids compiling the shader at all, and that it is not restricted to VS-FS pipelines; its con is that it limits the inter-shader optimizations that can be done (see "VS-FS parameter optimization" below).

Nicolai also suggests taking the existing partial pipeline compilation scheme, a late shader caching scheme, and tidying up its interface and implementation (see "Inter-shader data cache tracking"). One problem is that we pretty much have to choose one or the other; within one application run, you can't use both at the same time. Trying to means that a shader gets cached both early and late, and the next time the same shader is seen, the early cache check always succeeds.

The choice partly depends on how you view the existing partial pipeline compilation scheme: was a late shader caching scheme chosen for the possibility of VS-FS optimizations, or was it chosen because it could be implemented without the relocs, prologs and epilogs of this proposal? I suspect the latter, and I reckon we're better off with an early shader caching scheme for the two pros listed above.

What shaders are cached

This proposal makes no attempt to cache the VS, TCS, TES and GS shaders that make up part of a geometry or tessellation vertex-processing stage. The FS in such a pipeline can still be cached though. So the shader types that can be cached are:

  * CS
  * FS
  * VS in a non-tessellation, non-geometry pipeline

In addition, we can compile the whole vertex-processing stage (VS-GS, VS-TCS-TES, or VS-TCS-TES-GS) without the FS, or with an already-compiled FS.

Failure of shader compilation or pipeline linking

There needs to be scope for shader compilation or pipeline linking to fail, in which case the front-end needs to do full pipeline compilation instead.

This kind of failure is different to normal compilation failure, in that it needs to exit cleanly and clean up, because the driver or front-end is going to retry as a full pipeline compilation. If any such condition is detected in an LLVM pass flow, we need to come up with a clean exit mechanism, such as deleting all the code in the module and detecting that at the end.

Prologs and epilogs

Compiling shaders with some or all pipeline state missing and without the other shader to refer to means that the pipeline linker needs to generate prologs and epilogs.

CS prolog

If the compilation of a CS without resource descriptor layout puts its user data sgprs in the wrong order for the layout in the pipeline state, then the linker needs to generate a CS prolog that loads and/or swaps around user data sgprs. The linker picks up the descriptor set to sgpr mapping that the CS compilation used from the user data registers in the PAL metadata.

VS prolog

If vertex buffer information is unavailable at VS compile time, then the linker needs to generate a VS prolog (a "fetch shader") that loads vertex buffer values required by the VS. The VS expects the values to be passed in vgprs, and the linker picks up details of which vertex buffer locations and in what format from extra pre-link metadata attached to the VS ELF.

VS epilog

If the VS (or whole vertex-processing stage) is compiled without information on how the FS packs its parameter inputs, then the VS compilation does not know how to export parameters, and the linker needs to generate a VS epilog. The VS (or last vertex-processing-stage shader) exits with the parameter values in vgprs, and the VS epilog takes those and exports them. The linker picks up information on what parameter locations are in which vgprs and in what format from extra pre-link metadata attached to the VS ELF, and information on how parameter locations are packed and arranged from extra pre-link metadata attached to the FS ELF.

No FS prolog

No FS prolog is ever needed. FS compilation decides how to pack and arrange its input parameters.

FS epilog

If the FS is compiled without color export pipeline state, then it does not know how to do its exports, and the linker needs to generate an FS epilog. The FS exits with its color export values in vgprs (and the exec mask set to the surviving pixels after kills/demotes), and the FS epilog takes those and exports them. The linker picks up information on what color exports are in which vgprs and in what format from extra pre-link metadata attached to the FS ELF.
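
To make the "extra pre-link metadata" idea concrete, here is one possible shape for the color export information the FS leaves for the linker. The field names are invented for illustration; the real schema would live in the PAL metadata msgpack tree (see "Representation of metadata needed for linking" below):

```cpp
#include <cstdint>
#include <vector>

// One entry per color export left unexported by an FS compiled without
// color export state. The linker reads these to build the FS epilog.
struct UnlinkedColorExport {
  unsigned location;      // color attachment (MRT) location
  unsigned firstVgpr;     // first vgpr holding the value on FS exit
  unsigned numComponents; // consecutive vgprs occupied
  bool isInteger;         // component type, to pick the export conversion
  bool isSigned;          // signedness for integer formats
};

using ColorExportMetadata = std::vector<UnlinkedColorExport>;
```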

Prolog/epilog compilation notes

A prolog has the same input registers as the shader it will be attached to, minus the vgprs that are generated by the prolog for passing to the shader proper. That is, the shader's SPI register settings that determine what registers are set up at wave dispatch apply to the prolog.

For a VS prolog where the VS is part of a merged shader (including the NGG case), the code to set exec needs to be in the prolog.

Exactly the same set of registers is also output from the prolog, plus the vgprs that the prolog generates.

A prolog/epilog is generated as an IR module, then compiled. The compiled ELF is cached, keyed on a hash of the inputs to the prolog/epilog IR generator.
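
A sketch of that cache, assuming the key is a hash over everything the IR generator consumes (register layout, formats, and so on):

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

using GlueShaderHash = std::array<uint8_t, 16>; // e.g. an MD5 of the generator inputs
using ElfBlob = std::vector<uint8_t>;

class GlueShaderCache {
public:
  // Returns the cached ELF, or runs generateAndCompile() (IR generation plus
  // back-end compilation) and caches its result.
  template <typename GenerateFn>
  const ElfBlob &getOrCompile(const GlueShaderHash &key, GenerateFn generateAndCompile) {
    std::lock_guard<std::mutex> lock(m_mutex);
    auto it = m_cache.find(key);
    if (it == m_cache.end())
      it = m_cache.emplace(key, generateAndCompile()).first;
    return it->second;
  }

private:
  std::mutex m_mutex;
  std::map<GlueShaderHash, ElfBlob> m_cache;
};
```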


An epilog's input registers are the same as the shader's output registers, which is the vgprs containing the values to export. (This may need to change to also have some sgprs passed for VS epilog parameter export on gfx11, if parameter exports are going to be replaced by normal off-chip memory writes.)

Prolog/epilog generation even in pipeline compilation

In a case where a particular prolog or epilog is not needed (e.g. the VS prolog when vertex buffer information is available at VS compilation time), I propose that LGC internally uses the same scheme of setting up a shader as if it is going to use the prolog/epilog (including setting up the metadata for the linker), and then uses the same code to generate the IR for the prolog/epilog as would otherwise be used at link time. Then it would merge the prolog/epilog into the shader at the IR stage, allowing optimizations from there.

The advantage of that is that less code in LGC differs between the shader and pipeline compilation cases.

A change this causes is that the vertex buffer loads are all at the start of the VS, even in a pipeline compilation. I'm not sure whether that is good, bad or neutral for performance. (Ignoring the NGG culling issue for now.)
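
One way to realize the merge step, assuming the prolog generator produces a separate IR module (the generator function here is a hypothetical name):

```cpp
#include <memory>
#include "llvm/IR/Module.h"
#include "llvm/Linker/Linker.h"

// Assumed: builds the same IR that the pipeline linker would build for this
// glue shader at link time.
std::unique_ptr<llvm::Module> generateVsPrologIr(llvm::LLVMContext &context);

// Merges the prolog into the shader module; returns true on error. After
// this, normal middle-end optimization (inlining, DCE, scheduling of the
// vertex fetches) can happen across the prolog/shader boundary.
bool mergePrologIntoShader(llvm::Module &shaderModule) {
  std::unique_ptr<llvm::Module> prolog = generateVsPrologIr(shaderModule.getContext());
  return llvm::Linker::linkModules(shaderModule, std::move(prolog));
}
```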

NGG culling

An early version of this feature should probably just ignore this case, because it is quite complex.

With NGG culling, it is advantageous to delay vertex buffer loads that are only used for parameter calculations until after the culling. Thus, for an NGG VS, there should be two VS prologs (fetch shaders). The VS compilation needs to generate the post-culling part as a separate shader, such that the second fetch shader can be glued in between them. At that point (the exit of the first shader), sgprs and vgprs need to be as at wave dispatch, except that the vgprs (vertex index etc) have been copied through LDS to account for the vertices being compacted. Also exec needs to reflect the compacted vertices.

Jumping between prolog, shader and epilog

I'm not sure how possible this is, or if there is a better idea, but:

We want the generated code to reflect that it is going to jump to the next part of the shader. So, when generating the prolog, or when generating the shader proper when there will be an epilog, we want to have an s_branch with a reloc, rather than an s_endpgm. Perhaps we could tell the backend that by defining a new function attribute giving the symbol name to s_branch to when generating what would otherwise be an s_endpgm.
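
Expressed as code, the front-end/middle-end side of that idea could be as small as this; the attribute name and symbol name are invented, and the AMDGPU back-end would need a matching change to honor the attribute where it would otherwise emit s_endpgm:

```cpp
#include "llvm/IR/Function.h"

void markShaderWithEpilog(llvm::Function &fsEntry) {
  // Hypothetical attribute: emit "s_branch <symbol>" (with a reloc on the
  // symbol) in place of the final s_endpgm.
  fsEntry.addFnAttr("amdgpu-tail-branch-symbol", "_amdgpu_ps_epilog");
}
```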

Linking a prolog, shader and epilog would then just work with the s_branch. Linking could optimize that by ensuring the chunks of code are glued together in the right order, and removing a final s_branch. Alignment is a consideration: the shader proper needs to stay instruction-cache-line-aligned when a prolog is glued on in front of it (see "The link stage" below).

The LGC interface

I propose that we extend LGC (LLPC middle-end) to handle the various requirements.

Currently LGC has an interface for whole-pipeline compilation only: given the shaders' IR modules and the full pipeline state, it produces a pipeline ELF.

That interface needs to be extended to allow compilation of a shader with missing or incomplete pipeline state, and to allow linking of previously compiled shader ELFs together with pipeline state.

We would probably want to implement compilation of a geometry and/or tessellation pipeline by providing LGC with IR modules for the non-FS shaders, a previously compiled shader ELF for the FS, and the pipeline state. That allows the other shaders to be compiled knowing which attribute exports will be unused by the FS and so can be removed.

Compilation modes

The compilation modes LGC would support (in probable order of implementation priority) are:

  1. Pipeline compilation, as now. Must be provided with full pipeline state. Generates a pipeline ELF satisfying the PAL pipeline ELF spec.
  2. Compilation of a single shader with missing or partial pipeline state. The shader must be CS, FS, or VS in a non-tessellation non-geometry pipeline. For VS or FS, this may or may not be provided with the other shader already compiled, which would provide parameter information. Generates an ELF that needs to be pipeline linked. Then there is a link stage in LGC that takes such ELFs and generates a pipeline ELF satisfying the PAL pipeline ELF spec.
  3. Compilation of the vertex-processing part of a geometry or tessellation pipeline, with full pipeline state. This may or may not be provided with the already-compiled FS ELF, which would supply parameter layout information. Generates an ELF that needs to be pipeline linked.

Note that the above modes do not include any case where a shader is compiled separately, and then in the link stage needs to be combined with another shader to create a merged shader or an NGG prim shader.
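
A rough shape for the extended entry points implied by these modes (a sketch only; #545 details the actual interface proposal):

```cpp
#include <cstdint>
#include <memory>
#include <vector>
#include "llvm/IR/Module.h"

struct PipelineState; // possibly partial for modes 2 and 3
using ElfBlob = std::vector<uint8_t>;

// Modes 1-3: compile IR modules with whatever pipeline state is available.
// Full state gives a final pipeline ELF; otherwise needsLink is set and the
// result is an unlinked ELF carrying extra pre-link metadata.
ElfBlob compile(std::vector<std::unique_ptr<llvm::Module>> modules,
                const PipelineState &state, bool &needsLink);

// The link stage: combines previously compiled unlinked ELFs with the
// now-complete pipeline state, generating prologs/epilogs and resolving
// relocs. An empty result signals the "retry as full pipeline compile" case.
ElfBlob link(const std::vector<ElfBlob> &unlinkedElfs, const PipelineState &state);
```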

Tuning options

As proposed by Rob, tuning options should always be made available at shader compilation time. This probably means that all tuning has to be done per shader, not per pipeline. Most tuning options are per-shader anyway, except the NGG ones, which apply only to the vertex-processing stage (obviously the VS in a VS-FS pipeline).

Use of the LGC interface by the front-end

VS-FS parameter optimization

As pointed out by Nicolai, the use of early shader caching limits the parameter optimizations that can be done between VS and FS, and how that is limited depends on whether you compile the VS first or the FS first. I consider that it is worth taking this hit because of the saving in compile time in the cache-hit case.

FS first

In this scheme, at VS compilation time, we know exactly how parameters are packed by the FS, so we can generate the parameter exports and we do not need a VS epilog. We can also see where the FS does not use a parameter at all, and DCE it and its calculation in the VS. However we cannot do constant parameter propagation into the FS.

VS first

In this scheme, VS compilation does not know how parameters will be laid out by the FS, so we need a VS epilog. This does allow constant parameter propagation into the FS, because the VS's parameter metadata can include an indication that a parameter is a constant so is not being returned in a vgpr at all. FS compilation will see this metadata, and propagate the constant into the FS, saving an export/import. (Note that LLPC doesn't do this at all currently.) However, the dead parameter (one not used by the FS) optimization is limited to the VS epilog spotting it does not need to export it. The calculation of the dead parameter, and any vertex buffer load needed only for that, does not get DCEd.

Other VS-FS parameter optimizations we miss out on

Nicolai mentioned some further potential optimizations between VS and FS that we miss out on by using early shader caching; all of them are possible when doing a full pipeline compile.

LLPC front-end changes

The LLPC interface would need to change so that a partial pipeline state (and tuning options) is provided to the shader compile function. That function would then check the shader cache, and, if a compile is needed, do front-end compilation then call the LGC interface with the partial pipeline state.

The pipeline compile function would check the cache for its shaders or partial pipeline. The difficulty here is that it does not know how much of the pipeline state was known at shader compile time, so there may need to be some mechanism for multiple shader ELFs to be stored for a particular shader in the cache, with a way of finding one whose known pipeline state at the time is compatible.
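
A sketch of such a cache, where isCompatible() is the open question from the paragraph above (deciding whether the state assumed at shader compile time matches the pipeline now being built):

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct AssumedState;  // the subset of pipeline state baked into a compile
struct PipelineState;
using ShaderHash = uint64_t; // hash of the input SPIR-V (plus tuning options)
using ElfBlob = std::vector<uint8_t>;

bool isCompatible(const AssumedState &assumed, const PipelineState &actual);

struct CachedShaderElf { AssumedState assumed; ElfBlob elf; };

class ShaderCache {
public:
  const ElfBlob *lookup(ShaderHash hash, const PipelineState &actual) const {
    auto it = m_entries.find(hash);
    if (it == m_entries.end())
      return nullptr;
    for (const CachedShaderElf &entry : it->second)
      if (isCompatible(entry.assumed, actual))
        return &entry.elf; // first compatible variant wins
    return nullptr;
  }

private:
  // Multiple ELFs per shader, compiled under different state assumptions.
  std::map<ShaderHash, std::vector<CachedShaderElf>> m_entries;
};
```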

amdllpc

Steven proposes using a modified amdllpc as his offline shader compile tool. Thus it will call the LLPC shader compile function with an incomplete pipeline state containing values for the "bounded" items.

The proposed un-pipeline-linked ELF module

Such an ELF is the result of anything other than full pipeline compilation. It contains various things to represent the parts of the pipeline state or inter-shader-stage linking information that were unavailable at the time it was compiled.

Representation of metadata needed for linking

Some of the items below list metadata that needs to be left in the unlinked ELF for the link stage to read. I propose that we will define a new section in the PAL metadata msgpack tree to put these in. The link stage will remove that metadata.
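
For illustration, writing such a section with LLVM's msgpack document API could look like this; the section name and key are invented, and only the llvm::msgpack::Document usage is real:

```cpp
#include "llvm/BinaryFormat/MsgPackDocument.h"

void addPreLinkMetadata(llvm::msgpack::Document &doc) {
  auto root = doc.getRoot().getMap(/*Convert=*/true);
  // Hypothetical section holding pre-link info; the link stage deletes it
  // after reading it.
  auto preLink = root[doc.getNode("amdpal.prelink")].getMap(/*Convert=*/true);
  preLink[doc.getNode("param_export_count")] = doc.getNode(uint64_t(3));
}
```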

Representation of final PAL metadata

Some parts of the PAL metadata can be directly generated in a shader compile before linking. Hopefully all the link stage needs to do is merge the two msgpack trees, ORing together any register that appears in both. That handles the case that the same register has a part used by VS and a part used by FS.
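
The register part of the merge could be as simple as this (plain containers rather than msgpack nodes, for brevity):

```cpp
#include <cstdint>
#include <map>

using RegisterMap = std::map<uint32_t, uint32_t>; // register offset -> value

// OR together any register that appears in both maps, covering the case where
// one part of the register is used by the VS and another part by the FS.
void mergeRegisters(RegisterMap &dest, const RegisterMap &src) {
  for (const auto &[reg, value] : src)
    dest[reg] |= value; // operator[] default-initializes a new register to 0
}
```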

Resource descriptor layout

If resource descriptor layout was unavailable at shader compile time, then the load of a descriptor from its descriptor table has a reloc on its offset where the symbol name gives the descriptor set and binding. Such relocs are resolved at link time, when the resource descriptor layout pipeline state is available. This work is already underway by Steven from Gibraltar.

In addition, an array of image or sampler descriptors needs a reloc for the array stride. The stride differs depending on whether it is actually an array of combined image+samplers, and you can't tell that at shader compile time.

For a descriptor set pointer that can fit into a user data sgpr, the PAL metadata register for that user data sgpr contains the descriptor set number. The link stage updates that to give the spill table offset. Work on this mechanism is underway by David Zhou in AMD (although in the context of the front-end ELF linking mechanism). There needs to be some way of telling whether the PAL metadata register represents a fully-linked spill table offset, or an unlinked descriptor set number. I believe David's work already does that.

For a descriptor set pointer that cannot fit into a user data sgpr, it is loaded from the spill table with a reloc on the offset whose symbol gives the descriptor set. That reloc is resolved at link time.
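
A sketch of the link-time resolution, assuming a symbol naming scheme of the form "doff_<set>_<binding>" (the real scheme is whatever Steven's in-flight change defines):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

struct DescriptorLayout; // the resource descriptor layout pipeline state
// Assumed helper: byte offset of (set, binding) within its descriptor table.
uint32_t lookupDescriptorOffset(const DescriptorLayout &layout,
                                unsigned set, unsigned binding);

// Returns true and sets *value if this reloc symbol is a descriptor offset;
// the caller then patches the instruction word(s) the reloc refers to.
bool resolveDescriptorReloc(const DescriptorLayout &layout,
                            const std::string &symbolName, uint32_t *value) {
  unsigned set = 0, binding = 0;
  if (std::sscanf(symbolName.c_str(), "doff_%u_%u", &set, &binding) != 2)
    return false;
  *value = lookupDescriptorOffset(layout, set, binding);
  return true;
}
```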

We will have to ban the driver from putting any descriptors into the top level of the descriptor layout.

A compute shader's user data has a restriction on which spill table entries can be put into user data sgprs, and in what order. For that reason, the link stage may need to prepend code to load and/or swap around sgprs for descriptor set pointers.

Vertex inputs

If vertex input information is unavailable at VS compile time, then vertex inputs are passed into the vertex shader in vgprs, with metadata saying which inputs they are and what type. The link stage then constructs a "fetch shader", and glues it on to the front of the shader.

The fetch shader has an ABI where the vertex shader's input registers are also the fetch shader's inputs and outputs, except that the vertex input values are obviously not part of the fetch shader's inputs.

Color exports

If color export information is unavailable at FS compile time, then color exports are passed out of the fragment shader in vgprs, with metadata saying which exports they are and what type. The link stage then constructs an FS epilog, and glues it on to the back of the shader. The shader exits with exec set to pixels that are not killed/demoted.

Several other pipeline state items also affect color export code, so the absence of any of them likewise forces the use of an FS epilog.

Parameter exports and attribute inputs

In a shader compile, parameter exports are passed out of the last stage vertex-processing shader in vgprs, with metadata saying which parameters they are. In an unlinked fragment shader, attributes are packed and there is metadata saying how that is done. The link stage then ties them up, and adds an epilog to the last stage vertex-processing stage.

enableMultiView

enableMultiView has several impacts on the code the compiler needs to generate.

It looks like the best way of handling this if enableMultiView is unavailable at VS compile time is to compile the two alternatives for each thing inside an if..else..endif with a reloc as the condition.
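
In IR terms, one way to express the condition is the amdgcn.reloc.constant intrinsic, which materializes a 32-bit value via a relocation; the linker would then define the (invented) symbol as 0 or 1. The perSampleShading case below would look the same:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/Metadata.h"

llvm::Value *createMultiViewCondition(llvm::IRBuilder<> &builder) {
  llvm::LLVMContext &ctx = builder.getContext();
  auto *symbol = llvm::MDNode::get(ctx, llvm::MDString::get(ctx, "$enableMultiView"));
  // i32 whose value is supplied by a relocation resolved at link time.
  llvm::Value *value = builder.CreateIntrinsic(
      llvm::Intrinsic::amdgcn_reloc_constant, {},
      {llvm::MetadataAsValue::get(ctx, symbol)});
  // Branch on it: both alternatives are compiled, the reloc picks one.
  return builder.CreateICmpNE(value, builder.getInt32(0));
}
```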

perSampleShading

If the perSampleShading item is unavailable at FS compile time, and the FS uses gl_SampleMask or gl_PointCoord, then the compiler needs to generate code for both alternatives inside an if..else..endif where the condition is a reloc.

PAL metadata items

Certain pipeline state items do not affect compilation, except for being copied straight into PAL metadata registers.

In a shader compile with a link stage, it is the link stage that copies these items into PAL metadata.

Relocatable items

As pointed out by Steven's document pipeline state - Sheet1 (1).pdf, a number of pipeline state items are relocatable. That is, if such an item is unavailable in the pipeline state at shader compile time, a simple 32-bit constant load with a reloc will work, so it can be resolved at link time.

We should probably add the shadow descriptor table high 32 bits to that list too.

Specialization constants

Steven's document claims that SPIR-V specialization constants can be handled by relocs. That is only partly true: a reloc can stand in for a spec constant used as a plain 32-bit scalar value, but a spec constant can also affect things that a link-time constant cannot, such as array sizes, the workgroup size, and control flow that the compiler folds at compile time.

Bounded items that we need to make relocatable

These are pipeline state items that Steven's document lists as "bounded", that is, there is a limited range of values that each one can take. Gibraltar's proposal to handle this in their offline shader cache populating scheme is to compile a shader multiple times with these items set to the most popular values, in the hope of covering most of the cases where the shader is used in a pipeline.

The implication of this is that the shader cache needs to be able to keep multiple ELFs for the same shader, with different assumptions about these pipeline state items. When a pipeline compile looks for a cached shader, there needs to be some mechanism where it can find the one with a compatible state for these items.

However, for the purposes of app runtime shader compilation, we need to find some way of making these fixuppable by the link stage. In some cases, that might involve generating code that can handle all possibilities, and then having a branch with a reloc to select the required alternative.

NGG control items

The NGG control items are supplied to the compiler through pipeline state to save needing to load them at runtime from the primitive shader table. If they are unavailable at shader compile time, then the compiler is forced to load them from the primitive shader table.

Some further items are similar, except that certain settings of them also need to force NGG pass-through mode. Therefore, if those items are unavailable at shader compile time, we need to force NGG pass-through mode.

Items only needed for tessellation or geometry

These pipeline state items are only used for tessellation or geometry. Because this proposal insists that a vertex-processing half-pipeline with tessellation or geometry has to be compiled with full pipeline state, they do not need to be handled by a reloc.

The link stage

The link stage needs to glue together the compiled shader ELFs, generate any required prologs and epilogs, resolve relocs, and finalize the PAL metadata.

A prolog is generated to end with an s_branch with a reloc to branch to the VS.

Where an FS needs an epilog (color export information was unavailable at shader compile time), it is generated with an s_branch with a reloc instead of an s_endpgm, to branch to its epilog code.

In both cases, we can optimize by gluing sections in the right order, and applying the optimization that a chunk of code that ends with an s_branch can have the s_branch removed and turned into a fallthrough. There may need to be special handling for a prolog to ensure that the CS or VS remains instruction-cache-line-aligned, such as inserting s_nop padding before the fetch shader.
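
A byte-level sketch of that gluing, with the s_nop encoding and the cache-line size as assumptions for illustration:

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kSNop0 = 0xBF800000; // s_nop 0 (assumed SOPP encoding)
constexpr size_t kICacheAlign = 64;     // assumed instruction cache line size

// Appends the prolog (its final s_branch already dropped, since the shader is
// glued in straight after), then pads with s_nop so the shader proper that
// follows starts instruction-cache-line-aligned.
void appendPrologPadded(std::vector<uint32_t> &out, const std::vector<uint32_t> &prolog) {
  out.insert(out.end(), prolog.begin(), prolog.end());
  while ((out.size() * sizeof(uint32_t)) % kICacheAlign != 0)
    out.push_back(kSNop0);
}
```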

Prologs will be generated as IR, then compiled. They will be cached, so that will not happen very often.

s-perron commented 4 years ago

This looks good. Thanks.

kuhar commented 4 years ago

As a bystander, I really appreciate your summary, Tim. It's great you gave this area more structure and provided a high-level overview of the design space -- usually a few folks would just come in with some corner case of the design that they were aware of, and it was very difficult for me to connect the dots when that happened. Many things are much clearer to me now, although I still don't understand the details.

trenouf commented 4 years ago

I have opened #545 LGC shader compilation interface proposal, to detail how the front-end would call LGC (the middle-end) to do shader compilation and linking.

trenouf commented 4 years ago

Now that I have pushed #720 fetch shader for review, here are some ideas on how to go about implementing the color export shader:

  1. Analogous to "New vertex fetch pass" in #720, handle color exports in a similar way: use a new lgc.output.export.color call (instead of lgc.output.export.generic) for writing to a color export in InOutBuilder, and add a new pass into the existing FragColorExport.cpp that runs before PatchEntryPointMutate to lower the lgc.output.export.color calls to export intrinsics (a rough sketch of the new call follows this list).
  2. In that new pass, spot that it is an unlinked compile and no color export info was provided. In that case, write the info from the color export calls to metadata, mutate the shader to return a struct containing the export values, and hook up the return value elements to the inputs to the color export calls. The FS is then an "exportless" FS.
  3. In the linker, spot that it is an "exportless" FS (perhaps by the presence of the metadata), and create a color export shader (new subclass of GlueShader), analogous to a fetch shader. Actually it is quite a bit simpler than a fetch shader, because it does not need to ask PalMetadata to tell it how many sgprs and vgprs it has on entry, or where any entry register is.
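
As a rough sketch of item 1's new call (the call name is from the list above, but its exact signature is a guess; the real calls would presumably be type-mangled per overload, as other lgc.* calls are):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

// Emit an lgc.output.export.color call recording that 'value' is written to
// color attachment 'location'; a later pass lowers it to export intrinsics
// (pipeline compile) or turns it into metadata plus return values
// ("exportless" unlinked compile).
llvm::CallInst *createColorExport(llvm::IRBuilder<> &builder, llvm::Module &module,
                                  unsigned location, llvm::Value *value) {
  llvm::FunctionCallee callee = module.getOrInsertFunction(
      "lgc.output.export.color", builder.getVoidTy(), builder.getInt32Ty(),
      value->getType());
  return builder.CreateCall(callee, {builder.getInt32(location), value});
}
```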
s-perron commented 4 years ago

That is exactly what I was thinking. Thanks.