chipsalliance / rocket-chip

Rocket Chip Generator
Other
3.19k stars 1.12k forks source link

Add illegal instruction detection to RVC decoder #3613

Closed chenguokai closed 5 months ago

chenguokai commented 5 months ago

Related issue: In OpenXiangShan repo, this PR included some rocket-chip modifications that may contribute to upstream.

Type of change: feature request

Impact: API addition (no impact on existing code)

Development Phase: implementation

Release Notes

Add explicit illegal RVC instruction detection logic to RVC decoder, enabling a easy implementation for Sstvala

linux-foundation-easycla[bot] commented 5 months ago

CLA Signed

The committers listed above are authorized under a signed CLA.

jerryz123 commented 5 months ago

Can you describe how these bits were determined? I notice this doesn't check for the supported ISA subset. Are there cases where the RVCDecoder would report a illegal instruction, but not the full decoder, or vice-versa?

What is the benefit/use case of this as well? Is there really a substantial improvement over the QMC-based decode of the expanded instruction?

chenguokai commented 5 months ago

Can you describe how these bits were determined? I notice this doesn't check for the supported ISA subset.

I determined these bits according to RISC-V spec ver 20191213. I will check newer versions and update this PR if necessary. Thanks for the reminder.

What is the benefit/use case of this as well? Is there really a substantial improvement over the QMC-based decode of the expanded instruction?

The original decoder does ensure that one illegal instruction is still decoded as illegal instruction after expansion. The benefit comes mainly from adapting to newer specifications. Sstvala requires illegal instructions be written to stval. It would be more feasible if illegal RVC instructions are identified earlier such that only first faulting compressed instruction should be kept. Otherwise the hardware has to keep all compressed instructions in case one of them is illegal, at least before decode stage.

jerryz123 commented 5 months ago

Are there cases where the RVCDecoder would report a illegal instruction, but not the full decoder, or vice-versa?

I think this is the crux of the issue for me. What happens if I use the RVCDecoder in a RV32 implementation? Will this detect that C.LD is illegal? Or C.FLW in a no-F implementation? The current implementation is not parameterized by ISA subset.

if illegal RVC instructions are identified earlier

I agree the challenge is identifying illegal RVC instructions earlier. But what is the relative cost of this approach, vs just doing the expand+decode-only-illegal early in the pipeline? Does that combinational logic synthesize something that is fundamentally more complex than your explicitly defined illegal-decoder?

chenguokai commented 5 months ago

But what is the relative cost of this approach, vs just doing the expand+decode-only-illegal early in the pipeline? Does that combinational logic synthesize something that is fundamentally more complex than your explicitly defined illegal-decoder?

This PR originally target XiangShan which utilizes this decoder. In our case, the instructions are passed through instruction buffer before they get decoded. We will have to add 16 bit to each buffer entry if we do not want to do major modification to the pipeline. Also it seems to be absurd to refactor our pipeline for this minor feature. Overall, delaying illegal RVC detection will bring at least 1(pipeline stage)16(fetch width)16(RVC inst width) + 48(ibuffer size)*16(RVC inst width) bit additional storage cost for our implementation.

It should be noted that our cost model may not apply for rocket-chip. I am not quite familiar with cost in your case. If the cost for rocket is minor, maybe we can just keep this modification in XiangShan branch only.

jerryz123 commented 5 months ago

I am not suggesting you store the expanded bits early in the frontend. You can expand+decode-illegal, then throw away the expanded bits to minimize instruction buffer size, before re-expanding later in the pipeline.

I am asking specifically about the combinational logic cost of expand+decode-illegal, vs decode-rvc-illegal. If expand+decode-illegal reduces to combinational logic with similar depth/area as direct-decode-rvc-illegal, then there really isn't much of a need for direct-decode-rvc-illegal.

I would prefer you upstream this if the use case is justified, although my comments on the correctness of this for non RV64GC implementations still stands.

chenguokai commented 5 months ago

I am asking specifically about the combinational logic cost of expand+decode-illegal, vs decode-rvc-illegal. If expand+decode-illegal reduces to combinational logic with similar depth/area as direct-decode-rvc-illegal, then there really isn't much of a need for direct-decode-rvc-illegal.

For combinational logic only, direct-decode method will obviously introduce additional cost, as the logic deals with a strict subset of main decoder(or possibly a dedicated decode-illegal unit). But other costs should not be ignored. I have no timing report for rocket, but for XiangShan, the RVC expander lies in a timing critical path, from fetch unit to instruction buffer, so your proposal will introduce either storage cost(by appending new pipeline stages) or timing cost(by prolonging critical path).

although my comments on the correctness of this for non RV64GC implementations still stands.

As I stated previously, I will update this PR if necessary. But the necessity for mainline rocket-chip remains to be discussed.

jerryz123 commented 5 months ago

Thanks, I had not considered that the expand+decode might generate additional logic for instructions which have no compressed forms (I assume this is what you mean by "as the logic deals with a strict subset of main decoder"). Based on that, I am more inclined to believe that the direct-RVC-decode-illegal you have implemented has lower cost.

As I stated previously, I will update this PR if necessary. But the necessity for mainline rocket-chip remains to be discussed.

I think this change should go in mainline then, after it has been parameterized by the ISA subset.