David-Horner / text-format

rvv1 #1

Open David-Horner opened 4 years ago

David-Horner commented 4 years ago

Proposal to introduce register groups, SLEN and fractional SLEN into the simple register fractional LMUL model.

What has not changed: Fractional LMUL will

New, but fundamentally the same as for LMUL>1

Fractional groups (Striped groups of fraction registers). A striped group of fractional registers (a fractional group) parallels LMUL>1 registers, in that:

The rest of this proposal talks about what has changed (even if some subtly).

Some convenient definitions:

Define “SEW-instructions” as those for which vs1, vs2 and vd all match SEW from vtype. To clarify, they are not widening or narrowing instructions, whole register moves, or mask-register-only instructions.

Introduce register group characterization: This proposal allows fractional groups to originate at multiple levels, with their width determined by that level. For example, fractional groups with a physical width of VLEN/8 originate at LMUL=1/8. A shorthand to identify such groups will make the narrative much more readable.

Consider LMUL>=1 register groups. They all start in LMUL=1 via a widening operation. So 1 should be in their designation even though it is superfluous without fractional LMUL.

Consider an n:m format where VLEN/n is the vector length and m is the number of base-arch registers in the group. Then we designate

In the previously presented simple mappings of fractional LMUL, there was a presumptive understanding that widening operations sourcing LMUL=1/n registers widen to LMUL=2*(1/n) registers.

This would be represented by a table such as this:

LMUL        1/8   1/4   1/2   1     2     4     8
------------------------------------------------------------
group type
1:8                                       x     a=0,8,16,24
1:4                                 x     a=0,4,8,12,...
1:2                           x     a=0,2,4,6,...
1:1                     x     a=all
2:1               x     a=all
4:1         x     a=all
8:1         a=all

a = Accessible at LMUL level by SEW instructions

x = Created by widening instructions at LMUL level (narrowing instructions also source from this LMUL)
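As a minimal sketch of how the n:m shorthand relates to the table above, the following Python helpers compute the LMUL level at which an n:m group is accessible and the register numbers that can name a group of m registers. The function names are hypothetical, the relation LMUL = m/n is read off the table rather than stated in the proposal, and the count of 32 vector registers is the standard RVV register file size.

```python
from fractions import Fraction

NUM_REGS = 32  # RVV vector register file: v0..v31


def accessible_lmul(n: int, m: int) -> Fraction:
    """LMUL level of the 'a' entry for an n:m group (assumed to be m/n)."""
    return Fraction(m, n)


def created_lmul(n: int, m: int) -> Fraction:
    """LMUL level of the 'x' entry: one level below the 'a' entry."""
    return accessible_lmul(n, m) / 2


def group_base_registers(m: int) -> list[int]:
    """Register numbers that can name a group of m registers (multiples of m)."""
    return list(range(0, NUM_REGS, m))


if __name__ == "__main__":
    print(accessible_lmul(1, 8), group_base_registers(8))
    print(accessible_lmul(4, 1), created_lmul(4, 1))
```

For 8:1 the computed creation level is 1/16, which is the column this first table leaves out.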

Note: 16:1 is intentionally omitted from the diagram although it works the same.

This proposal acknowledges that such a simplistic approach can be inefficient for many reasonable implementations. It also acknowledges that some mandatory RVV instructions are comparably inefficient: vgather, slideup/down and others similarly have to operate across lanes. It further acknowledges that striped register support is already present in the base design.

So this proposal introduces striped groups, beginning with this table:

LMUL        1/16  1/8   1/4   1/2   1     2     4     8
------------------------------------------------------------------
group type
1:8                                             x     a=0,8,16,24
1:4                                       x     a=0,4,8,12,...
1:2                                 x     a=0,2,4,6,...
1:1                                 a=all
16:8                    x     a=0,8,16,24
16:4              x     a=0,4,8,12,...
16:2        x     a=0,2,4,6,...
16:1        a=all
8:1               a=odd
4:1                     a=odd
2:1                           a=odd
LMUL        1/16  1/8   1/4   1/2   1     2     4     8

This is the same legend as above and will be assumed for all further diagrams:

a = Accessible at LMUL level by SEW instructions

x = Created by widening instructions at LMUL level (narrowing instructions also source from this LMUL)

Note: 8:1, 4:1 and 2:1 were added to the table though they are technically not required to illustrate fractional groups. More below.

This has two undesirable features, both of which present trade-offs:

LMUL now determines both the level's fractional size and the fractional group's size.

The smallest fractional register size is used as the base for LMUL grouping.

Although it is possible to provide an even wider LMUL or additional fields in vtype to facilitate more states to address these concerns, the approach here will be to enlist the register numbers to provide context information.

First note that at any level the register numbers used by register groups are specific. In LMUL>=2 the only operands available to any operation (including widening and narrowing) were register groups. Widening to 1:8 can only be performed with 1:4 inputs. The converse holds for narrowing. Widening to 16:8 must use 16:4 inputs to parallel that behaviour. Taking both these observations together, the comparable behaviour constraint can be incorporated into the instruction decoding using register addresses.
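The pairing constraint described above can be sketched in a few lines of Python. This is only an illustration of the group-type relation stated in the text (widening to n:2m takes n:m inputs, and narrowing is the converse); the function names are hypothetical and the sketch does not model register-number decoding.

```python
GroupType = tuple[int, int]  # (n, m): VLEN/n wide, m base-arch registers


def widening_source(dest: GroupType) -> GroupType:
    """Group type a widening instruction must read to produce `dest`."""
    n, m = dest
    if m < 2 or m % 2:
        raise ValueError("destination must group an even number of registers")
    return (n, m // 2)


def narrowing_source(dest: GroupType) -> GroupType:
    """Group type a narrowing instruction must read to produce `dest`."""
    n, m = dest
    return (n, m * 2)


if __name__ == "__main__":
    print(widening_source((1, 8)))    # the 1:8 example from the text
    print(widening_source((16, 8)))   # the 16:8 example from the text
```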

This allows widening to originate at other levels concurrently, as diagrammed here:

LMUL        1/16  1/8   1/4   1/2   1     2     ...
------------------------------------------------------------
group type
1:2                                 x     a=0,2,4,6,...
1:1                                 a=all
16:8                    x     a=0,8,16,24
16:4              x     a=0,4,8,12,...
16:2        x     a=0,2,4,6,...
16:1        a=all
8:8                           x
8:4                     x     a=4,12,20,28
8:2               x     a=2,6,10,14,...
8:1               a=odd
4:4                           x
4:2                     x     a=2,6,10,14,...
4:1                     a=odd
2:2                           x
2:1                           a=odd
LMUL        1/16  1/8   1/4   1/2   1     2     ...

Note: I dropped LMUL=4 and 8 from the illustration only.

Note: 16:8 is addressable (from LMUL=1/2), but 8:8, 4:4 and 2:2 are not addressable from LMUL=1. They are, however, addressable by widening and narrowing instructions from LMUL=1/2.
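One way to read the last diagram is that, at a fractional LMUL, the alignment of the register number selects the group type a SEW instruction addresses. The Python sketch below encodes that reading. The closed-form rule (group size is the largest power of two dividing the register number, capped at 16*LMUL) is an inference from the "a" entries of the diagram, not something the proposal states, and `classify` is a hypothetical name. It does not cover the 8:8, 4:4 and 2:2 groups, which the diagram shows only as widening targets.

```python
NUM_REGS = 32  # RVV vector register file: v0..v31


def classify(lmul_denom: int, reg: int) -> tuple[int, int]:
    """Return (n, m) for the group named by `reg` at LMUL = 1/lmul_denom."""
    if lmul_denom not in (2, 4, 8, 16):
        raise ValueError("only fractional LMUL 1/2 .. 1/16 is modelled")
    if not 0 <= reg < NUM_REGS:
        raise ValueError("register number out of range")
    max_m = 16 // lmul_denom  # largest group at this level, e.g. 16:8 at LMUL=1/2
    # reg & -reg is the largest power of two dividing reg; v0 takes the largest group.
    m = max_m if reg == 0 else min(reg & -reg, max_m)
    n = m * lmul_denom  # the group as a whole spans VLEN * LMUL bits
    return (n, m)


if __name__ == "__main__":
    for r in (0, 1, 2, 4, 8, 12):
        n, m = classify(2, r)
        print(f"LMUL=1/2 v{r}: {n}:{m}")
```

Under this reading every register number maps to exactly one group type at each fractional level, which is what lets the decoder use the register address as context.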

To be continued ......