Gnurou / nouveau

Clone of http://cgit.freedesktop.org/~darktama/nouveau

[question] [Fermi] Is there a way to accumulate buffer offset after transform feedback (aka streamout) #15

Closed imirkin closed 8 years ago

imirkin commented 8 years ago

Something that GL supports is to do

enable TF
draw 1
pause TF
draw 2
resume TF
draw 3

Now the idea is that the output of draw 1 and draw 3 gets accumulated into the TF buffer, but not the output of draw 2. The way this is implemented in hardware is that after draw 1 happens, there's a query you can run by writing the following to 3d method 0x1b0c:

0x0d005002 | (tfb buffer index << 5)

On Kepler, this does what one might hope -- it returns the full offset, i.e. the amount of buffer written + the tfb buffer offset (written to 3d method 0x290). On Fermi, however, it just overwrites that value, which means that the offset retrieved from the query after draw 3 completes only counts the bytes written by that draw alone, not including the offset.
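To make the difference concrete, here is a minimal sketch in C of how the query word is built and what each behavior reports. The method number and base value come from the description above; the macro and function names, the limit of four buffers, and the `accumulates` flag are assumptions made for illustration, not nouveau code.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the discussion; the names are hypothetical. */
#define TFB_QUERY_METHOD 0x1b0cu      /* 3d method the query word is written to */
#define TFB_QUERY_BASE   0x0d005002u  /* base query word */

/* Build the query word for one transform feedback buffer. */
static uint32_t tfb_offset_query(unsigned buf_index)
{
    assert(buf_index < 4); /* assumption: four TF buffers */
    return TFB_QUERY_BASE | ((uint32_t)buf_index << 5);
}

/*
 * Model of the value the query reports after a draw that wrote `written`
 * bytes while the buffer offset (3d method 0x290) was set to `start`.
 * accumulates != 0: Kepler-like behavior, offset plus bytes written.
 * accumulates == 0: Fermi behavior as observed, bytes written only.
 */
static uint32_t tfb_query_result(int accumulates, uint32_t start,
                                 uint32_t written)
{
    return accumulates ? start + written : written;
}
```

With draw 1 writing 64 bytes and draw 3 writing 32 bytes from a starting offset of 64, the Kepler-like model reports 96 after draw 3, while the Fermi model reports only 32.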

Is there some bit of cleverness I'm missing to make this work on Fermi in a way that doesn't involve me waiting for draw 1 to complete before I configure the parameters for draw 3?

By the way, the problematic situation is triggered by the later cases of this dEQP test:

dEQP-GLES3.functional.transform_feedback.basic_types.interleaved.points.lowp_float

imirkin commented 8 years ago

After staring at the blob driver's traces, it seems to be doing the exact same things nouveau is, but is getting the correct results (or at least the test passes). That means either it does something slightly differently which causes the hardware to behave properly, or some grctx setting controls it. On the first point, I notice it turns TF on/off left and right, uses a short query rather than a long one, and has slightly different synchronize/etc behavior, although attempting to do the same in nouveau did not improve things.

So... is there some GR bit which makes that query accumulate on top of the existing buffer offset? Or any other advice on making this work properly?

imirkin commented 8 years ago

Based on Ben's suggestion, I set bit 0 of 0x50405c and it all magically started working. Looks like in rnndb, this was previously documented as

$ lookup -a gf100 0x50405c
PGRAPH.GPC[0].TPC[0].POLY.TFB_UNFUCKUP_OFFSET_QUERIES => 0

So I guess someone knew at some point :) But then it was forgotten.
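For reference, the fix amounts to a read-modify-write that sets bit 0 of that register. Below is a minimal sketch in C against a simulated register. The `fake_mmio` struct and the `fake_rd32`, `fake_wr32`, `reg_mask` and `enable_tfb_offset_accumulation` helpers are hypothetical stand-ins, not nouveau's accessors, and the sketch only touches the single address quoted above (GPC[0].TPC[0]).

```c
#include <assert.h>
#include <stdint.h>

/* Address and bit from the comment above; the names are hypothetical. */
#define TFB_QUERY_FIX_REG 0x50405cu
#define TFB_QUERY_FIX_BIT 0x00000001u

/* Hypothetical stand-in for MMIO space: models exactly one register. */
struct fake_mmio {
    uint32_t addr;
    uint32_t value;
};

static uint32_t fake_rd32(const struct fake_mmio *m, uint32_t addr)
{
    assert(addr == m->addr);
    return m->value;
}

static void fake_wr32(struct fake_mmio *m, uint32_t addr, uint32_t value)
{
    assert(addr == m->addr);
    m->value = value;
}

/* Read-modify-write: replace the bits in `mask` with those from `data`.
 * Returns the previous register value. */
static uint32_t reg_mask(struct fake_mmio *m, uint32_t addr,
                         uint32_t mask, uint32_t data)
{
    uint32_t old = fake_rd32(m, addr);
    fake_wr32(m, addr, (old & ~mask) | (data & mask));
    return old;
}

/* Set bit 0 so the offset query accumulates on top of the buffer offset,
 * leaving every other bit of the register unchanged. */
static void enable_tfb_offset_accumulation(struct fake_mmio *m)
{
    reg_mask(m, TFB_QUERY_FIX_REG, TFB_QUERY_FIX_BIT, TFB_QUERY_FIX_BIT);
}
```

Using a mask rather than writing the constant 1 preserves whatever else the register holds, since only bit 0 is known from this thread.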

Gnurou commented 8 years ago

Sorry for not having come up with this answer before you found it yourself. It wasn't trivial, so I'm not sure we would have thought of this. Closing this issue.