VerticalResearchGroup / miaow

An open source GPU based off of the AMD Southern Islands ISA.
BSD 3-Clause "New" or "Revised" License
1.07k stars 237 forks source link

Added support for inline constants in lsu instructions #15

Open d1duarte opened 8 years ago

zwabbit commented 8 years ago

Wait, what instruction ever does this?

d1duarte commented 8 years ago

For instance when a tbuffer_load set's the soffset to 0x80 (inline constant 0). I'm attaching a screenshot of this happening. The code is generated by AMD's CodeXL from the MatrixMultiplication example. snapshot1 In the right hand column you can see the instruction binary, on the tbuffer_loads all of them use the inline constant 0. Without this correction it will read register s0.

zwabbit commented 8 years ago

Ah okay, that situation. The problem actually is a bit more complicated, in that we never properly implemented support for ANY of the constant resources that are supposed to be accessible via scalar register addresses. Let me check through which documentation actually describes them and I'll get back to you. Or if you know which one it is, feel free to point me to it.

zwabbit commented 8 years ago

NVM, found it, table 6.1 in the SI ISA document.

d1duarte commented 8 years ago

Looking at the reg_field_encoder.v file (in the decode_core), it's easy to extract the integer constants from the address (as i proposed in this change). Extracting the floating point constants requires the use of the 'fp_constant' port but it's not required for LSU, since I don't think a floating point offset will ever be used

zwabbit commented 8 years ago

Oh sorry, github for some reason didn't send me a notification of your response so I didn't realize you had answered. Need to check these manually more often...

I've talked this over a bit with my prof and we're both of the opinion that the fix of this should take into account (or at least does not make difficult future work to support) the other possible inline constants as specified in table 6.1 in the ISA doc. I have a few ideas on how to approach it, which mostly boil down to having an input sorter module somewhere and feeding into the modules that actually perform the operation. The catch here is that it is not just memory instructions that might have inline constants, but VOP2 arithmetic instructions can as well. And also the need to keep the entire thing synchronized through the pipeline. Thoughts?

d1duarte commented 8 years ago

That's a good point. I would suggest one of these two approaches:

Since the constant 'address' will always have bit number 7 on (which means they start at 128). And, there only being 104 sgprs (bit number 7 off) you can either:

1) simply check for this bit in the decode stage, create a new flag 'constant_needed' which will later be used to call some new module that performs the conversion between 'address' and constant value and provides this value to the execution unit (either simf, simd, salu or lsu).

2) provide the complete address to the SGPR module and use a set of 'memory mapped' registers to provide this value. The procedure would not require great changes in the execution units (no new data_in inputs or any additional control).

I personally favor option 2 as this would require the least changes to the system and the access would be seamlessly to most units, providing more modularity. In decode you would only need to decide if the addressed source is a special purpose register (s0, exec, vcc, etc), a vector register (bit 8 = '1') or a 'sgpr' (bit 8 = '0', bit 7 = '1', and not a special purpose register).

Note: for Vector op's you would need to verify that bit number 8 is off as well.

zwabbit commented 8 years ago

Doing option 2 would work for the actual constant values, but the main issue right now is that the VCC and EXEC values are not actually maintained in the SGPR, I believe they are maintained in the wavepool. On the other hand they are routed as a matter of course into the SIMD/F units and at least for the latter also into the LSU automatically as part of the execution, so what we probably want is a combination of both options, option 2 for the actual constants that can be just address mapped in the SGPR and option 1 for the dynamic values maintained in other parts of the compute unit.

Do you think you'd be interested in working that out? I'm stuck working on some infrastructure related pieces and won't have time in the short term to work on MIAOW's internals.