Open long-long-float opened 4 years ago
There is already some related code there. This was added some while ago to do a similar job, but I am not sure whether it is still applied. Anyway, that might be a good point to start.
@doe300 I have a question. Is there a way to find the instruction corresponding to a local? For example, I want to get the instruction `%sub1 = sub i32 %mul, %width` from the value `i32 %sub1`. Or should I create this method?
%mul = mul nsw i32 %y.024, %width
%sub1 = sub i32 %mul, %width
%add = add i32 %mul, %width
%call = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %sub1, i8 addrspace(1)* %in) #2
%call2 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %mul, i8 addrspace(1)* %in) #2
%call3 = tail call spir_func <16 x i8> @_Z7vload16jPU3AS1Kh(i32 %add, i8 addrspace(1)* %in) #2
In general, you can query `Local#getUsers(LocalUse::Type::WRITER)` to get all writers.
If there is just one writer, `Local#getSingleWriter()` will do the trick. Also, if you have a `Value` instead of the local, you can call `Value#getSingleWriter()`, which does the same but checks whether the value is a local. Of course, the result needs to be checked for `nullptr` in both cases!
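The lookup pattern described above can be illustrated with a minimal sketch. `ToyInstruction`, `ToyLocal`, `singleWriter` and `definingInstruction` are hypothetical stand-ins written for this example, not the VC4C classes; the only thing carried over from the comment above is the idea that a single-writer query may return `nullptr` and must be checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins, NOT the real VC4C classes: they only model the
// "local -> instructions writing it" relation discussed above.
struct ToyInstruction
{
    std::string text; // e.g. "%sub1 = sub i32 %mul, %width"
};

struct ToyLocal
{
    std::string name;
    std::vector<const ToyInstruction*> writers;

    // Assumed behavior for this sketch: the one writing instruction,
    // or nullptr if there are zero or several writers.
    const ToyInstruction* singleWriter() const
    {
        return writers.size() == 1 ? writers.front() : nullptr;
    }
};

// Returns the text of the defining instruction, or an empty string when
// the local has no unique writer (the nullptr case must be handled).
std::string definingInstruction(const ToyLocal& local)
{
    const ToyInstruction* writer = local.singleWriter();
    return writer ? writer->text : std::string{};
}
```

With a local such as `%sub1` that is written exactly once, `definingInstruction` yields its `sub` instruction; for a local with no writer or with several writers, it yields an empty string instead of dereferencing a null pointer.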
@doe300 I want to insert an instruction (one that extends `IntermediateInstruction`) that performs a VPM load here, but I cannot find one.
Is there such an instruction, or should I create it?
The general memory access (before we know whether the memory area is lowered to a register, the VPM or accessed via TMU or DMA) is represented as `MemoryInstruction`.
After the lowering, there are no specific instruction types for the various lowered types (e.g. register, VPM); instead, the `MemoryInstruction` is directly decomposed into the (hardware) instructions executed to do the memory accesses.
So if you want to insert a VPM access, have a look at the VPM header:
- `insertReadDMA`, `insertWriteDMA` for "direct" DMA access (QPU <-> RAM), abstracting away the VPM
- `VPM::insertReadVPM`, `VPM::insertWriteVPM` for VPM access (QPU <-> VPM), e.g. also for caching/exchanging data between QPUs
- `VPM::insertReadRAM`, `VPM::insertWriteRAM` for DMA only access (VPM <-> RAM), e.g. to read/write back cached data

The VPM object required can be retrieved via the `Method::vpm` member.
Does this information suffice or do you need a special instruction type to represent VPM accesses (e.g. for further processing)?
I understand, thanks.
When we compile the following OpenCL code, which calls `vload16` three times, with `vc4c --asm -O3 -o dma_loads.asm dma_loads.cl`, VC4C outputs the following assembly (`dma_loads.txt`). It contains three DMA loads, but these can be combined into one DMA load.
dma_loads.txt
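I want to implement the combiner and have been thinking about the method.
For each block in the CFG, at the LLVM IR level:
- Collect the calls to `vload16` (actually `_Z7vload16jPU3AS1Kh`).
- Check whether their addresses lie at regular intervals.
- Combine the `vload16` calls.

I think checking for regular intervals is the challenging part. Symbolic execution could be used for it.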
Example
- Collect `vload16` (and address variables)
- Addresses
  - `%mul - %width`
  - `%mul`
  - `%mul + %width`
- These are at regular intervals (`%width`), so they are combined (I should create new functions `dma_load` and `vpm_load`).
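The interval check from the example can be sketched as follows. This is only an illustration under stated assumptions: each address is assumed to have already been reduced to a linear combination of symbols (for example by walking the single writers of the offset operands), and the addresses are assumed to be given in ascending order. `LinearExpr`, `subtract` and `haveRegularInterval` are hypothetical names introduced here, not part of VC4C.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: an address as a linear combination of
// symbols, e.g. %mul - %width  ==  { "%mul": 1, "%width": -1 }.
// Inputs are assumed to contain no zero coefficients.
using LinearExpr = std::map<std::string, long>;

// Returns a - b, dropping terms whose coefficient becomes zero so that
// equal expressions also compare equal as maps.
LinearExpr subtract(const LinearExpr& a, const LinearExpr& b)
{
    LinearExpr result = a;
    for(const auto& term : b)
    {
        long& coeff = result[term.first];
        coeff -= term.second;
        if(coeff == 0)
            result.erase(term.first);
    }
    return result;
}

// True if consecutive addresses (in the given order) all differ by the
// same non-zero symbolic stride; that stride is written to `stride`.
bool haveRegularInterval(const std::vector<LinearExpr>& addresses, LinearExpr& stride)
{
    if(addresses.size() < 2)
        return false;
    stride = subtract(addresses[1], addresses[0]);
    if(stride.empty())
        return false; // identical addresses, no interval to speak of
    for(std::size_t i = 2; i < addresses.size(); ++i)
    {
        if(subtract(addresses[i], addresses[i - 1]) != stride)
            return false;
    }
    return true;
}
```

For the three addresses above (`%mul - %width`, `%mul`, `%mul + %width`), both consecutive differences reduce to `%width`, so the check succeeds with `%width` as the stride. A real pass would additionally have to handle addresses collected out of order and offsets that are not linear in the symbols.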