Closed hyunjongL closed 1 year ago
So the memory addressing the master quadrant uses is different from the memory configuration on page 382 of the user guide: not only does each address point to a 32-bit word, but all the write addresses are also contiguous? In the figure, the SRAM addresses are spaced apart from each other. @rotx-maxim
Thanks!
The user guide map is an abbreviation of the memory map. Addresses in the map are represented as 32-bit, but since the memories are not byte addressable, the bottom bits are not used. When programming the accelerator, the bottom bits are not programmed and neither is the global address offset (i.e., the accelerator gets native word addresses). The memory space is contiguous in this case, but it's not guaranteed for future devices. There may be "holes" in the address space. However, the accelerator always knows where to write the "next" word. Check docs/AHBAddresses.md for another view of the addresses.
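As a rough illustration of the explanation above, the sketch below converts a 32-bit map address into a native word address by subtracting the global offset and dropping the unused bottom bits. The function name and the base value are hypothetical and chosen only for the example; the two-bit shift is an assumption that follows from 32-bit (4-byte) words.

```python
WORD_BYTES = 4  # assumption: 32-bit words, so the two bottom address bits are unused

# Hypothetical global offset, used only for illustration.
EXAMPLE_GLOBAL_OFFSET = 0x5040_0000


def native_word_address(map_address: int, global_offset: int) -> int:
    """Strip the global offset and the unused bottom bits from a map address."""
    local = map_address - global_offset
    if local < 0:
        raise ValueError("address lies below the global offset")
    if local % WORD_BYTES:
        raise ValueError("address is not aligned to a 32-bit word")
    return local // WORD_BYTES


# Map addresses that are 4 apart become consecutive native word addresses.
words = [
    native_word_address(EXAMPLE_GLOBAL_OFFSET + 4 * i, EXAMPLE_GLOBAL_OFFSET)
    for i in range(4)
]
```

This is only a model of the address view described in the reply; docs/AHBAddresses.md remains the reference for the actual layout.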
I figured out that the write pointer does not jump if there are any unused processors within the allocation. (0x800...001 takes a lot of time compared to 0x000...00F)
0 - Does the writer write 32 bits for the four channels even if only one processor is used?
0.2 - If so, does the writer overwrite the other 24 bits with zeros?
1 - Let's say I assign 8 processors, 0x0F0F, for the first quadrant. Then the group in between will have zeros in its memory, and the memory that is not in between (the first zero) will not be overwritten.
2 - Okay, I guess we have to make sure to always use consecutive processors.
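To make the 0x0F0F example concrete, here is a small sketch of the mental model behind these questions. It assumes that four 8-bit channels share one 32-bit word and that a quadrant has 16 processors; neither the grouping nor the write behavior is confirmed in this thread, and the function names are hypothetical.

```python
LANES_PER_WORD = 4  # assumption: four 8-bit channels packed into one 32-bit word


def active_word_groups(mask: int, num_processors: int = 16) -> list[int]:
    """Return the indices of 4-processor groups with at least one enabled bit."""
    lane_mask = (1 << LANES_PER_WORD) - 1
    return [
        g
        for g in range(num_processors // LANES_PER_WORD)
        if (mask >> (g * LANES_PER_WORD)) & lane_mask
    ]


def gap_groups(mask: int, num_processors: int = 16) -> list[int]:
    """Return unused groups that sit between the lowest and highest active group."""
    active = active_word_groups(mask, num_processors)
    if not active:
        return []
    return [g for g in range(active[0], active[-1] + 1) if g not in active]
```

Under these assumptions, `active_word_groups(0x0F0F)` gives groups 0 and 2, and `gap_groups(0x0F0F)` gives group 1, the "in-between" group from question 1.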
I closed the other issue, so will just ask it here.
One thing I find strange is that it takes a different amount of time to write to each of these output_processors values: 0x0011, 0x0021, 0x0041, 0x0081. If the memory writes are performed 32 bits at a time, I think they should take the same time; however, 0x0011 takes the least time and 0x0081 takes the most.
On the other hand, 0x0011, 0x0012, 0x0014, and 0x0018 take the same time, which does suggest that the memory writes are performed 32 bits at a time.
Are there any exceptions for the group with the highest index?
And about loading weights, I meant loading a kernel during the inference from the weights memory to a processor.
Check the processor and mask enable bits in the generated code. It is hard to give a definitive answer without seeing the code (please email if needed).
https://github.com/MaximIntegratedAI/ai8x-synthesis/blob/develop/izer/backend/max7800x.py#L1673
According to the code, the actual value written to the register is 1/4 × (out_offset (from YAML) + 0x8000 × smallest_out_processor_group_index).
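The formula above can be sketched as a small helper. This is a paraphrase of the expression quoted in this comment, not a copy of the code in max7800x.py; the function and parameter names are hypothetical, and integer division is an assumption.

```python
GROUP_STRIDE = 0x8000  # per-group offset, taken from the formula quoted above


def write_pointer_value(out_offset: int, smallest_group_index: int) -> int:
    """Return (out_offset + 0x8000 * group index) / 4 as a word address."""
    byte_style_address = out_offset + GROUP_STRIDE * smallest_group_index
    # Assumption: the division by 4 is an integer division that drops
    # the two unused bottom bits of a 32-bit word address.
    return byte_style_address // 4
```

For example, an out_offset of 0x4000 with the smallest output processor group index 1 gives (0x4000 + 0x8000) / 4 = 0x3000.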
(Below is a copy of documentation from the same file.)