esa-tu-darmstadt / tapasco

The Task Parallel System Composer (TaPaSCo)
GNU Lesser General Public License v3.0
106 stars 25 forks source link

Error during System Composition when #PE near 128 #355

Open wirthjohannes opened 1 year ago

wirthjohannes commented 1 year ago

The System Composition fails when the number of PEs is near 128 with the following error related to the content of the status core:

ERROR: [IP_Flow 19-3478] Validation failed for parameter 'Coe File(Coe_File)' with value '../../../../../../statuscore.coe' for BD Cell 'tapasco/tapasco_status_base'. The Memory Initialization vector can contain between 1 to Write Depth A number of entires.
ERROR: [Common 17-39] 'set_property' failed due to earlier errors.

    while executing
"rdi::add_properties -dict {CONFIG.Memory_Type Single_Port_ROM CONFIG.Load_Init_File true CONFIG.EN_SAFETY_CKT false CONFIG.Coe_File ..."

Reproduce: tapasco compose [Counter x 128] @ 100MHz -p zedboard (import Counter.zip from toolflow/examples/ beforehand) From my tests this occurs on all platforms (and also in different Vivado versions). The exact limit seems to be slightly different for the various plaforms (around 120 PEs from my tests).

tsmk94 commented 1 year ago

This is probably related to the BRAM of the status core being too small to hold the names of all PEs. Shortening the name of the PE should solve this.

wirthjohannes commented 1 year ago

Then we should probably catch this with a more user-friendly error message (currently there is already one for > 128 PEs).

wirthjohannes commented 1 year ago

If this really depends on the length of the PE-name this might be harder. However there also seems to be some other factor (because the number slightly differs between platforms for the same PE)

tsmk94 commented 1 year ago

There is also other platform-dependent information encoded in the status core, e.g. DMA engine and DMA interrupts are only used on PCIe device. This may reduce the space left for listing PEs.

c-93 commented 1 year ago

Problem as far as I understand: We store the PE string. BRAM consumption depends on string length and thenumber of strings. Correct?

Solution idea: Instead of storing PE name string, we could hash the PE name. This would be a software-only patch. Hash digest strings would be constant in length.

When using sufficient bits, hash collisions should be minimal. We could detect collisions and also print a warning, or error when two different PE names would yield identical hashes.

Caveat: Getting plain PE-names from hardware, and iterating over PE-names will not work anymore - we will just get the hashes.

Other ideas: Increase BRAM size? Shorten or even limit string length.

wirthjohannes commented 1 year ago

I think a possible solution could be two-fold:

1) Better error message: It should be possible to properly catch this case (e.g. in the json_to_status program) and print a helpful error message (maybe with some suggestions).

2) Status Core Size: The major issue is that we store the VLNV for each PE instance in the status core. So if you have 128 instances of a single PE, the VLNV is stored 128 times. So we could use a separate map "Kernel ID -> VLNV" in the status core, meaning we would need to store each VLNV only once (this of course won't help in a case with 128 different PEs, but I think this is very unrealistic). Other options would of course be some of the suggestions by @c-93.