KULeuven-MICAS / snax_cluster

A heterogeneous accelerator-centric compute cluster
Apache License 2.0

program errors #378

Closed jjl1075337132 closed 1 month ago

jjl1075337132 commented 1 month ago

1 (2).txt

I configured 4 cores + snax-alu and quadrupled the size of the original data, then used one core + snax-alu to process one copy of the data. I found that the output of

if(snrt_cluster_core_idx() == 2 ){}else if(snrt_cluster_core_idx() == 3)

differs from the output of

if(snrt_cluster_core_idx() == 2 ){} if(snrt_cluster_core_idx() == 3)

The former only prints:

core 0 Accelerator Done!
Accelerator Cycles: 26
Accelerator Cycles: 26
Accelerator Cycles: 26
Number of errors: 0
Number of errors: 0
Accelerator Cycles: 26
Accelerator Cycles: 26
Accelerator Cycles: 26
Accelerator Cycles: 26

The latter will output

missing two printf outputs: core 0 Accelerator Done! core 1 Accelerator Done!

jjl1075337132 commented 1 month ago

4core

rgantonio commented 1 month ago

Hi @jjl1075337132, there are several things to consider here.

(1) Running printf on every core in parallel causes contention! The way printf works is that each core writes to one part of the memory. If all cores try to write at the same time, not every printf goes through, which is why the outputs of core 0 and core 1 are missing.

I suggest using only one core to do the printing.
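To illustrate the single-printer idea, here is a minimal sketch that runs on a host machine. Every core stores its cycle count and error count in its own slot, and only core 0 prints all slots afterwards. The names `core_result_t`, `results`, `record_result`, and `print_all_results` are hypothetical and not part of the SNAX runtime, and `NUM_CORES` is assumed to be 4 as in this issue.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_CORES 4 /* assumption: the 4-core configuration from this issue */

/* One result slot per core, kept in memory that every core can read. */
typedef struct {
    uint32_t cycles; /* accelerator cycles measured by that core */
    uint32_t errors; /* number of mismatches found by that core */
} core_result_t;

static core_result_t results[NUM_CORES];

/* Each core stores its own result. No printf happens here. */
void record_result(uint32_t core_idx, uint32_t cycles, uint32_t errors) {
    results[core_idx].cycles = cycles;
    results[core_idx].errors = errors;
}

/* Only core 0 prints. Returns how many cores were reported on. */
int print_all_results(uint32_t core_idx) {
    if (core_idx != 0) {
        return 0;
    }
    for (uint32_t i = 0; i < NUM_CORES; i++) {
        printf("core %u: cycles=%u errors=%u\n", (unsigned)i,
               (unsigned)results[i].cycles, (unsigned)results[i].errors);
    }
    return NUM_CORES;
}
```

On the cluster, `core_idx` would come from `snrt_cluster_core_idx()`, and the cores would need a cluster-wide barrier between `record_result` and `print_all_results` so that core 0 does not read a slot before its owner has written it.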

(2) The accelerator cycle counts can differ because of memory bank contention. The ALU stalls whenever there is bank contention. Contention here means that more than one accelerator accesses the same bank at the same time. For example, address 0 and address 256 map to the same bank. If this happens, one accelerator gets its data first and the other has to wait for its turn.

Make sure to avoid bank contentions!
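As a rough way to check whether two addresses collide, the sketch below maps a byte address to a bank index. The bank count and bank width are assumptions (32 banks, 8 bytes wide), chosen only because they reproduce the example above, where addresses 0 and 256 land in the same bank. Check your own cluster configuration for the real values. The helper names `bank_of` and `same_bank` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory layout, not taken from the issue: 32 banks, each 8 bytes
 * wide, so the bank index wraps around every 256 bytes. */
#define NUM_BANKS 32u
#define BANK_WIDTH_BYTES 8u

/* Bank index that a byte address falls into under the assumed layout. */
uint32_t bank_of(uint32_t addr) {
    return (addr / BANK_WIDTH_BYTES) % NUM_BANKS;
}

/* True if two addresses would compete for the same bank. */
bool same_bank(uint32_t a, uint32_t b) {
    return bank_of(a) == bank_of(b);
}
```

Under these assumptions, `same_bank(0, 256)` is true while `same_bank(0, 8)` is false, so placing each accelerator's data at base addresses that fall into different banks is one way to reduce stalls.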

jjl1075337132 commented 1 month ago

Thank you for your answer. I understand what you mean, but it seems impossible to use one core to print whether the other cores executed correctly and how much time their accelerators spent. Do you have any ideas?

rgantonio commented 1 month ago

Hi @jjl1075337132

Technically speaking, you wouldn't want to do printing in the first place. We print only to debug our systems.

The only way to know whether the accelerators processed the correct data is to check that the err variable was never incremented.

As you can see in our examples, the err variable is meant as an error signal. If an error occurs (e.g., an incorrect value or an incorrect sequence), we increment err.

Then we return err, which is nonzero if anything went wrong. This way we can check whether an accelerator is correct or not.
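A minimal sketch of that err pattern, assuming the accelerator output and the expected (golden) values are both plain arrays of 32-bit words. The function name `count_errors` is hypothetical and is not taken from the SNAX examples.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare the accelerator output against the expected values and count
 * mismatches. A return value of 0 means every element matched. */
uint32_t count_errors(const uint32_t *out, const uint32_t *golden, size_t n) {
    uint32_t err = 0;
    for (size_t i = 0; i < n; i++) {
        if (out[i] != golden[i]) {
            err++; /* one increment per incorrect value */
        }
    }
    return err;
}
```

Returning this count from `main` lets the test harness see a nonzero exit value when something went wrong, without any printf in the measured code.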

The real problem with printf is that it was only meant for debugging. In fact, it's not a good idea to put printf into our code when we measure performance.

If you follow the traces later (e.g., look at waveforms) you will see that the printf takes sooooo many cycles. So it's not meant to be there in the first place.

I hope you understand what I mean.

jjl1075337132 commented 1 month ago

I'm grateful for your help.

rgantonio commented 1 month ago

I will close this issue now 😄 feel free to ask more questions in another issue! Thanks for your interest! @jjl1075337132