Closed MoIbs-tech closed 4 years ago
To add to the above comment: when I change gap_ncore() between 1 and 8, the PERF_LD value printed for core 0 changes (11681583 with gap_ncore() = 1, 1474391 with gap_ncore() = 8), while the rest of the cores still show 0, as shown below:
For gap_ncore() = 1:
`
Initialize Performace Counters
Call cluster
Running on cluster
Runner completed
[0 0] PERF_LD_STALL : 11681583
[0 1] PERF_LD_STALL : 0
[0 2] PERF_LD_STALL : 0
[0 3] PERF_LD_STALL : 0
[0 4] PERF_LD_STALL : 0
[0 5] PERF_LD_STALL : 0
[0 6] PERF_LD_STALL : 0
[0 7] PERF_LD_STALL : 0
For gap_ncore = 8:
Initialize Performace Counters
Call cluster
Running on cluster
Runner completed
[0 0] PERF_LD_STALL : 1474391
[0 1] PERF_LD_STALL : 0
[0 2] PERF_LD_STALL : 0
[0 3] PERF_LD_STALL : 0
[0 4] PERF_LD_STALL : 0
[0 5] PERF_LD_STALL : 0
[0 6] PERF_LD_STALL : 0
[0 7] PERF_LD_STALL : 0
`
Note: the LD_STALL label in the print is outdated; the values above are for PERF_LD, i.e. the number of load instructions executed.
The problem is that RunNetwork is only executed by core 0. You have to do a fork so that all cores execute the same function for starting the performance counter and also for measuring it.
I was hoping to measure each core while the function is parallelized across all cores, rather than run the whole function on each core. But I'm guessing that's not possible, and not really a valid way of measuring performance counters in a multi-core processing scenario?
The fork is in the runner __PREFIX(CNN)(Input_1, Output_1);
If you change gap_ncore(), you change the number of cores the fork runs on.
I mean that you need to init, start and read the counters on each core. To do that, you can do one fork that executes a function on all cores to init and start the counters on each core, and a second fork with another function to read the counter values.
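The two-fork pattern described above can be sketched as follows. This is a minimal host-side simulation, not GAP8 code: `NB_CORES`, `team_fork`, `core_id`, `worker` and the `sim_*` variables are hypothetical stand-ins so the sketch compiles on a PC. On the target, their roles would presumably be played by `pi_cl_team_fork()`, `pi_core_id()` and the `pi_perf_*` calls used in the linked perf.c example, and the middle step would be the generated runner call, which does its own fork.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NB_CORES 8  /* assumed cluster size (8 cores, as in this thread) */

/* Hypothetical host-side stand-ins for the cluster runtime. */
static int sim_core;                /* core the simulated fork is running on */
static int sim_counting[NB_CORES];  /* 1 while that core's counter is running */
static uint32_t sim_ld[NB_CORES];   /* simulated per-core load counter */

uint32_t perf_values[NB_CORES];     /* one result slot per core */

static int core_id(void) { return sim_core; }

/* Run entry(arg) once per core, sequentially, to mimic a team fork. */
static void team_fork(int nb_cores, void (*entry)(void *), void *arg)
{
    assert(nb_cores >= 1 && nb_cores <= NB_CORES);
    for (sim_core = 0; sim_core < nb_cores; sim_core++)
        entry(arg);
    sim_core = 0;
}

/* Forked on every core: reset and start that core's own counter. */
static void perf_start_on_core(void *arg)
{
    (void)arg;
    sim_ld[core_id()] = 0;
    sim_counting[core_id()] = 1;
}

/* Forked on every core: stop the counter and store this core's value. */
static void perf_read_on_core(void *arg)
{
    (void)arg;
    sim_counting[core_id()] = 0;
    perf_values[core_id()] = sim_ld[core_id()];
}

/* Stand-in for one core's share of the network: counts *arg loads,
   but only if this core's counter was started. */
static void worker(void *arg)
{
    uint32_t loads = *(uint32_t *)arg;
    if (sim_counting[core_id()])
        sim_ld[core_id()] += loads;
}

/* What the cluster master (core 0) would run. */
void run_and_measure(int nb_cores, uint32_t loads_per_core)
{
    team_fork(nb_cores, perf_start_on_core, NULL);  /* fork 1: init + start */
    team_fork(nb_cores, worker, &loads_per_core);   /* the parallel work */
    team_fork(nb_cores, perf_read_on_core, NULL);   /* fork 2: stop + read */
    for (int i = 0; i < nb_cores; i++)
        printf("[0 %d] PERF_LD : %u\n", i, (unsigned)perf_values[i]);
}
```

Calling `run_and_measure(8, 100)` fills every slot of `perf_values`. If the counter were started only by the master, as in the original `RunNetwork`, the `sim_counting` check in `worker` would leave cores 1 to 7 at zero, which matches the symptom reported in this issue.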
I will close this issue. Please feel free to reopen it if you have further related questions. Thanks.
I'm trying to see if I can measure the performance counters (PCs) for each core while processing a CNN model that's generated using nntool. I set the number of cores in gap_ncore() and the performance improvement works. But when I try to measure PCs like PERF_LD in order to view the concurrent processing of ld instructions, I get a value for 1 core while the other 7 of the 8 active cores report zero. The PC measurement code was based on this example: https://github.com/GreenWaves-Technologies/gap_sdk/blob/master/examples/pmsis/test_periph/perf/perf.c
and this is my main application for running the CNN and measuring the PCs, in case there are any bugs in it:
`
#include "pmsis.h"
/* RT for GPIO settings */
#include "rt/rt_api.h"
#include "Gap.h"
/* Autotiler includes. */
#include "cifar10Kernels.h"
#include "gaplib/ImgIO.h"
/* To include voltage, FC_FREQ, CL_FREQ and GPIO def */
#include "/home/tp19021/gap_sdk/gap_volt_fc_cl_header.h"

#define AT_INPUT_WIDTH 32
#define AT_INPUT_HEIGHT 32
#define AT_INPUT_COLORS 3
#define AT_INPUT_SIZE (AT_INPUT_WIDTH*AT_INPUT_HEIGHT*AT_INPUT_COLORS)

typedef signed char IMAGE_IN_T;

// #define AT_INPUT_SIZE 64*64*3
signed char Input_1[AT_INPUT_SIZE] = {0};
signed short int Output_1[10];

// #ifndef STACK_SIZE
// values are from CLUSTER_STACK_SIZE AND CLUSTER_SLAVE_STACK_SIZE
// from makefile in visual_wake
#ifndef STACK_SIZE
#define STACK_SIZE 2048
#endif
// #endif

// #define PERF
#undef PERF

AT_HYPERFLASH_FS_EXT_ADDR_TYPE cifar10_L3_Flash = 0;

// structure for performance counters
// rt_perf_t *perf;
// uint32_t perf_ld;
PI_L2 uint32_t perf_values[ARCHI_CLUSTER_NB_PE] = {0};

static void RunNetwork(void *arg)
{
    pi_perf_conf(1<<RT_PERF_LD);
}

int start()
{
    printf("Entering main controller\n");
}

int main(void)
{
    set_gap8_state(); // my function for setting input voltage, CL FREQ and FC FREQ
    printf(" NNTOOL cifar10 Deeper application \n");
    return pmsis_kickoff((void *) start);
}
`