egallesio / STklos

STklos Scheme
http://stklos.net
GNU General Public License v2.0
69 stars 17 forks source link

Prepare the VM to profile itself with times and frequencies for all instructions - output in both Scheme and CSV formats. #530

Closed jpellegrini closed 1 month ago

jpellegrini commented 1 year ago

This recovers an old profiling functionality that used to be in vm.c. The previous version would count the frequencies of instructions. The new code also measures the time taken by each instruction.

Instructions for using the profiler are added to vm.adoc. It is possible to dump the data to a CSV file, open in a spreadsheet and sort by total time taken by each instruction, or by average time per instruction...

CAVEAT: This only works for the USE_COMPUTED_GOTO compilation. For some reason, the standard code (without computed gotos) does not seem to be working (doesn't compile), even without this patch. See PR #528

A description follows.

The VM will, at the end of each iteration, check the instruction that was just executed and update:

This is done in the tick() C function (which is compiled conditionally on STAT_VM):

void tick(STk_instr b) {
  couple_instr[previous_op][b]++;
  cpt_inst[b]++;
  previous_op = b;

  current_time = clock();
  if (previous_time > 0)
      time_inst[b] += ((double)(current_time - previous_time)) / CLOCKS_PER_SEC;
  previous_time = current_time;
}

Two Scheme primitives are then available:

Here's the header from the Scheme version of a statistics dump:

;; STklos VM statistics. It can be read in Scheme, and it represents one single
;; object: an alist with the names of instructions, and each CDR is a list containing:
;;
;; * count (the number of times this instruction was executed)
;; * time (the total time the program spent on this instruction)
;; * avg time (the average number of time spent on each execution of this instruction)
;; * an alist containing, for each OTHER instruction, the number of times they appeared\n"
;;   together in the code.
;;
;; ( (INS1 count1 time1 avgtime1 ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn)))
;;   (INS2 count2 time2 avgtime2 ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn)))
;;   ...
;;   (INS2 countk timek avgtimek ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn))) )

The C code for printing the instructions is in the functions dump_couple_instr_csv and dump_couple_instr_scm.

jpellegrini commented 1 year ago

I added a profiling macro...

(%with-profile-data "file-name.csv" :csv
  (display a)
  (newline)
  (values 1 2 3))

This will write the profiling data to file-name.csv, and will return the same values that the code would (1, 2, 3). The expansion is

(receive
  result-4
  (begin (%vm-reset-stats)
         (display a)
         (newline)
         (values 1 2 3))
  (begin (%vm-dump-stats "file-name.csv" #:csv)
         (apply values result-4)))

Also, since this macro needs to know if the stats code is available, we also include a primitive, (%vm-has-stats).

When the stats code is not compiled in, the macro just expands to (begin ...code...).

It is still marked with %, but I suppose this could be useful for the general user in the future?

jpellegrini commented 1 year ago

And just for completness...

Collecting statistics is off by default, especially because compiling STklos is very slow with statistics gathering. It should be turned on before profiling.

The status of "collecting" or "not collecting" reflects what happens internally in the statistics gathering code, when it is compiled in. If STklos was compiled without STAT_VM, then those procedures are just not available at all.

jpellegrini commented 1 year ago

It only measures the times for VM instructions. However, for example...I have a benchmark in which UGREF_INVOKE takes a lot of time. But it's not, itself, slow. It happens to send the control flow to FUNCALL, which I think is the slow part of it. So maybe we could also measure, besides instructions, some specific parts of the VM (like FUNCALL?)

jpellegrini commented 1 year ago

I have written tick() as a nested function, but that's a GNU extension. I'll fix it

jpellegrini commented 1 year ago

@egallesio I think I have the statistics collection working (for the computed goto version), but I have one question...

These are declared static in vm.c:

static int couple_instr[NB_VM_INSTR][NB_VM_INSTR];
static int cpt_inst[NB_VM_INSTR];
static double time_inst[NB_VM_INSTR];

but then, wouldn't several threads clobber the data being collected by each other? (For example, in the couple_instr matrix one thread would overwrite the previous instruction from the other, and the data wouldn't correspond to wha tactually happens) -- right? Should we move these to inside run_vm()? Or something else?

jpellegrini commented 1 year ago

Hi @egallesio ! If #561 is merged, I'll update this one (which will be necessary, I think).

jpellegrini commented 1 year ago

I'm making adjustments to this one, as it would currently do the wrong thing, depending on the use of computed gotos.

jpellegrini commented 1 year ago

OK @egallesio I think this one is ready now! :)

egallesio commented 1 month ago

After 15 months, I have finally merged this great PR :smile:

I have applied small modifications (replaced the 3 functions around getting and setting the stats or not) by a unique parameter. I have also suppressed the primitive (%vm-has-stats) since it can be easily determined from the new version of (%vm-config) in PR #560

Finally, macro %with-profile-data is only defined when STklos is compiled with the STAT_VM flag set.