boegel / MICA

a Pin tool for collecting microarchitecture-independent workload characteristics
http://users.ugent.be/~kehoste/ELIS/MICA

tableGen.sh generates broken output data #12

Closed: twang15 closed this 3 years ago

twang15 commented 6 years ago

MICA uses tableGen.sh to ease processing of the collected output data: it adds a header row and concatenates all the data.

However, tableGen.sh currently does not do this correctly.

Steps to reproduce:

  1. Install MICA 1.0 (revision 95162185233a04) with Intel Pin 3.7, following the instructions in #9.

  2. Put the following content in mica.conf:

analysis_type: all
interval_size: full
page_size: 12
block_size: 6
itypes_spec_file: itypes_default.spec
append_pid: yes

  3. Run `pin -t mica.so -- ls -l`. This generates the 7 .out files listed below (10257 is the PID; it will differ when you reproduce this bug, but all the .out files should share the same PID):

ilp_full_int_10257_pin.out
memfootprint_full_int_10257_pin.out
ppm_full_int_10257_pin.out
stride_full_int_10257_pin.out
itypes_full_int_10257_pin.out
memstackdist_full_int_10257_pin.out
reg_full_int_10257_pin.out

  4. `mkdir tmp`

  5. Move all the .out files into it: `mv *.out tmp/`

  6. Modify line 8 of tableGen.sh from `benchmarks=*` to `benchmarks=(tmp)`.

  7. `./tableGen.sh`

  8. `cat micaTable.txt`. You will see the following content:

APPLICATION_NAME DATASET totInstruction ILP32 ILP64 ILP128 ILP256 total_ins_count_for_hpc_alignment totInstruction mem-read mem-write control-flow arithmetic floating-point stack shift string sse other nop InstrFootprint64 InstrFootprint4k DataFootprint64 DataFootprint4k mem_access memReuseDist0-2 memReuseDist2-4 memReuseDist4-8 memReuseDist8-16 memReuseDist16-32 memReuseDist32-64 memReuseDist64-128 memReuseDist128-256 memReuseDist256-512 memReuseDist512-1k memReuseDist1k-2k memReuseDist2k-4k memReuseDist4k-8k memReuseDist8k-16k memReuseDist16k-32k memReuseDist32k-64k memReuseDist64k-128k memReuseDist128k-256k memReuseDist256k-512k memReuseDist512k-00 GAg_mispred_cnt_4bits PAg_mispred_cnt_4bits GAs_mispred_cnt_4bits PAs_mispred_cnt_4bits GAg_mispred_cnt_8bits PAg_mispred_cnt_8bits GAs_mispred_cnt_8bits PAs_mispred_cnt_8bits GAg_mispred_cnt_12bits PAg_mispred_cnt_12bits GAs_mispred_cnt_12bits PAs_mispred_cnt_12bits total_brCount total_transactionCount total_takenCount total_num_ops instr_reg_cnt total_reg_use_cnt total_reg_age reg_age_cnt_1 reg_age_cnt_2 reg_age_cnt_4 reg_age_cnt_8 reg_age_cnt_16 reg_age_cnt_32 reg_age_cnt_64 mem_read_cnt mem_read_local_stride_0 mem_read_local_stride_8 mem_read_local_stride_64 mem_read_local_stride_512 mem_read_local_stride_4096 mem_read_local_stride_32768 mem_read_local_stride_262144 mem_read_global_stride_0 mem_read_global_stride_8 mem_read_global_stride_64 mem_read_global_stride_512 mem_read_global_stride_4096 mem_read_global_stride_32768 mem_read_global_stride_262144 mem_write_cnt mem_write_local_stride_0 mem_write_local_stride_8 mem_write_local_stride_64 mem_write_local_stride_512 mem_write_local_stride_4096 mem_write_local_stride_32768 mem_write_local_stride_262144 mem_write_global_stride_0 mem_write_global_stride_8 mem_write_global_stride_64 mem_write_global_stride_512 mem_write_global_stride_4096 mem_write_global_stride_32768 mem_write_global_stride_262144 \ntmp dataset1 1525767 271452 231588 210573 198612number of 
instructions: 15071931525767 378669 208808 331972 966639 0 91047 21475 19038 17312 66021 7025 5234number of instructions: 15071934867 195 2578 192number of instructions: 1507193379653 4383 193346 30533 32844 38680 26723 14882 19701 13442 2415 874 1214 606 10 0 0 0 0 0 0number of instructions: 15071931525767 39151 15615 17974 24520 63385 17376 17145 20734 75640 20621 17759 18318 260408 23047 20571number of instructions: 15071931525767 2642826 921452 1591809 1689083 391893 520184 704763 923623 1108784 1264166 1378887number of instructions: 1507193379201 16724 249974 297173 329930 344643 349521 350575 3400 95408 152605 179633 191666 223358 225200 208808 4846 155718 173677 198735 201194 202787 203046 232 50906 157753 187770 191315 191937 191987number of instructions: 1507193
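The `mkdir`, move and edit steps above can be scripted. The sketch below is only an illustration, not part of MICA: the function name `prepare_repro` is hypothetical, and the `sed` edit assumes line 8 of tableGen.sh reads exactly `benchmarks=*`, as described in the steps.

```shell
#!/bin/bash
# Hypothetical helper (not part of MICA): prepares the reproduction when run
# from the directory that holds the .out files and tableGen.sh.
prepare_repro() {
  mkdir -p tmp || return 1
  # move every MICA output file into tmp/
  mv -- *.out tmp/ || return 1
  # rewrite line 8 of tableGen.sh (assumed to be "benchmarks=*");
  # -i.bak keeps a backup copy and works with both GNU and BSD sed
  sed -i.bak '8s/^benchmarks=\*$/benchmarks=(tmp)/' tableGen.sh
}

# Usage:
#   prepare_repro && ./tableGen.sh && cat micaTable.txt
```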

twang15 commented 6 years ago

Here is the fix.

```bash
#!/bin/bash

# Amir H. Ashouri - 2017
# (www.eecg.toronto.edu/~aashouri/)
# This script looks for all MICA output files corresponding to a pid and
# generates a MICA table. The first row is the header and is added as well.
# Tested with MICA v0.40

# directories that contain the MICA .out files
benchmarks=(ls)

echo "APPLICATION_NAME DATASET totInstruction ILP32 ILP64 ILP128 ILP256 total_ins_count_for_hpc_alignment totInstruction mem-read mem-write control-flow arithmetic floating-point stack shift string sse other nop InstrFootprint64 InstrFootprint4k DataFootprint64 DataFootprint4k mem_access memReuseDist0-2 memReuseDist2-4 memReuseDist4-8 memReuseDist8-16 memReuseDist16-32 memReuseDist32-64 memReuseDist64-128 memReuseDist128-256 memReuseDist256-512 memReuseDist512-1k memReuseDist1k-2k memReuseDist2k-4k memReuseDist4k-8k memReuseDist8k-16k memReuseDist16k-32k memReuseDist32k-64k memReuseDist64k-128k memReuseDist128k-256k memReuseDist256k-512k memReuseDist512k-00 GAg_mispred_cnt_4bits PAg_mispred_cnt_4bits GAs_mispred_cnt_4bits PAs_mispred_cnt_4bits GAg_mispred_cnt_8bits PAg_mispred_cnt_8bits GAs_mispred_cnt_8bits PAs_mispred_cnt_8bits GAg_mispred_cnt_12bits PAg_mispred_cnt_12bits GAs_mispred_cnt_12bits PAs_mispred_cnt_12bits total_brCount total_transactionCount total_takenCount total_num_ops instr_reg_cnt total_reg_use_cnt total_reg_age reg_age_cnt_1 reg_age_cnt_2 reg_age_cnt_4 reg_age_cnt_8 reg_age_cnt_16 reg_age_cnt_32 reg_age_cnt_64 mem_read_cnt mem_read_local_stride_0 mem_read_local_stride_8 mem_read_local_stride_64 mem_read_local_stride_512 mem_read_local_stride_4096 mem_read_local_stride_32768 mem_read_local_stride_262144 mem_read_global_stride_0 mem_read_global_stride_8 mem_read_global_stride_64 mem_read_global_stride_512 mem_read_global_stride_4096 mem_read_global_stride_32768 mem_read_global_stride_262144 mem_write_cnt mem_write_local_stride_0 mem_write_local_stride_8 mem_write_local_stride_64 mem_write_local_stride_512 mem_write_local_stride_4096 mem_write_local_stride_32768 mem_write_local_stride_262144 mem_write_global_stride_0 mem_write_global_stride_8 mem_write_global_stride_64 mem_write_global_stride_512 mem_write_global_stride_4096 mem_write_global_stride_32768 mem_write_global_stride_262144" > micaTable.txt

for i in "${benchmarks[@]}"
do
  if [ -d "$i" ]
  then
    tmp=$PWD
    cd "$i" || continue

    # process directory
    echo "**********************************************************"
    echo "$i"
    j_pid=1
    # one pid per ilp_full_int_<pid>_pin.out file
    pidList=$(ls ilp_full_int_*_pin.out | sed -e 's/^ilp_full_int_//' -e 's/_pin\.out$//')
    for i_pid in $pidList
    do
      echo -n "$i dataset$j_pid " >> ../micaTable.txt
      # take only the first line of each .out file belonging to this pid
      # and merge those lines into a single row
      for f in *_"$i_pid"_pin.out
      do
        head -n 1 "$f"
      done | paste -sd ' ' - >> ../micaTable.txt
      j_pid=$((j_pid + 1))
    done
    echo ""
    echo ""
    # *************************

    cd "$tmp" || exit 1
  fi
done
```
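To check whether a generated micaTable.txt is consistent (the header/data mismatch discussed further down in this thread), one can compare the field count of every row against the header row. This is a minimal sketch; `check_table` is a hypothetical helper, and it assumes whitespace-separated fields, as produced by the script above.

```shell
#!/bin/bash
# Hypothetical helper: verify that every row of the table has as many fields
# as the header row. Prints each offending row and returns 1 on a mismatch.
check_table() {
  local table=${1:-micaTable.txt}
  awk 'NR == 1 { n = NF; next }
       NF != n { printf "row %d: %d fields, header has %d\n", NR, NF, n; bad = 1 }
       END { exit bad }' "$table"
}

# Usage:
#   check_table micaTable.txt && echo "table is consistent"
```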

twang15 commented 6 years ago

This also solves issue #11.

twang15 commented 6 years ago

Actually, the problem is only partially solved.

  1. With the following content in mica.conf, MICA generates 101 features across 7 files:

analysis_type: all
interval_size: full
itypes_spec_file: itypes_default.spec
append_pid: yes

  2. The header in tableGen.sh expects only 99 features. It has 101 fields, but the first two are the application name and the dataset ID, respectively.

  3. The solution above does not resolve the inconsistency between 1 and 2.

In the end, I simply generated a new header, m1 to m101, for the 101 features.
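A generic header like the one described can be produced with a small loop. This is a sketch that assumes the table keeps the two leading APPLICATION_NAME and DATASET columns; `generic_header` is a hypothetical name, not something shipped with MICA.

```shell
#!/bin/bash
# Hypothetical helper: print "APPLICATION_NAME DATASET m1 ... mN" for N features.
generic_header() {
  local n=$1 i=1
  printf 'APPLICATION_NAME DATASET'
  while [ "$i" -le "$n" ]; do
    printf ' m%d' "$i"
    i=$((i + 1))
  done
  printf '\n'
}

# Usage: use this instead of the named header echo in tableGen.sh
#   generic_header 101 > micaTable.txt
```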

twang15 commented 6 years ago

Here is the number of features in each file (feature count, file name):

5 ilp_full_int_10257_pin.out
13 itypes_full_int_10257_pin.out
4 memfootprint_full_int_10257_pin.out
21 memstackdist_full_int_10257_pin.out
16 ppm_full_int_10257_pin.out
12 reg_full_int_10257_pin.out
30 stride_full_int_10257_pin.out

In total, there are 101 features while tableGen.sh expects only 99.
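A per-file count like the one above can be obtained by counting the words on the first line of each .out file, which is the line the fixed tableGen.sh copies into the table. This is a minimal sketch; `count_features` is hypothetical and assumes the file names end in `_pin.out`, as in the listing above.

```shell
#!/bin/bash
# Hypothetical helper: print the number of fields on the first line of each
# *_pin.out file in a directory, followed by the total.
count_features() {
  local dir=${1:-.} f n total=0
  for f in "$dir"/*_pin.out; do
    [ -e "$f" ] || continue
    n=$(head -n 1 "$f" | wc -w)
    n=$((n))   # drop any padding that wc adds
    printf '%d %s\n' "$n" "$(basename "$f")"
    total=$((total + n))
  done
  printf 'total %d\n' "$total"
}

# Usage:
#   count_features tmp
```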

stefanocereda commented 5 years ago

Hi, I'm experiencing a similar problem. Did you find a solution? Here is what I have obtained so far:

Naming them m1~m101, unfortunately, is not an option for me.

boegel commented 5 years ago

@amirjamez Any ideas on this?

amirjamez commented 5 years ago

@boegel, commit 4a2b5f1d8a2c2934486ca22623c50c9ae98e67b1 should fix the issue. We should now get 99 MICA values, plus the two APPLICATION_NAME and DATASET columns added by the table generator. @stefanocereda, let me know if that works.

stefanocereda commented 5 years ago

Hi @amirjamez, thanks for the answer, but unfortunately it doesn't work. I still have 100 metrics (plus APPLICATION_NAME and DATASET).

Here is what I'm doing:

There are 2+99 headers but 2+100 metrics, so something is wrong. Moreover, the 3rd column (totInstruction) should equal the 9th (totInstruction), but instead it equals the 8th (total_ins_count_for_hpc_alignment).
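The column comparison described above can be checked mechanically for every row. This is a sketch with a hypothetical `compare_instr_cols` helper; it assumes the whitespace-separated layout produced by tableGen.sh, where columns 3, 8 and 9 are totInstruction, total_ins_count_for_hpc_alignment and totInstruction.

```shell
#!/bin/bash
# Hypothetical helper: for each data row, report whether column 3 matches
# column 9 and column 8 (the header row is skipped).
compare_instr_cols() {
  awk 'NR > 1 {
         printf "%s %s: col3==col9 %s, col3==col8 %s\n", $1, $2,
                ($3 == $9 ? "yes" : "no"), ($3 == $8 ? "yes" : "no")
       }' "${1:-micaTable.txt}"
}

# Usage:
#   compare_instr_cols micaTable.txt
```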

amirjamez commented 5 years ago

@stefanocereda I removed the last duplicate output, so it should now print 99 values. Also, note that totInstruction and total_ins_count_for_hpc_alignment are roughly the same. You can also customize how MICA prints its output by modifying the corresponding mica_<TYPE>.cpp file, e.g., changing the .csv delimiter, or reordering or removing values. Look at my last 2 commits and you will see it is pretty easy. Good luck!
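If only the delimiter needs to change, post-processing micaTable.txt is an alternative to editing the mica_<TYPE>.cpp files. This is a hypothetical sketch (`table_to_csv` is not part of MICA) and assumes no field contains spaces or commas.

```shell
#!/bin/bash
# Hypothetical helper: convert the space-separated table into a CSV file.
table_to_csv() {
  local in=${1:-micaTable.txt} out=${2:-micaTable.csv}
  # translate spaces to commas, squeezing runs of spaces into one comma
  tr -s ' ' ',' < "$in" > "$out"
}

# Usage:
#   table_to_csv micaTable.txt micaTable.csv
```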

stefanocereda commented 5 years ago

Perfect, I have included your latest commits and now it is working great! Thanks!