ctuning / reproduce-milepost-project

Collective Knowledge workflow for MILEPOST GCC (a machine-learning-based compiler). See how it is used in the collaborative project with the Raspberry Pi Foundation to support research on multi-objective autotuning and machine learning techniques, and to prototype reproducible papers with portable workflows:
http://cKnowledge.org/rpi-crowd-tuning
GNU General Public License v2.0

Extracting static features programmatically #11

hrshtv closed this issue 3 years ago

hrshtv commented 3 years ago

This link has a nice interface for extracting all MILEPOST static program features with the click of a button. Can we do the same programmatically in Python? I'm looking for something along the following lines:

path = "example.c"
features = milepost.extract(path) # This would be a dict/list of the extracted features

Is something like this possible?

gfursin commented 3 years ago

Hi @hrshtv,

Thank you for your interest!

At the moment, there is no object-oriented class for MILEPOST. However, I plan to gradually make CK more Pythonic in 2021.

In the meantime, you can extract MILEPOST features for a given CK program as follows:

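import ck.kernel as ck

# CK calls take a dict as input and return a dict; 'return' > 0 signals an error
r=ck.access({'action':'extract', 'module_uoa':'program.static.features', 'data_uoa':'cbench-automotive-susan'})
if r['return']>0: ck.err(r)  # print the error and exit

# The extracted MILEPOST features are returned under r['dict']['features']
features=r.get('dict',{}).get('features',{})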

You need to have MILEPOST GCC installed via CK.
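The snippet above calls `ck.err(r)`, which prints the error and exits the interpreter. To get something closer to the `milepost.extract(...)` call from the question, a small wrapper could raise an exception instead and hand back the features. This is only a sketch, not part of the CK API: the names `extract_milepost_features`, `features_to_vector`, and `CKError` are hypothetical, the access function is injectable so the logic can be exercised without CK installed, and ordering features by the numeric suffix of their keys assumes MILEPOST-style names such as `ft1`, `ft2`, ...

```python
import re
from typing import Any, Callable, Dict, List, Optional

AccessFn = Callable[[Dict[str, Any]], Dict[str, Any]]


class CKError(RuntimeError):
    """Hypothetical exception raised when a CK call returns a non-zero 'return'."""


def extract_milepost_features(data_uoa: str,
                              access: Optional[AccessFn] = None) -> Dict[str, Any]:
    """Hypothetical wrapper around the ck.access 'extract' call shown above."""
    if access is None:
        # Imported lazily so the function can be tested without CK installed.
        import ck.kernel as ck
        access = ck.access
    r = access({'action': 'extract',
                'module_uoa': 'program.static.features',
                'data_uoa': data_uoa})
    if r['return'] > 0:
        # Raise instead of calling ck.err(r), which would exit the interpreter.
        raise CKError(r.get('error', 'unknown CK error'))
    return r.get('dict', {}).get('features', {})


def features_to_vector(features: Dict[str, Any]) -> List[Any]:
    """Order features by the integer at the end of each key (assumes 'ft1', 'ft2', ...)."""
    def index(key: str) -> int:
        m = re.search(r'(\d+)$', key)
        if m is None:
            raise ValueError(f'unexpected feature name: {key!r}')
        return int(m.group(1))
    return [features[k] for k in sorted(features, key=index)]
```

With CK and MILEPOST GCC installed, `features_to_vector(extract_milepost_features('cbench-automotive-susan'))` would then give a plain list suitable as input to a model, assuming the feature naming holds for your version.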

If you want to extract features from arbitrary source code, copy an existing CK program to a dummy CK program, add your source file, and register it in the program's CK meta, as follows:

ck cp program:cbench-automotive-susan program:my-dummy-program

ck find program:my-dummy-program

# Add your source file to that directory and add its name to .cm/meta.json

ck extract program.static.features:my-dummy-program
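The same copy/edit/extract steps can be scripted from Python. The sketch below shells out to the `ck` commands shown above and edits the copied program's `.cm/meta.json`. Two assumptions to check against your CK version: that `ck find` prints the entry's directory as its only output, and that program meta lists sources under a `source_files` key. The helper names are hypothetical. By default the driver drops the copied benchmark's own sources so that only your file is listed; pass `replace_sources=False` to follow the "add its name" wording literally.

```python
import json
import shutil
import subprocess
from pathlib import Path


def ck(*args: str) -> str:
    """Run a ck CLI command and return its stdout (raises on a non-zero exit code)."""
    result = subprocess.run(['ck', *args], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def register_source(meta_path: Path, filename: str, replace: bool = False) -> None:
    """Add filename to the 'source_files' list of a CK program meta.json.

    Assumption: CK program meta keeps its sources under 'source_files'.
    With replace=True, previously listed sources are dropped from the list.
    Other meta keys are preserved.
    """
    meta = json.loads(meta_path.read_text())
    sources = [] if replace else list(meta.get('source_files', []))
    if filename not in sources:
        sources.append(filename)
    meta['source_files'] = sources
    meta_path.write_text(json.dumps(meta, indent=2))


def extract_from_source(src: Path, program: str = 'my-dummy-program',
                        replace_sources: bool = True) -> str:
    """Hypothetical driver for the copy/edit/extract steps; returns the CLI output."""
    ck('cp', 'program:cbench-automotive-susan', f'program:{program}')
    # Assumption: 'ck find' prints only the entry's directory.
    entry_dir = Path(ck('find', f'program:{program}'))
    shutil.copy(src, entry_dir / src.name)
    register_source(entry_dir / '.cm' / 'meta.json', src.name,
                    replace=replace_sources)
    return ck('extract', f'program.static.features:{program}')
```

Usage would be `print(extract_from_source(Path('example.c')))`; note that `ck cp` fails if the dummy program already exists, which surfaces here as a `subprocess.CalledProcessError`.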

If this sounds useful, I can explain in more detail.

Also, @ChrisCummins is working on related infrastructure and mentioned that he plans to release it soon. They are using cool deep learning techniques to learn optimization heuristics, so you may be interested in following their projects too!

hrshtv commented 3 years ago

Thanks for the explanation! Is there any documentation that explains the arguments of the functions used, for example ck.access({...})?

gfursin commented 3 years ago

Some limited description is available at https://ck.readthedocs.io/en/latest/src/ck.html#ck.kernel.access .

This function always takes a dict as input with

You can find the input keys and the output dictionary for a given module and action from the command line as follows:

ck extract program.static.features --help

UOA is an abbreviation for CK UID or alias, i.e. you can use either the user-friendly name such as "program.static.features" or its internal UID (92a02f0445148203).
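To make the UOA convention concrete, the sketch below tells the two forms apart. The helper `looks_like_ck_uid` is hypothetical and assumes a CK UID is exactly 16 lowercase hexadecimal characters, as in `92a02f0445148203`; the CK kernel has its own check, which may differ in detail. Either form can be passed as `module_uoa` or `data_uoa`.

```python
import re

# Assumption: CK UIDs are 16 lowercase hex digits, e.g. 92a02f0445148203.
_UID_RE = re.compile(r'[0-9a-f]{16}')


def looks_like_ck_uid(uoa: str) -> bool:
    """Return True if uoa has the shape of a CK UID rather than an alias."""
    return _UID_RE.fullmatch(uoa) is not None


for uoa in ('program.static.features', '92a02f0445148203'):
    print(uoa, 'UID' if looks_like_ck_uid(uoa) else 'alias')
# program.static.features alias
# 92a02f0445148203 UID
```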

My goal is to update the help pages for all major APIs in 2021.

ChrisCummins commented 3 years ago

Hi @hrshtv, I'm following up here at Grigori's request with something that might be of interest to you. We just launched CompilerGym, a research platform for compiler autotuning. In particular, it exposes a handful of different program representations through a simple Python interface.

For LLVM, we have a variety of program representations, though not MILEPOST (I'll look into how much work it would take to add it).

The general usage would be:

  1. Compile your program to LLVM-IR:
$ clang-10 -emit-llvm -c myapp.cc
  2. In Python, create an LLVM environment to load your program, then print different observation spaces as follows:
>>> import gym
>>> import compiler_gym
>>> from compiler_gym.service.proto import Benchmark, File
# load the LLVM-IR file:
>>> path = "/path/to/myapp.bc"
>>> benchmark = Benchmark(uri=f"file://{path}", program=File(uri=f"file://{path}"))
# create a compiler session:
>>> env = gym.make("llvm-v0")
>>> env.reset(benchmark)
>>> env.observation["Programl"]
<networkx.classes.multidigraph.MultiDiGraph object at 0x7f9d8050ffa0>
>>> env.observation["Inst2vec"]
array([[-0.26956588,  0.47407162, -0.36637706, ..., -0.49256894,
         0.8016193 ,  0.71160674],
       [-0.59749085,  0.63315004, -0.0308373 , ...,  0.14833118,
         0.86420786,  0.44808227],
       [-0.59749085,  0.63315004, -0.0308373 , ...,  0.14833118,
         0.86420786,  0.44808227],
       ...,
       [-0.37584195,  0.43671703, -0.5360456 , ...,  0.6030259 ,
         0.82574934,  0.6306344 ],
       [-0.59749085,  0.63315004, -0.0308373 , ...,  0.14833118,
         0.86420786,  0.44808227],
       [-0.43074277,  0.8589559 , -0.35770646, ...,  0.28785184,
         0.8492773 ,  0.8914213 ]], dtype=float32)
>>> env.observation["Autophase"]
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0])

where ProGraML and inst2vec are two recent state-of-the-art deep learning representations.
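Building a `file://` URI by string formatting is easy to get wrong for absolute paths: formatting `"/path/to/myapp.bc"` into `f"file:///{path}"` yields four slashes, the same `file:////...` shape that appears in the error reported in the next comment. Python's `pathlib` can produce a well-formed, percent-encoded URI instead. Whether CompilerGym's `Benchmark` accepts exactly this form is an assumption; only the URI construction is shown, and `bitcode_uri` is a hypothetical helper name.

```python
from pathlib import Path


def bitcode_uri(path: str) -> str:
    """Return a file:// URI for an LLVM bitcode file, making relative paths absolute first."""
    p = Path(path)
    if not p.is_absolute():
        # as_uri() raises ValueError for relative paths.
        p = p.resolve()
    return p.as_uri()


path = '/path/to/myapp.bc'
print(f'file:///{path}')   # file:////path/to/myapp.bc (one slash too many)
print(bitcode_uri(path))   # file:///path/to/myapp.bc on POSIX systems
```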

Cheers, Chris

Edit: typos, see question below

gfursin commented 3 years ago

Hey Chris,

Thanks for sharing - looks really cool!

I got stuck with the above example on the following line:

env.reset(benchmark="file:////home/gfursin/work/susan.bc")

ValueError: Unknown benchmark "file:////home/gfursin/work/susan.bc"

The example at https://github.com/facebookresearch/CompilerGym worked fine:

...
; Function Attrs: nounwind
declare i32 @sprintf(i8*, i8*, ...) #3

; Function Attrs: nounwind
declare double @pow(double, double) #3

attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }
attributes #2 = { "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { nounwind "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #4 = { nounwind }

[  0   0   7   3   4   7   6   4   1   6   0   0   0  14   0  13  22   5
  19  34   5  12  23   7   2   0   2  21   0   2  12   0  13  23   7   6
   0  32   0   0   0   1   7   0   0  23   0   0   0   0  14 136 106   5
   0  61]
...

I'll dig further into your project during my vacation.

Thanks again for the update!!! Grigori

gfursin commented 3 years ago

I moved this question here: https://github.com/facebookresearch/CompilerGym/issues/12 .