SMAT-Lab / Scalpel

Scalpel: The Python Static Analysis Framework
Apache License 2.0
279 stars 42 forks source link

Constant propagation for loop variables #27

Closed simisimon closed 2 years ago

simisimon commented 2 years ago

Considering the following code.

from sklearn.cluster import KMeans

for i in range(5):
    kmeans = KMeans(n_clusters=i)

I want to compute all possible value for the loop variable i. Using scalpel results into the following const_dict:

{('i', 0): None, ('kmeans', 0): <ast.Call object at 0x0000015FB8F6EC10>}

Is this behavior intentional? I would not expect i to be None. Instead, I would expect the const_dict to contain all the values i can have in the loop.

Jarvx commented 2 years ago

Hi,

Thank you for the question. It is indeed very interesting as the iterator in the for-loop header can be various types of values such as any containers. We will push a new patch very soon to make it linked to the generator variable first. Thanks for raising this issue.

simisimon commented 2 years ago

Thank you. I'm looking forward to the patch. I really like your tool, I'm using it in one of my projects. :-)

Jarvx commented 2 years ago

Thanks for your kind words. It's great to know this!

simisimon commented 2 years ago

Hi, any updates when this problem will be fixed?

Jarvx commented 2 years ago

Hi, any updates when this problem will be fixed?

Thanks for the reminder. I am testing the feature now. Will push tomorrow at some time.

Jarvx commented 2 years ago

Now, I added support for the for-loop target, whose value is rewritten as the return value from next() of the iterator (e.g. next(iter(range(10)))). We will update the documentation next. Please feel free to reopen the issue if you see any runtime executions. Of course, any test cases are really appreciated.

simisimon commented 2 years ago

I looked into your provided fix and I do not know whether this should be the solution for this problem. https://github.com/SMAT-Lab/Scalpel/blob/2ba6cab3c7d613ed6e2dfe936f98b3b759a2347c/scalpel/SSA/const.py#L235-L242 The current solution creates two artifical ast.Call objects. I think this unnecessarily complicates constant propagation for loop variables. Instead, we should use the variable iter_value (which is a dead variable now), as this variable already contains the corresponding ast.Call object of the loop. Regarding the example above, iter_value would be the call of range(5).

I would like to create an PR, but I do not have the permission. Therefore, I provide my suggested fix here:

 if hasattr(stmt.target, "id"):    
     left_name = stmt.target.id 
     iter_value = stmt.iter 
     const_dict[left_name] = iter_value

or even better:

 if hasattr(stmt.target, "id"):    
     left_name = stmt.target.id 
     const_dict[left_name] = stmt.iter
simisimon commented 2 years ago

Corresponding test cases could look like:

def test_in_range_loop():
    code_str = """for i in range(3):    x = i"""

    cfg = CFGBuilder().build_from_src(name="", src=code_str)

    _, const_dict = SSA().compute_SSA(cfg)

    assert const_dict
    assert len(const_dict) == 2
    assert isinstance(const_dict.get(('i', 0)), ast.Call) 
    assert isinstance(const_dict.get(('x', 0)), ast.Name)

def test_for_each_loop():
    code_str = """for x in y:    z = x"""

    cfg = CFGBuilder().build_from_src(name="", src=code_str)

    _, const_dict = SSA().compute_SSA(cfg)

    assert const_dict
    assert len(const_dict) == 2
    assert isinstance(const_dict.get(('x', 0)), ast.Call) 
    assert isinstance(const_dict.get(('z', 0)), ast.Name)