SMAT-Lab / Scalpel

Scalpel: The Python Static Analysis Framework
Apache License 2.0
278 stars 42 forks source link

Fix for: Constant propagation for loop variables #37

Closed simisimon closed 2 years ago

simisimon commented 2 years ago

As I already commented in #27, I think that the current solution unnecessarily complicates constant propagation for loop variables, since two artifical ast.Call objects (Line 239 and 241) are created, while the actual ast.Call object (Line 237) of the loop is not used at all.

https://github.com/SMAT-Lab/Scalpel/blob/2ba6cab3c7d613ed6e2dfe936f98b3b759a2347c/scalpel/SSA/const.py#L235-L242

I would like to create a PR, but I do not have the permission. Therefore, I provide my suggested fix here:

 if hasattr(stmt.target, "id"):    
     left_name = stmt.target.id 
     iter_value = stmt.iter 
     const_dict[left_name] = iter_value

or even better:

 if hasattr(stmt.target, "id"):    
     left_name = stmt.target.id 
     const_dict[left_name] = stmt.iter

Corresponding test cases can look like this:

def test_in_range_loop():
    code_str = """for i in range(3):    x = i"""

    cfg = CFGBuilder().build_from_src(name="", src=code_str)

    _, const_dict = SSA().compute_SSA(cfg)

    assert const_dict
    assert len(const_dict) == 2
    assert isinstance(const_dict.get(('i', 0)), ast.Call) 
    assert isinstance(const_dict.get(('x', 0)), ast.Name)

def test_for_each_loop():
    code_str = """for x in y:    z = x"""

    cfg = CFGBuilder().build_from_src(name="", src=code_str)

    _, const_dict = SSA().compute_SSA(cfg)

    assert const_dict
    assert len(const_dict) == 2
    assert isinstance(const_dict.get(('x', 0)), ast.Call) 
    assert isinstance(const_dict.get(('z', 0)), ast.Name)
Jarvx commented 2 years ago

We really appreciate your idea and the test cases. But the problem is in the context of a for-loop statement, we can not bind the iterator name to the target variable as the "in" is not equivalent to assignment .

Could you please provide some references that your fix is semantically and syntacally correct? If this is your intuitive idea, we will discuss this later and then decide.

PR will be allowed soon.

simisimon commented 2 years ago

Well, I'm not sure if I understand the problem correctly, because I do not know what you mean with "we can not bind the iterator name to the target variable as the "in" is not equivalent to assignment".

To highlight the differences between both variants, let's have a look into the corresponding ast objects. The former one refers to node created by your fix and the latter one refers to my suggestion.

next_call_node:  Call(func=Name(id='next', ctx=Load()), args=[Call(func=Name(id='iter', ctx=Load()), args=[Call(func=Name(id='range', ctx=Load()), args=[Constant(value=5)], keywords=[])], keywords=[])], keywords=[])
iter_value: Call(func=Name(id='range', ctx=Load()), args=[Constant(value=5)], keywords=[])

As we can see, we already have the correct ast object when using the iter_value. In my opinion it makes no sense to additionally create two ast objects. This only complicates the entire constant propagation.

To check whether my suggestions works for all in range() variants, I printed out the iter_value.

(1)
for i in range(5):
    print(i)

(2)
for c in range(2, 6):
    print(c)

(3)
for d in range(2, 30, 3):
    print(d)
(1)
iter_value: Call(func=Name(id='range', ctx=Load()), args=[Constant(value=5)], keywords=[])
(2)
iter_value: Call(func=Name(id='range', ctx=Load()), args=[Constant(value=2), Constant(value=6)], keywords=[])
(3)
iter_value: Call(func=Name(id='range', ctx=Load()), args=[Constant(value=2), Constant(value=30), Constant(value=3)], keywords=[])

In all cases, iter_value already contains the correct ast object. Therefore, it is not necessary to additionally create two artificial ast objects.

Jarvx commented 2 years ago

Thanks for the examples. I agree that my implementation of two nested function calls is not a concise way to fix it. However, the actual problem here is, for the code snippet:

for i in range(5):
    a = i*2 

If we want to evaluate the right side assignment for the variable a, replacing the variable i (I mean folding the expression) with an iter object in your case seems not workable and semantically correct because the call itself simply return a range object. In the runtime, the variable i is should be an element from the generator.

The actual purpose of this module is to bind the variable definition and its usage.

What is your opinion for this case?

simisimon commented 2 years ago

I think this is a special case, as we have here a variable a that relies on another variable i. However, I do not think that such cases should be handled by your framework, because the constant propagation is designed to return a dictionary of values for all identified variables and not to identify the actual values of the generator.

Therfore, I would expect to get a binary operation object for a in your example. In my opinion, identifying the actual generator value of i is up to the developer who is using your framework.

For example, for the variable i, I also have to process the ast.Call object computed by the constant propagation to get the actual values of the generator.

Jarvx commented 2 years ago

This makes sense. We will discuss this within the team and then make changes. Thanks very much and I really enjoyed the discussion.

Jarvx commented 2 years ago

Many thanks for the suggestion@simisimon. The suggested implementation has been pushed.