Code-Inspect / flowr

A program slicer and dataflow analyzer for the R programming language.
https://github.com/Code-Inspect/flowr/wiki
GNU General Public License v3.0
14 stars 2 forks source link

(Question / potential bug) dataflow analysis for global definition side effects #815

Closed Ellpeck closed 1 month ago

Ellpeck commented 1 month ago
        assertDataflow(label('global definition', ['name-normal', ...OperatorDatabase['<-'].capabilities, 'formals-named', 'numbers', 'implicit-return', ...OperatorDatabase['*'].capabilities, ...OperatorDatabase['<<-'].capabilities, 'lambda-syntax', 'unnamed-arguments', ...OperatorDatabase['+'].capabilities, 'side-effects-in-function-call']),
            shell, `f <- function(x) 2 * x

m <- function(g) {
   f <<- g
}

m(\\(x) x + 1)

f(3)`, emptyGraph()
                .use('4', 'x', undefined, false)
                .reads('4', '1')
                .use('15', 'g', undefined, false)
                .reads('15', '10')
                .use('23', 'x', undefined, false)
                .reads('23', '21')
                .argument('5', '4')
                .call('5', '*', [argumentInCall('3'), argumentInCall('4')], { returns: [], reads: ['3', '4', BuiltIn], onlyBuiltIn: true, environment: defaultEnv().pushEnv().defineParameter('x', '1', '2') }, false)
                .argument('5', '3')
                .call('8', '<-', [argumentInCall('0'), argumentInCall('7')], { returns: ['0'], reads: [BuiltIn] })
                .argument('8', ['7', '0'])
                .argument('16', '15')
                .call('16', '<<-', [argumentInCall('14'), argumentInCall('15')], { returns: ['14'], reads: [BuiltIn], environment: defaultEnv().pushEnv().defineParameter('g', '10', '11') }, false)
                .argument('16', '14')
                .argument('17', '16')
                .call('17', '{', [argumentInCall('16')], { returns: ['16'], reads: [BuiltIn], environment: defaultEnv().pushEnv().defineParameter('g', '10', '11') }, false)
                .call('19', '<-', [argumentInCall('9'), argumentInCall('18')], { returns: ['9'], reads: [BuiltIn], environment: defaultEnv().defineFunction('f', '0', '8') })
                .argument('19', ['18', '9'])
                .argument('25', '23')
                .call('25', '+', [argumentInCall('23'), argumentInCall('24')], { returns: [], reads: ['23', '24', BuiltIn], onlyBuiltIn: true, environment: defaultEnv().pushEnv().defineParameter('x', '21', '22') }, false)
                .argument('25', '24')
                .call('29', 'm', [argumentInCall('27')], { returns: ['17'], reads: ['9'], environment: defaultEnv().defineFunction('f', '0', '8').defineFunction('m', '9', '19') })
                .argument('29', '27')
                .calls('29', '18')
                .call('33', 'f', [argumentInCall('31')], { returns: ['25'], reads: ['14'], environment: defaultEnv().defineVariable('f', '14', '16').defineFunction('m', '9', '19') })
                .argument('33', '31')
                .calls('33', '27')
                .defineVariable('1', 'x', { definedBy: [] }, false)
                .constant('3', undefined, false)
                .defineFunction('7', ['5'], {
                    out:               [],
                    in:                [],
                    unknownReferences: [],
                    entryPoint:        '5',
                    graph:             new Set(['1', '3', '4', '5']),
                    environment:       defaultEnv().pushEnv().defineParameter('x', '1', '2')
                })
                .defineVariable('0', 'f', { definedBy: ['7', '8'] })
                .defineVariable('10', 'g', { definedBy: [] }, false)
                .defineVariable('14', 'f', { definedBy: ['15', '16'] }, false)
                .sideEffectOnCall('14', '29')
                .defineFunction('18', ['17'], {
                    out:               [],
                    in:                [],
                    unknownReferences: [],
                    entryPoint:        '17',
                    graph:             new Set(['10', '15', '14', '16', '17']),
                    environment:       defaultEnv().pushEnv().defineParameter('g', '10', '11').defineVariable('f', '14', '16')
                }, { environment: defaultEnv().defineVariable('f', '14', '16') })
                .defineVariable('9', 'm', { definedBy: ['18', '19'] })
                .defineVariable('21', 'x', { definedBy: [] }, false)
                .constant('24', undefined, false)
                .defineFunction('27', ['25'], {
                    out:               [],
                    in:                [],
                    unknownReferences: [],
                    entryPoint:        '25',
                    graph:             new Set(['21', '23', '24', '25']),
                    environment:       defaultEnv().pushEnv().defineParameter('x', '21', '22')
                })
                .definesOnCall('27', '10')
                .constant('31')
                .definesOnCall('31', '21'))

This test, without any changes, produces the report

 * Vertex 18 (function definition) differs in subflow environments. Different environment sizes. expected: 2 vs. got: 1
 * Vertex 18 (function definition) differs in subflow environments. Key comparison.  More elements in expected: ["f"]
 * Vertex 18 (function definition) differs in subflow environments. Different definitions for f. expected: [{"kind":"variable","name":"f","definedAt":16,"nodeId":14}] vs. got: undefined
 * Vertex 18 (function definition) differs in subflow environments. Parents of 358 & 1802. Different environment sizes. expected: 0 vs. got: 1
 * Vertex 18 (function definition) differs in subflow environments. Parents of 358 & 1802. Key comparison. More in got: ["f"]

Removing the f node from the subflow environment yields the new report

  * Vertex 18 (function definition) differs in subflow environments. Parents of 357 & 1801. Different environment sizes. expected: 0 vs. got: 1
 * Vertex 18 (function definition) differs in subflow environments. Parents of 357 & 1801. Key comparison. More in got: ["f"]

Suddenly, an environment size of 0 is expected? After removing a single definition?

(Side note: The numbers noted as "parents" here seem... random? What do they refer to? Maybe there's room for improvement in the report here, though this might just be my lack of knowledge as well.)

Removing g instead of f from the subflow environment yields the report

  * Vertex 18 (function definition) differs in subflow environments. Different definitions for f. expected: [{"kind":"variable","name":"f","definedAt":16,"nodeId":14}] vs. got: undefined
 * Vertex 18 (function definition) differs in subflow environments. Parents of 357 & 1801. Different environment sizes. expected: 0 vs. got: 1
 * Vertex 18 (function definition) differs in subflow environments. Parents of 357 & 1801. Key comparison. More in got: ["f"]

Now, the got graph apparently has a definition for f, but the definition is undefined?

Is this a bug in the dataflow analysis, or is the report inconsistent and/or the phrasing simply confusing?

EagleoutIce commented 1 month ago

So the problem is plain and simple:

environment:       defaultEnv().pushEnv().defineParameter('g', '10', '11').defineVariable('f', '14', '16')

is wrong as it expects f to be defined in the nested environment albeit <<- causes it to be defined in the super environment so:

environment:       defaultEnv().defineVariable('f', '14', '16').pushEnv().defineParameter('g', '10', '11')

fixes this issue. The parents refer to the parent ids of the respective environment (indicating the lexicographic scoping), yet to address this issue (which i presume is part of #800) i update the diffing information - yet again re-assigning the test-impl to #800.