StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Regent: Assertion failed in rdir/plugin/src/regent/flow.t with metaprogramming #898

Closed: LonelyCat124 closed this issue 2 months ago.

LonelyCat124 commented 3 years ago

I haven't managed to reproduce this without metaprogramming, but I have a file that makes the compiler fail at line 244 of flow.t (assert(flow.is_valid_node(from_node) and flow.is_valid_node(to_node))). Full stack trace:

...Legion/legion/language/src/rdir/plugin/src/regent/flow.t:244: assertion failed!
stack traceback:
        [C]: in function 'assert'
        ...Legion/legion/language/src/rdir/plugin/src/regent/flow.t:244: in function 'add_edge'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:341: in function 'add_input_edge'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:803: in function 'close'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:842: in function 'transition'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:919: in function 'open_region_tree_node'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:930: in function 'open_region_tree_top_initialize'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:969: in function 'open_region_tree_top'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:998: in function 'open_region_tree'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:3319: in function 'stat_for_list'
        ...
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:3196: in function 'block'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:3310: in function 'stat_for_list'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:3445: in function 'fn'
        /home/aidan/Legion/legion/language/terra/src/terralist.lua:150: in function 'map'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:3196: in function 'block'
        ...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:3504: in function <...gion/language/src/rdir/plugin/src/regent/flow_from_ast.t:3491>
        ...e/aidan/Legion/legion/language/src/regent/passes_hooks.t:41: in function 'optimize'
        /home/aidan/Legion/legion/language/src/regent/passes.t:52: in function </home/aidan/Legion/legion/language/src/regent/passes.t:48>
        test2.rg:34: in function 'generate_symmetric_pairwise_task'
        test2.rg:54: in main chunk

I've attached a small Regent program that triggers the error. The error goes away if the task requires parts1 * parts2 (i.e., a disjointness constraint between the two partitions), but the strategy used to call this task does not allow for that requirement.

rdir_bug.txt

elliottslaughter commented 3 years ago

The immediate workaround is to run with -fflow 0, which disables RDIR. The main cost is that you can't use SCR (static control replication), but it avoids a number of pernicious bugs like this one that are unlikely to be fixed any time soon.

LonelyCat124 commented 3 years ago

OK, that works. Is there a way to get Regent to do a "preprocessing" step (like -fpretty 1) that emits Regent code the Regent parser can read back in? I'd like to check whether this also happens with the same code written without metaprogramming, but I'd prefer not to have to rewrite the code entirely.

elliottslaughter commented 3 years ago

Oh, this can definitely happen without metaprogramming. It is triggered by having two regions of the same type that Regent cannot prove to be disjoint, combined with accessing those regions later in the body of the task. It's a long-standing bug in RDIR that results from design decisions that are difficult to change at this point.

I've been thinking about making Regent assume that interfering region requirements are disjoint. That would be like the constraint you wrote, except that it would be checked dynamically instead of statically. It wouldn't fix the underlying issue (which is really that the algorithm for constructing RDIR probably needs to change fundamentally), but it would fix most of the cases where we see symptoms like this.

elliottslaughter commented 2 months ago

SCR has been removed, so this is no longer relevant.