Open frej opened 1 year ago
Thanks for the mention! This looks like it might be a duplicate of #7478 ?
This looks like it might be a duplicate of #7478 ?
Looks very likely, I wonder what I did wrong when I searched for beam_types:join
among the issues yesterday and didn't find this one?
Permuting the order of functions returned by init_sig_roots(FuncDb)
does not fix the problem in #7478 , probably because we have only one exported function which calls the list comprehension.
Crashing on beam_ssa_type:join_arg_types/3
in this manner means that previously unreachable code has suddenly become reachable, which is not supposed to happen (other way around is fine). The cause is nearly always some silly small thing that accidentally widens a type.
In the absence of bugs like these, the result will be equivalent regardless of work order, so I prefer to leave it undefined to help shake more bugs out. I’ll look deeper into it today. :)
It looks like this has the same root cause as #7342 / #6599 :(
I couldn't keep away from scratching this itch, #7512 fixes this crash and the crashes in #7478.
I took a quick look at the reproducers in #7342 and #6599 but for those, #7512 does nothing. So I don't think I agree that this is the same root case as the latter issues :)
Like I said before, this kind of crash happens when another bug suddenly makes previously unreachable code reachable. That bug is #7342/#6599 in this case (failure to propagate type information between tests on different variables relating in some way to the same value).
7512 fixes the crash by sweeping the bug under the rug, which I don't want to do in the general case because the code being unreachable at first could have been an error. It happens to be safe in this particular case, but we can't say that it will always be so.
My take is the exact opposite. The crashes in the reproducers of #7478 and #7512 are because an unreachable function is, on entry to beam_ssa_type:opt_start/2
, considered reachable, is still considered reachable by beam_ssa_type:opt_continue/4
but lacks the meta data a reachable function should have in #func_info.arg_types
in the function info database (as that is only updated when a call is encountered).
My analysis is that on entry to opt_start
the function we'll crash in (call it f
), is considered reachable . Running signatures/2
doesn't change that. opt_start_1
traverses all functions in the module and calls opt_function
which will use the type knowledge we have to traverse the function's CFG. If we reach a local call, opt_local_call()
will update the argument types for the callee in the function info database. In the reproducers, opt_local_call()
is never called with f
as a callee and hence arg_types
for it will never be updated.
Unless we prune functions which become unreachable when we use type knowledge beam_ssa_type:opt_continue/4
will crash as soon as it calls beam_ssa_type:join_arg_types/3
. From trace printouts I can see that at no point in time f
has anything but a list of empty dictionaries in #func_info.arg_types
, so this is clearly not a case of previously unreachable code becoming reachable.
The only bug I can see being swept under the rug with #7512, is if something is wrong with the type analysis so that it considers reachable code unreachable. But in that case, what stops us from crashing when it is right and a function which looks reachable turns out not to be, during beam_ssa_type:opt_start/2
?
You're looking at the wrong part of the code, the initial problem occurs much earlier during signatures/2
where a function that used to be reachable under a narrow argument type ("argument 2 only be 'false'
") suddenly becomes unreachable with a wider argument type ("argument 2 can be 'false' | 'ok'
") in the middle of the analysis. Note that at this stage the invariant is inverted: previously reachable code cannot be made unreachable.
This confuses the heck out of opt_start/2
which treats it as reachable during pruning and unreachable during its analysis (which adds the argument types used by opt_continue/4
).
Pruning the functions once more as in #7512 hides the bug in signatures/2
.
I see, that makes sense. I'll withdraw #7512.
Describe the bug
Erlc can sometimes crash with a back-trace like this:
To Reproduce Unless the patch below is applied, the crash appears to be dependent on the phase of the moon :) With the patch applied, the reproducer (found with Erlfuzz, thanks @RobinMorisset ) triggers the error reliably.
Affected versions At least
maint
(fe0571ecad3dd01609f2bd237a38816912fbd23f)Additional context
As far as I can tell,
beam_ssa_type:signatures_1/2
can produce a different#sig_st{}
andFuncDb
depending on the order of the initial worklist in the#sig_st{}
. This, in turn, leads to number of list-comprehension functions being eliminated from optimization when the compilation succeeds, but kept around when the crash occurs.