Closed MilesCranmer closed 3 months ago
I don't see any segmentation fault in the error you shared.
Wasn't sure what the "Illegal instruction" is. Updated description to just be "Error".
It means that the processor was asked to execute instructions it doesn't support ("illegal"). Think, for example, of trying to execute AVX-512 instructions on an AVX/AVX2 processor (maybe because you compiled the program on a different machine, with a larger instruction set than the current one): it'd have no clue what you're talking about.
I see. So, guess it's a bug then?
I see it on macOS M1 and also in the GitHub Actions run with ubuntu-latest: https://github.com/MilesCranmer/SymbolicRegression.jl/actions/runs/9947224797/job/27479456770?pr=326#step:6:640. This one gets the "Unreachable reached at 0x7fd831cd2d85" issue.
Yes, it's definitely a bug, probably the compiler is emitting wrong instructions for the current ISA. There are a bunch of similar tickets: #53847, #53843, #53848, #53761, ...
Here's my git bisect log:
git bisect start
# status: waiting for both good and bad commits
# bad: [3a35aec36d13c3e651c97bac664da2e778d591ad] set VERSION to 1.11.0-rc1 (#54924)
git bisect bad 3a35aec36d13c3e651c97bac664da2e778d591ad
# status: waiting for good commit(s), bad commit known
# good: [48d4fd48430af58502699fdf3504b90589df3852] set VERSION to 1.10.4 (#54625)
git bisect good 48d4fd48430af58502699fdf3504b90589df3852
# good: [0ba6ec2d2282937a084d7e5e5a0b026dc953bb31] Restore link to list of packages in Base docs (#50353)
git bisect good 0ba6ec2d2282937a084d7e5e5a0b026dc953bb31
# skip: [e754f2036cbfc37ea24a33d02e86e41a9cf56af9] Add missing type annotation reported by JET (#52207)
git bisect skip e754f2036cbfc37ea24a33d02e86e41a9cf56af9
# skip: [959b474d0516df77a268d9f23ccda5d2ad32acdf] docs: update latest stable version (#52215)
git bisect skip 959b474d0516df77a268d9f23ccda5d2ad32acdf
# bad: [c5d7b87a35b5beaef9d4d3aa53c0a2686f3445b9] Fix variable name in scaling an `AbstractTriangular` with zero alpha (#52855)
git bisect bad c5d7b87a35b5beaef9d4d3aa53c0a2686f3445b9
# bad: [4115c725d25c19a86ce8d3e3a584f02d59a9a9ce] Create rand function for Base.KeySet and Base.ValueIterator{Dict} (#51608)
git bisect bad 4115c725d25c19a86ce8d3e3a584f02d59a9a9ce
# good: [8be469e275a455ca894fdc5fad8a80aafb359544] Separate foreign threads into a :foreign threadpool (#50912)
git bisect good 8be469e275a455ca894fdc5fad8a80aafb359544
# skip: [7d51502d7845246d6a231fdc4cf19451f42427e1] More missing constants from earlier libgit2 versions
git bisect skip 7d51502d7845246d6a231fdc4cf19451f42427e1
# skip: [4c3aaa2b34996708367f9d5e4472fb5a1062bf63] reflection: define `Base.generating_output` utility function (#51216)
git bisect skip 4c3aaa2b34996708367f9d5e4472fb5a1062bf63
# bad: [ca862df7bfc534d22d4d39d265d1f74d59c1ab77] fix `_tryonce_download_from_cache` (busybox.exe download error) (#51531)
git bisect bad ca862df7bfc534d22d4d39d265d1f74d59c1ab77
# skip: [5d82d8095042935be0eb044259098e0d7c695922] add tfuncs for `[and|or]_int` intrinsics (#51266)
git bisect skip 5d82d8095042935be0eb044259098e0d7c695922
# skip: [4e1c965b512967aaa20b77f37e1fe76548b1def7] Remove size(::StructuredMatrix, d) specializations (#51083)
git bisect skip 4e1c965b512967aaa20b77f37e1fe76548b1def7
# skip: [3fc4f6bb243cb623636f276cb143cf5c476bbc59] 🤖 [master] Bump the Downloads stdlib from f97c72f to 8a614d5 (#51246)
git bisect skip 3fc4f6bb243cb623636f276cb143cf5c476bbc59
# skip: [476572f749a035047d4d8e6e76ec5b701b85904e] makefile option to generate better code (#51105)
git bisect skip 476572f749a035047d4d8e6e76ec5b701b85904e
# skip: [8b3ffd8918e53d5241ad948e8500335848d3b602] cross-reference pathof and pkgdir in docstrings (#51298)
git bisect skip 8b3ffd8918e53d5241ad948e8500335848d3b602
# bad: [15f34aa649dbbb34e53ff6d16db15cd11ae4a887] [NFC] rng_split: some elaboration and clarification (#50680)
git bisect bad 15f34aa649dbbb34e53ff6d16db15cd11ae4a887
# skip: [f3d50b7de66b351dfdaa826fa529fefb75a829e1] Fix extended help hint to give full line to enter (#51193)
git bisect skip f3d50b7de66b351dfdaa826fa529fefb75a829e1
# skip: [dcb4060b58797cf64517a694fcab3ea16278cb87] docs: manual: point to `MutableArithmetics` in the Performance tips (#50987)
git bisect skip dcb4060b58797cf64517a694fcab3ea16278cb87
# skip: [70000ac7c3d5d5f21e42555cdf99e699a246f8ec] sysimg: Allow loading a system image that is already present in memory (#51121)
git bisect skip 70000ac7c3d5d5f21e42555cdf99e699a246f8ec
# skip: [7cadc6d70c0a3d2b2c20e50d4b3555475756f785] 🤖 [master] Bump the SHA stdlib from 2d1f84e to aaf2df6 (#51049)
git bisect skip 7cadc6d70c0a3d2b2c20e50d4b3555475756f785
# skip: [39a53168a824a9a223adc6642da31e7a26b6890a] optimize: fix `effect_free` refinement in post-opt dataflow analysis (#51185)
git bisect skip 39a53168a824a9a223adc6642da31e7a26b6890a
# skip: [dd0ce50f389981839d96969b279c3a11e0b4088e] Fix typo in command-line-interface.md (#51055)
git bisect skip dd0ce50f389981839d96969b279c3a11e0b4088e
# skip: [fbf73f44c000ee79d12b7bf1645f076b640fd10c] 🤖 [master] Bump the Pkg stdlib from 047734e4c to f570abd39 (#51186)
git bisect skip fbf73f44c000ee79d12b7bf1645f076b640fd10c
# skip: [b2dfa1db9e4d7b1cd499ba58943df17bc77fe1d8] Deprecate `permute!!` and `invpermute!!` (#51337)
git bisect skip b2dfa1db9e4d7b1cd499ba58943df17bc77fe1d8
# skip: [eab8d6b96b05f7e84103f66a902e4ee7ad395b48] Fix getfield codegen for tuple inputs and unknown symbol fields. (#51234)
git bisect skip eab8d6b96b05f7e84103f66a902e4ee7ad395b48
# skip: [a355403080167056d2af4ccee8eadfffd8fce97f] Annotate fieldnames for default cgparams [NFC]
git bisect skip a355403080167056d2af4ccee8eadfffd8fce97f
# skip: [354c36742eb1c2c4c5bfe454d6d4fe975565de96] Allow SparseArrays to catch `lu(::WrappedSparseMatrix)` (#51161)
git bisect skip 354c36742eb1c2c4c5bfe454d6d4fe975565de96
# bad: [e85f0a5a718f68e581b07eb60fd0d8203b0cd0da] complete false & true more generally as vals (#51326)
git bisect bad e85f0a5a718f68e581b07eb60fd0d8203b0cd0da
# skip: [91b8c9b99f05b99db8b259257adeb1997f8c4415] Add `JL_DLLIMPORT` to `small_typeof` declaration (#50892)
git bisect skip 91b8c9b99f05b99db8b259257adeb1997f8c4415
# skip: [27fa5de3f0e245cfff8c5cd1c850353742362cbf] Introduce cholesky and qr hooks for wrapped sparse matrices (#51220)
git bisect skip 27fa5de3f0e245cfff8c5cd1c850353742362cbf
# good: [74ce6cf070a2a04e836c3e5a2211228a3ac978ef] minor NFC in GC codebase (#50991)
git bisect good 74ce6cf070a2a04e836c3e5a2211228a3ac978ef
# skip: [3527213ccb1bfe0c48feab5da64d30cadbd4c526] simplify call to promote_eltype with repeated elements (#51135)
git bisect skip 3527213ccb1bfe0c48feab5da64d30cadbd4c526
# bad: [d51ad06f664b3439b4aee51b5cd5edd6b9d53c69] Avoid infinite loop when doing SIGTRAP in arm64-apple (#51284)
git bisect bad d51ad06f664b3439b4aee51b5cd5edd6b9d53c69
(Most of the skips are due to hanging precompilation.)
Only 100 revisions left in the git bisect log but I unfortunately need to run. The bisect log is above if someone wants to start where I left off. (My computer seems pretty slow at compilation unfortunately)
Executing unreachable code triggers a trap, which on most architectures just becomes a SIGILL, so it might be this.
It's also a good idea to test an assertions build first.
Just tested assertions build; no extra info.
Is there a way I can see what Julia CI builds are successful for each commit? It seems like I have to skip a lot of commits which is making this bisecting take much longer than expected
I see a commit, "Avoid infinite loop when doing SIGTRAP in arm64-apple". It seems like on all of the commits before that I hit infinite precompilation... So not sure I will be able to bisect this further.
It's really obnoxious, but you can cherry pick that commit onto previous commits in your script
I just realised the SIGTRAP was what you said the error was likely coming from. So I'll try to do another bisection, treating the infinite precompilation == bad. And if that doesn't work, I'll do the cherry pick stuff.
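For anyone repeating the bisection, here is a minimal sketch of a helper that could drive `git bisect run`, assuming a hung precompilation should count as bad (or as skip). The exit codes are the ones `git bisect run` understands; the names `classify`, `GOOD`, `BAD`, `SKIP`, and the default timeout are hypothetical and not taken from this thread.

```julia
# Exit codes understood by `git bisect run`:
# 0 = good, 125 = skip, any other code in 1..127 = bad.
const GOOD, BAD, SKIP = 0, 1, 125

# Run `cmd` and classify the outcome. A run that exceeds `timeout` seconds
# is killed and reported as bad (or skip, if `hang_is_bad` is false).
function classify(cmd::Cmd; timeout::Real=600, hang_is_bad::Bool=true)
    quiet = pipeline(ignorestatus(cmd); stdout=devnull, stderr=devnull)
    proc = run(quiet; wait=false)
    status = timedwait(() -> process_exited(proc), timeout)
    if status === :timed_out
        kill(proc)
        return hang_is_bad ? BAD : SKIP
    end
    # A crash (for example SIGILL) or a non-zero exit both count as bad.
    return success(proc) ? GOOD : BAD
end
```

A driver script would build the command that precompiles the package with the freshly built Julia and finish with `exit(classify(cmd))`, so that `git bisect run` receives the code directly.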
Ok, found it! #51000 is the cause.
More clues: -O0 --compile=min does not seem to change the result compared to -O3.
Can we add this to the 1.11 milestone?
I redid the bisect because the PR you mentioned looked harmless, and I think it wouldn't cause an unreachable reached error but a GC error. Mine ended in https://github.com/JuliaLang/julia/commit/231ca24a62522a39cdfb880acb8391191bfe53db
Weird. Maybe one of the commits was accidentally marked good/bad(?), since it is hard to know if the precompilation is truly hanging or not.
Does https://github.com/JuliaLang/julia/commit/231ca24a62522a39cdfb880acb8391191bfe53db make sense as the cause to you?
Yep. I just tried on top of release-1.11 and reverting it does make it not crash
Cool. So I guess the addition of @_terminates_locally_meta is somehow an incorrect compiler assumption?
Isn't this a bug in DispatchDoctor?
I investigated quite deeply, and there doesn't seem to be a bug on the Julia Base or compiler side. @_terminates_locally_meta certainly allows concrete evaluation for DispatchDoctor._Utils._promote_op, but that function is legally eligible for concrete evaluation.
Looking at the implementation of DispatchDoctor, it seems to use reflection (especially Core.Compiler._return_type through Base.promote_op) within a generator even if the target function is @generated. This breaks the @generated assumption and could potentially cause undefined behavior, e.g. if a function definition is added later.
even if the target function is generated
DispatchDoctor doesn’t operate on generated functions, see https://github.com/MilesCranmer/DispatchDoctor.jl?tab=readme-ov-file#-special-cases
Are we sure that not all generated functions are using _promote_op?
Looking at the DispatchDoctor implementation, I have confirmed that DispatchDoctor does not transform functions that directly use @generated, but IIUC it still allows @generated function to use other @stable functions transformed by DD within the generator, which effectively uses Core.Compiler._return_type within the generator.
Wouldn’t that be true of any instance of a @generated function that calls some other function which contains a list comprehension? Note that Base.promote_op is what’s used for figuring out the right element type in such cases.
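To make the point about comprehensions concrete, here is a small sketch (not code from DispatchDoctor or Base) of where inference shows up. With an empty input there are no values to inspect, so the element type comes from type inference; `Base.promote_op` is an internal helper that asks inference the same kind of question directly.

```julia
# Non-empty input: the result's element type matches the values produced.
xs = [i for i in 1:3]

# Empty input: there are no values to look at, so the element type is
# whatever type inference predicts for the comprehension body.
ys = [i for i in Int[]]

# promote_op asks inference for the result type of `+` on these argument types.
T = Base.promote_op(+, Int, Float64)

println((eltype(xs), eltype(ys), T))
```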
That's true, but it's somewhat unavoidable. That's why the type of the object returned by a list comprehension is not defined, and code that depends on the return type of value returned by list comprehension is not recommended. DD is an extreme case, as it changes the control flow of the code based on the results of such unstable type inferences, so it's not surprising that segmentation faults or other undefined behaviors occur.
and code that depends on the return type of value returned by list comprehension is not recommended
I am very surprised by your statement here, are you certain about this? List comprehensions are very common. So does this mean we are not supposed to use dispatch on the result of any code that contains a list comprehension? In other words:
f(i) = [i for _ in 1:5]
g(::Vector) = 1
g(::Vector{Int}) = 2
h(i) = g(f(i))
h(1)
This code similarly:
“changes the control flow of the code based on the results of such unstable type inferences”
So are you saying if I have this style of code in my library, I should expect to have segfaults and undefined behavior?
It’s one thing for inference to fail to get the right type, but crashes are a different story, and we should definitely try to prevent that.
(Removing from milestone based on the discussion above)
I don’t think it’s settled. Maybe we can revert https://github.com/JuliaLang/julia/pull/51002 for 1.11, since reverting it was demonstrated to fix a Julia crash, and then add it back to master while we figure this out?
I don't think it warrants reverting if the issue comes from not following the restrictions outlined in https://docs.julialang.org/en/v1/manual/metaprogramming/#Generated-functions
Generated functions must not mutate or observe any non-constant global state (including, for example, IO, locks, non-local dictionaries, or using hasmethod). This means they can only read global constants, and cannot have any side effects. In other words, they must be completely pure. Due to an implementation limitation, this also means that they currently cannot define a closure or generator.
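For contrast with the cases debated in this thread, here is a minimal sketch of a generator that stays within those restrictions: it reads only the static type parameter and returns an expression. The name `tuple_length` is hypothetical and chosen only for illustration.

```julia
# Pure generator: the body looks only at the type parameter N.
# It does no reflection or IO and reads no global state.
@generated function tuple_length(x::NTuple{N,Any}) where {N}
    return :($N)
end

println(tuple_length((1, 2.0, "c")))
```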
The issue is not from a generated function; see comment above. https://github.com/JuliaLang/julia/issues/55147#issuecomment-2259944230
If the issue is on the package’s side, Julia should safely catch it and print an error, rather than crashing. We should at least clearly identify the issue so it can be documented.
And if the issue is on the Julia side, it should be patched.
In either case I don’t think it’s ready for a 1.11 release if we don’t understand what’s going on.
The issue is not from a generated function; see comment above. #55147 (comment)
It's from a generated function which may call the type inference reflection from the generator.
f(i) = [i for _ in 1:5]
g(::Vector) = 1
g(::Vector{Int}) = 2
h(i) = g(f(i))
h(1)
So are you saying if I have this style of code in my library, I should expect to have segfaults and undefined behavior?
Using such code is fine (especially in normal contexts), but calling it within a generator is technically invalid. If the code generated by a generator that internally relies on such code is broken, even if it causes a segfault, it's unavoidable.
Do you mean that
f(i) = [i for _ in 1:5]
g(::Vector) = 1
g(::Vector{Int}) = 2
@generated h(i) = :(g(f(i)))
h(1)
might cause a segfault, and this is unavoidable?
Could you otherwise give an example of invalid code? I don't think DD is doing anything illegal, which is why I'm surprised by the Illegal instruction: 4 error. Furthermore, the patch that temporarily got around that issue seems to be unrelated.
Other than that, are we absolutely sure that :terminates_locally is a correct effect for [in|all|any](x, ::Tuple), 100% of the time? Are there no potential compiler optimizations which could void this in some way? There look to be only a few uses of this in Base: https://github.com/search?q=repo:JuliaLang/julia%20/_terminates_locally_meta/&type=code so maybe there are some sharp edges to the use of this effect which are not well explored yet? These are very very broadly used methods after all.
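One way to look at what the compiler currently infers for such a call is the internal reflection helper `Base.infer_effects`. This is only a sketch: the helper is not a stable public API, its printed format differs between Julia versions, and the argument types below are an arbitrary example rather than the ones involved in the crash.

```julia
# Ask inference which effects it proves for `in(::Int, ::Tuple{Int,Int,Int})`.
# Internal helper: the result type and its printing may change across versions.
effects = Base.infer_effects(in, (Int, Tuple{Int,Int,Int}))
println(effects)
```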
might cause a segfault, and this is unavoidable?
What I'm saying is the same as what is stated in the documentation: calling it from a generator may cause undefined behavior, but your example is fine since it uses list comprehension in generated code. Having said that, list comprehension might not have been a good example because the type of the value returned by a list comprehension is not defined by type inference (for most cases, IIUC?). So as an alternative, consider the following example as invalid:
g(f, xs...) = isconcretetype(Base.infer_return_type(f, xs))
@generated function h(f, xs)
    if g(f, xs)
        some_ex = ...
    else
        some_ex = ...
    end
    return :( #=code using `some_ex`=# )
end
Other than that, are we absolutely sure that :terminates_locally is a correct effect for [in|all|any](x, ::Tuple), 100% of the time? Are there no potential compiler optimizations which could void this in some way? There look to be only a few uses of this in Base: https://github.com/search?q=repo:JuliaLang/julia%20/_terminates_locally_meta/&type=code so maybe there are some sharp edges to the use of this effect which are not well explored yet? These are very very broadly used methods after all.
I can't say that possibility doesn't exist, but I think the probability is lower than the probability that there is an issue on the DD side. I don't know everything about the implementation of DD, but it seems that it can break the assumptions of @generated functions. In this case, @_terminates_locally_meta has surfaced this possibility, but the likelihood of the issue being in the Julia compiler seems to be low (because similar issues haven't occurred in other packages).
The point I am confused about is that DD doesn't actually use @generated functions[^1], and neither does it interact with them. The code it generates seems no different from any code a user might write.
[^1]: It does now, only to work around the issue in this thread. https://github.com/MilesCranmer/DispatchDoctor.jl/commit/4ed36ca1842f0231dbf4f53f9cd0aeeb3724f8ef is the commit that works around it for DD (which seems to result in type instability for Zygote).
Isn't this true?
I have confirmed that DispatchDoctor does not transform functions that directly use @generated, but IIUC it still allows @generated function to use other @stable functions transformed by DD within the generator
I.e. even if DD does not transform the generated function itself, the generator of that generated function might call a function that has been transformed by DD.
Right, but the example you gave in https://github.com/JuliaLang/julia/issues/55147#issuecomment-2260326648 will *never* occur in DD-generated code. I agree that example code is problematic. But it simply never happens here.
i.e., I don't think we can write off the Illegal instruction as coming from this theoretical code.
The issue is that generated functions that may call DD-transformed functions are not pure. The exact pattern of the example I gave above is one instance of such impure-ness.
Okay, this is the part that doesn't make sense to me though, because I could have the very same issue with any code that implicitly has a Base.promote_op, such as any list comprehension or map. Basically I'm not seeing the difference between DD and any other code that uses a list comprehension followed by a dispatch.
Okay, to test your theory, I just manually went through the codebase of the DispatchDoctor'd package, DynamicExpressions.jl, on the master branch (specifically https://github.com/SymbolicML/DynamicExpressions.jl/commit/67bfab09033d3192d79defb79e1c313c2da03900). There are several @generated functions where the branch of generated code depends on the result of DispatchDoctor'd functions. I went through and disabled each of these.
This does not solve the problem, I get the same error as above.
Here is the git patch you can apply for yourself, following the reproduction guide in https://github.com/JuliaLang/julia/issues/55147#issue-2412066259
From 09179ab1d7cdda1664a23a55816ad412c719a6de Mon Sep 17 00:00:00 2001
From: MilesCranmer <miles.cranmer@gmail.com>
Date: Wed, 31 Jul 2024 18:21:49 +0100
Subject: [PATCH] hack: manually disable dispatch doctor in generated branches
---
src/Evaluate.jl | 4 ++--
src/Utils.jl | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/src/Evaluate.jl b/src/Evaluate.jl
index c5cedb93..9d160a8e 100644
--- a/src/Evaluate.jl
+++ b/src/Evaluate.jl
@@ -183,8 +183,8 @@ function eval_tree_array(
return eval_tree_array(tree, cX, operators; kws...)
end
-get_nuna(::Type{<:OperatorEnum{B,U}}) where {B,U} = counttuple(U)
-get_nbin(::Type{<:OperatorEnum{B}}) where {B} = counttuple(B)
+@unstable get_nuna(::Type{<:OperatorEnum{B,U}}) where {B,U} = counttuple(U)
+@unstable get_nbin(::Type{<:OperatorEnum{B}}) where {B} = counttuple(B)
function _eval_tree_array(
tree::AbstractExpressionNode{T},
diff --git a/src/Utils.jl b/src/Utils.jl
index bd3326e2..343a662e 100644
--- a/src/Utils.jl
+++ b/src/Utils.jl
@@ -1,6 +1,7 @@
"""Useful functions to be used throughout the library."""
module UtilsModule
+using DispatchDoctor: @unstable
using MacroTools: postwalk, @capture, splitdef, combinedef
# Returns two arrays
@@ -124,7 +125,7 @@ function deprecate_varmap(variable_names, varMap, func_name)
return variable_names
end
-counttuple(::Type{<:NTuple{N,Any}}) where {N} = N
+@unstable counttuple(::Type{<:NTuple{N,Any}}) where {N} = N
"""
Undefined
--
2.39.0
Apply this to DynamicExpressions.jl. This essentially makes it so that DispatchDoctor.jl is never used in the code which selects the branch of each @generated function.
I then re-precompile the code. I run into the same issue as before:
Failed to precompile DynamicExpressions [a40a106e-89c9-4ca8-8020-a735e8728b6b] to "/Users/mcranmer/.julia/compiled/v1.11/DynamicExpressions/jl_mQJIc6".
[12551] signal 4: Illegal instruction: 4
in expression starting at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/DynamicExpressions.jl:131
_eval_tree_array at /Users/mcranmer/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/Evaluate.jl:160 [inlined]
#eval_tree_array#3 at /Users/mcranmer/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:306
eval_tree_array at /Users/mcranmer/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
#test_all_combinations#1 at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:7
test_all_combinations at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:22 [inlined]
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:180 [inlined]
macro expansion at /Users/mcranmer/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:165 [inlined]
macro expansion at /Users/mcranmer/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
#do_precompilation#2 at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:150
do_precompilation at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:173
A code generator using list comprehension is technically impure, but list comprehension does not change observable behavior based on the result of type inference. In other words, the impurity of list comprehension does not significantly affect the generator's behavior, so it usually does not cause problems.
On the other hand, if my understanding is correct, the DD transformation does change a generator's behavior based on the results of type inference and sometimes performs other impure actions like printing.
Our messages were sent at the same time; please see https://github.com/JuliaLang/julia/issues/55147#issuecomment-2261010153
I can't say that possibility doesn't exist, but I think the probability is lower than the probability that there is an issue on the DD side.
I think it's important to think about the expected cost of each option though. A bug in Julia is much worse than a bug in DD. Unless you can be 100% sure that :terminates_locally is the right effect for [in|all|any](x, ::Tuple), for all possible types going into Tuple, and all possible compiler optimizations that might happen, then it seems a bit risky.
It's just strange that DD would work fine for 1.6.7 - 1.10.4, across several tested packages, running with heavy CI across multiple operating systems, and this one single PR be enough to surface an Illegal instruction error? DD even works fine on 1.11.0 without that PR.
And the fact that the current workaround is to wrap the promote_op inside another @generated function... Also seems weird?
You also need this...
diff --git a/src/utils.jl b/src/utils.jl
index 197d0be..5ec22c0 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -120,7 +120,7 @@ return false for `Union{}`, so that errors can propagate.
# so we implement a workaround.
@inline type_instability(::Type{Type{T}}) where {T} = type_instability(T)
-@generated function type_instability_limit_unions(
+function type_instability_limit_unions(
::Type{T}, ::Val{union_limit}
) where {T,union_limit}
if T isa UnionAll
With this it succeeds precompilation.
type_instability_limit_unions isn't used for DynamicExpressions. I presume you are checking out the master branch of DispatchDoctor? You need to use v0.4.10, before I implemented the workaround.
(I just tried your suggestion, and, while probably a good idea anyways, it doesn't fix this issue.)
I'm on DE master with your patch and DD on v0.4.10 with my patch.
Can you reproduce the compilation error normally? The Illegal instruction error is system-dependent.
(I'm also DE master + patch; v0.4.10 with your patch – still see the bug. But I'm on ARM64 on my MacBook)
The precompilation error reproduces with either or both of the patches reverted.
Well, this is getting even weirder then. I simply cannot reproduce what you are seeing. I've now triple-checked my Manifest file, LocalPreferences, and even deleted my entire ~/.julia/compiled. Same bug. Only with the patch https://github.com/MilesCranmer/DispatchDoctor.jl/commit/4ed36ca1842f0231dbf4f53f9cd0aeeb3724f8ef does it go away.
What's your versioninfo()? Are you on 1.11-rc2 or 1.11-rc1 (or nightly)? I'm on 1.11.0-rc1 at the moment.
Edit: Same issue on 1.11.0-rc2 for me.
It's just strange that DD would work fine for 1.6.7 - 1.10.4, across several tested packages, running with heavy CI across multiple operating systems, and this one single PR be enough to surface an Illegal instruction error
It is not very strange; faulty code can work for a long time until some innocent-looking change happens to trigger the faulty behavior. As an example, using pointer without GC-preserving the object usually worked OK until the GC got better and started cleaning things up under your nose. The solution then is not to revert the improved GC.
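As an illustration of the pointer analogy only (unrelated to the DispatchDoctor code itself), a raw pointer is only safe to use while the owning object is kept alive, which `GC.@preserve` guarantees for the enclosed block. The function name `first_byte` is hypothetical.

```julia
# Read the first byte of a string through a raw pointer.
# GC.@preserve keeps `s` rooted while the pointer is in use; without it,
# the GC would be free to collect `s` once it sees no further uses.
function first_byte(s::String)
    GC.@preserve s begin
        p = pointer(s)   # Ptr{UInt8} to the string's data
        unsafe_load(p)   # valid only while `s` is alive
    end
end

println(first_byte("abc"))
```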
I'm seeing some precompilation crashes on Julia 1.11-rc1 when precompiling DynamicExpressions.jl with DispatchDoctor in use on the package. (DispatchDoctor.jl is basically a package that calls promote_op on each function and uses that to flag type instabilities.)
Here is the traceback:
versioninfo:
I installed Julia with juliaup. To reproduce this issue, you can run the following code:
I can prevent this error with the following PR on DispatchDoctor.jl: https://github.com/MilesCranmer/DispatchDoctor.jl/compare/094b1651eeef3fb2017be46a48f0da13724e1123~...b223a4d033dd3d17901879871c508ae33cfd550a. The PR basically amounts to changing some functions into @generated form:
However, it doesn't seem like DispatchDoctor.jl or DynamicExpressions.jl is doing anything wrong, so I'm not sure what's going on. Both before and after seem to be valid Julia code. Also, the downside of that PR is it introduces a type instability in Zygote autodiff, and there doesn't seem to be a way around it that both prevents the segfault while also eliminating the type instability.
I don't understand the conditions for reproducing this, so this is so far my only example. When I make various tweaks to _promote_op within DispatchDoctor.jl, I seem to end up with different segfaults, one of which is the Unreachable reached bug.
cc @avik-pal