Closed NHDaly closed 4 years ago
Bisecting should be fairly quick since there are so few commits that differ between the releases.
Great idea, @KristofferC, thanks
Update: it turns out it does segfault on v1.5.0 on my MacBook as well. :( It seemed that 1.5.0 passed on the build farm (Linux), but I guess I don't know how useful that is if it's failing on macOS. I'll try a bisect from 1.4.0 to 1.5.0 tonight.
Also, we tried recording with rr, but the rr replay of PackageCompiler fails. From @rbvermaa:
I had issues with temp files that a replay tried to read that are gone.
Maybe we can make replay work with some hacks to write them to a fixed location. In general, is this a known failure mode for using rr with PackageCompiler?
Ugh. I wanted to bisect from 1.4.0 to 1.5.0, but I forgot that our project doesn't build on 1.4.0:
└ @ PackageCompiler ~/.julia/packages/PackageCompiler/vsMJE/src/PackageCompiler.jl:516
ERROR: LoadError: MethodError: no method matching source_path(::Pkg.Types.Context, ::Pkg.Types.PackageSpec)
Stacktrace:
[1] source_path(::Pkg.Types.Context, ::Pkg.Types.PackageSpec) at /Users/daly/.julia/packages/PackageCompiler/vsMJE/src/PackageCompiler.jl:49
[2] audit_app(::Pkg.Types.Context) at /Users/daly/.julia/packages/PackageCompiler/vsMJE/src/PackageCompiler.jl:520
[3] create_app(::String, ::String; app_name::String, precompile_execution_file::String, precompile_statements_file::Array{String,1}, incremental::Bool, filter_stdlibs::Bool, audit::Bool, force::Bool, c_driver_program::String, cpu_target::String) at /Users/daly/.julia/packages/PackageCompiler/vsMJE/src/PackageCompiler.jl:612
It builds fine on 1.4.2, but I can't bisect from 1.4.2 to 1.5.0 because they're on different branches. Does this error look familiar to you? Do you remember if there's a commit I can cherry-pick for it? Thanks, and sorry this is harder to debug than I'd like.
Perhaps you can match the last backport commit for 1.4.2 to the corresponding commit on 1.5-dev, and then bisect from that commit on 1.5-dev to 1.5.0?
I wonder if I did something wrong initially, I am able to replay a newly created rr recording.
I don't really see how temp files should matter. Isn't the point of rr that it records those things?
@KristofferC Indeed, that was my understanding as well. The most likely scenario is that I made some mistake, given I was able to replay now successfully without the issue I had before. Will upload the rr recording as soon as I figure out the best way to share it.
@KristofferC - we've emailed you all the rr trace. Sorry we can't upload it here, since it contains sensitive info.
We'd super appreciate it if someone could take a look! ❤️
Since this is likely something in the guts of the compiler and not really related to PackageCompiler itself, @JeffBezanson and I thought it would probably be more time-efficient for him to look into it first and see if he can find something obvious.
Thanks @JeffBezanson for the quick fix! :)
For anyone following along on the internet, Jeff sent us this patch, which indeed fixed the segfault:
--- a/src/aotcompile.cpp
+++ b/src/aotcompile.cpp
@@ -307,9 +307,11 @@ void *jl_create_native(jl_array_t *methods, const jl_cgparams_t cgparams, int _p
     }
     if (src == NULL || !jl_is_code_info(src)) {
         src = jl_type_infer(mi, params.world, 0);
-        codeinst = jl_get_method_inferred(mi, src->rettype, src->min_world, src->max_world);
-        if (src->inferred && !codeinst->inferred)
-            codeinst->inferred = jl_nothing;
+        if (src) {
+            codeinst = jl_get_method_inferred(mi, src->rettype, src->min_world, src->max_world);
+            if (src->inferred && !codeinst->inferred)
+                codeinst->inferred = jl_nothing;
+        }
     }
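For readers who don't live in the Julia runtime, here is a rough, standalone sketch of the pattern the patch appears to apply: the result of the inference call is checked for null before any of its fields are read. Everything in it (`CodeInfo`, `CodeInstance`, `infer`, `update_code_instance`, `nothing_marker`) is a hypothetical stand-in invented for illustration, not the real runtime types or API.

```cpp
// Hypothetical stand-ins for the runtime types; NOT the real Julia definitions.
struct CodeInfo {
    bool inferred;                   // whether inferred code is attached
};

struct CodeInstance {
    const void *inferred = nullptr;  // cached inferred code, if any
};

// Stand-in for a "nothing" sentinel value (an assumption for this sketch).
static const int nothing_marker = 0;

// Stand-in for the inference call: returns nullptr when inference fails.
inline CodeInfo *infer(bool succeed) {
    static CodeInfo result{true};
    return succeed ? &result : nullptr;
}

// Mirrors the patched control flow: only read from `src` after checking it.
// Returns true if `src` was usable, false if inference had failed.
inline bool update_code_instance(CodeInfo *src, CodeInstance &codeinst) {
    if (src == nullptr)
        return false;  // skip instead of dereferencing a null pointer
    if (src->inferred && codeinst.inferred == nullptr)
        codeinst.inferred = &nothing_marker;
    return true;
}
```

Without the null check, the failed-inference case would read `src->inferred` through a null pointer, which is the kind of access that typically shows up as a segfault.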
Thanks.
(Ref. #37386 for inclusion of this patch in 1.5.2.)
(Ref. #37406 for an equivalent patch for 1.6-dev.)
Resolved on both 1.5.2 and 1.6-dev via the pull requests linked just above. Perhaps close? :)
Our static compilation build started segfaulting consistently when we upgraded to Julia v1.5.1. This is new, and wasn't happening on v1.5.0. It's consistently reproducible on macOS and Linux.
Here's a failure from my mac; looks similar:
We'll work on trying to get an rr recording to share.