JuliaLang / julia

Randomly occurring issue where the contents of the environment seem to matter #56150

Open · ffevotte opened this issue 1 month ago

ffevotte commented 1 month ago

(X-link: https://discourse.julialang.org/t/issue-with-1-11-where-package-loading-order-matters/121233?u=ffevotte)

When switching a large project with many dependencies to Julia 1.11, I encountered a strange issue that did not occur with 1.10.

After some debugging, I’ve managed to isolate the following small example:

# reproducer.jl
import Pkg
Pkg.activate(; temp=true)
Pkg.add(["GraphViz", "FileIO", "Cairo"])

using GraphViz
using FileIO
using Cairo

FileIO.save("tmp.png", dot"""
    digraph graphname {
        a -> b -> c;
        b -> d;
    }
""") # expected: the script exits with status 0 and writes "tmp.png"

This example sometimes works and sometimes fails with the following error:

shell$ julia --startup-file=no reproducer.jl 
Activating new project at `/tmp/jl_6y3EbH`
   Resolving package versions...
    Updating `/tmp/jl_6y3EbH/Project.toml`
  [159f3aea] + Cairo v1.1.0
  [5789e2e9] + FileIO v1.16.4
  [f526b714] + GraphViz v0.2.0
[...]
Error: renderer for julia:cairo is unavailable
Errors encountered while save File{DataFormat{:PNG}, String}("tmp.png").
All errors:
===========================================
MethodError: no method matching save(::File{DataFormat{:PNG}, String}, ::GraphViz.Graph)
The function `save` exists, but no method is defined for this combination of argument types.
[...]

The Discourse discussion linked above also revealed that the issue does not manifest on every run:

shell$ cat reproducer.jl 
import Pkg
Pkg.activate(; temp=true)
Pkg.add(["GraphViz", "FileIO", "Cairo"])

using GraphViz
using FileIO
using Cairo

FileIO.save("tmp.png", dot"""
    digraph graphname {
        a -> b -> c;
        b -> d;
    }
""")

shell$ for i in $(seq 20); do
         julia --startup-file=no reproducer.jl >/dev/null 2>&1;
         echo $?;
       done
# fails 8 times out of 20
0
1
0
0
1
1
1
0
1
0
1
0
0
0
0
1
1
0
0
0
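
To tally failures instead of counting the printed exit codes by hand, a small POSIX shell helper along these lines could be used. This is a hypothetical sketch (`count_failures` is not part of any existing tool): it runs a command N times, discards the command's output, and prints how many runs exited with a non-zero status.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...] (hypothetical helper):
# run CMD N times with its output discarded, then print
# the number of runs that exited with a non-zero status.
count_failures() {
  n=$1
  shift
  fails=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if ! "$@" >/dev/null 2>&1; then
      fails=$((fails + 1))
    fi
    i=$((i + 1))
  done
  echo "$fails"
}

# Example, assuming `julia` is on PATH and reproducer.jl is in the current directory:
# count_failures 20 julia --startup-file=no reproducer.jl
```

Because the failure is intermittent, the printed count will vary from one batch of 20 runs to the next.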

I'm not sure whether this is significant, but adding unrelated packages to the environment seems to make the failure more likely. For example, with CairoMakie added to the environment (but never loaded):

shell$ cat reproducer.jl 
import Pkg
Pkg.activate(; temp=true)
Pkg.add(["GraphViz", "FileIO", "Cairo", "CairoMakie"])

using GraphViz
using FileIO
using Cairo

FileIO.save("tmp.png", dot"""
    digraph graphname {
        a -> b -> c;
        b -> d;
    }
""")

shell$ for i in $(seq 20); do
         julia --startup-file=no reproducer.jl >/dev/null 2>&1;
         echo $?;
       done
# fails 16 times out of 20
1
0
1
1
1
1
1
1
1
1
0
1
1
0
1
1
1
1
1
0

I'm filing this issue here because the same example works reliably with Julia 1.10, which suggests a regression in 1.11. That said, the bug may well belong to one of the packages involved.

If you can think of further tests that would help narrow this down, I'm happy to run them; I'm a bit short on ideas at the moment.

Thanks in advance!

shell$ julia -e 'using InteractiveUtils; versioninfo()'
Julia Version 1.11.0
Commit 501a4f25c2b (2024-10-07 11:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Environment:
  LD_LIBRARY_PATH = /home/francois/.local/lib

sgaure commented 1 month ago

I can reproduce it, and it puzzled me. I bisected it to #51275, which doesn't make much sense unless one of these packages does some unsafe string handling that happened to work by chance before that PR and no longer does after it. Alternatively, the bisection result could be random, although I ran the script 50 times to decide whether a given build failed. The Pkg bump in #51296 would have been a more plausible culprit, but it doesn't seem to be that one.
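
Bisecting an intermittent failure like this one requires each step to run the reproducer several times. A minimal sketch of a `git bisect run` helper, assuming a POSIX shell (the function name `all_pass` and the commented-out bisect step are hypothetical, not necessarily what was used above): it returns 0 ("good") only if every run succeeds, and 1 ("bad") as soon as one run fails.

```shell
#!/bin/sh
# all_pass RUNS CMD [ARGS...] (hypothetical helper): run CMD up to RUNS times
# with its output discarded. Return 1 on the first failing run, 0 if all pass.
# `git bisect run` treats exit status 0 as "good", 125 as "skip",
# and any other status from 1 to 127 as "bad".
all_pass() {
  runs=$1
  shift
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1 || return 1
    i=$((i + 1))
  done
  return 0
}

# Hypothetical bisect step, assuming it runs from the root of a julia checkout
# and that `make` leaves a `./julia` launcher there.
# Exit 125 so that commits which fail to build are skipped:
# make >/dev/null 2>&1 || exit 125
# all_pass 50 ./julia --startup-file=no /path/to/reproducer.jl
# exit $?
```

With the roughly 40% per-run failure rate reported above (8 of 20), the probability that an affected build passes 50 consecutive runs by luck is about 0.6^50 ≈ 8 × 10^-12, so a "good" verdict is unlikely to be a fluke at that rate; it becomes much weaker if a particular build's failure rate happens to be low.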