FluxML / Flux.jl

Relax! Flux is the ML library that doesn't make you tensor
https://fluxml.ai/

Import Flux on worker crashes #1625

Open DeanLym opened 3 years ago

DeanLym commented 3 years ago

Hi, I am trying to use Flux on a worker node. I ran into the following error. Any idea what's the issue here?

Julia 1.6.1, Windows 10.

@everywhere import Flux
      From worker 2:
      From worker 2:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.       
      From worker 2:    Exception: EXCEPTION_ACCESS_VIOLATION at 0x1ff3d60 -- _jl_restore_incremental at /cygdrive/c/buildbot/worker/package_win64/build/src\dump.c:2552
      From worker 2:    in expression starting at none:0
      From worker 2:    _jl_restore_incremental at /cygdrive/c/buildbot/worker/package_win64/build/src\dump.c:2552
      From worker 2:    jl_restore_incremental at /cygdrive/c/buildbot/worker/package_win64/build/src\dump.c:2605
      From worker 2:    _include_from_serialized at .\loading.jl:658
      From worker 2:    _require_search_from_serialized at .\loading.jl:760
      From worker 2:    _tryrequire_from_serialized at .\loading.jl:689
      From worker 2:    _require_search_from_serialized at .\loading.jl:749
      From worker 2:    _tryrequire_from_serialized at .\loading.jl:689
      From worker 2:    _require_search_from_serialized at .\loading.jl:749
      From worker 2:    _require at .\loading.jl:998
      From worker 2:    require at .\loading.jl:914
      From worker 2:    #1 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Distributed\src\Distributed.jl:79
      From worker 2:    unknown function (ip: 000000002856ccf3)
      From worker 2:    jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1703 [inlined]
      From worker 2:    do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:670
      From worker 2:    #103 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Distributed\src\process_messages.jl:274
      From worker 2:    run_work_thunk at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Distributed\src\process_messages.jl:63
      From worker 2:    unknown function (ip: 0000000028564f36)
      From worker 2:    run_work_thunk at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Distributed\src\process_messages.jl:72
      From worker 2:    #96 at .\task.jl:411
      From worker 2:    unknown function (ip: 0000000028564a03)
      From worker 2:    jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1703 [inlined]
      From worker 2:    start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:839
      From worker 2:    Allocations: 6792367 (Pool: 6790048; Big: 2319); GC: 9
DhairyaLGandhi commented 3 years ago

I run Flux on multiple workers on a daily basis, but that's on Linux and macOS. This doesn't look like a Flux-specific dump. Can you try with using instead of import? Is it reproducible? If so, it might merit an issue on the Julia repo.
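
A minimal sketch of such a reproducibility check, assuming a fresh Julia session. The helper name load_on_new_worker and its keyword argument are hypothetical and not part of Distributed or Flux. It loads the package directly on one new worker, which may not follow exactly the same code path as @everywhere, so treat a passing result as a hint rather than proof.

```julia
using Distributed

# Hypothetical helper (not part of Distributed or Flux): load `pkg` on one
# fresh local worker with either `using` or `import` and report the outcome.
function load_on_new_worker(pkg::Symbol; keyword::Symbol = :using)
    pid = only(addprocs(1))              # start a single local worker
    ex = Expr(keyword, Expr(:., pkg))    # builds `using Pkg` or `import Pkg`
    try
        remotecall_fetch(Core.eval, pid, Main, ex)
        return true
    catch err
        # A load error or a crashed worker both end up here.
        @warn "Loading failed on worker" pkg keyword exception = err
        return false
    finally
        # A crashed worker may already be gone; only remove it if still registered.
        pid in procs() && rmprocs(pid)
    end
end
```

Calling load_on_new_worker(:Flux) and load_on_new_worker(:Flux; keyword = :import) several times in a row would show whether the failure is consistent and whether the choice of keyword matters.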

DeanLym commented 3 years ago

@DhairyaLGandhi I tried using instead of import. It leads to the same error, and it is reproducible.

DhairyaLGandhi commented 3 years ago

Could you send over the output of versioninfo() and ] st -m?

It may be useful to try importing the dependencies of Flux in clean environments too.
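
One way to do that is sketched below, with assumptions labeled: loads_in_fresh_process and check_packages are hypothetical helper names, and project is assumed to point at an environment where the packages of interest have already been installed. Each package is imported in its own Julia process, so a crash while loading one of them cannot take down the session running the check.

```julia
# Hypothetical helper: try `import pkg` in a fresh Julia process that uses
# `project` as its active environment. Returns true only if the process
# exits cleanly, so a crash or a load error both count as failure.
function loads_in_fresh_process(pkg::AbstractString, project::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no --project=$project -e "import $pkg"`
    return success(pipeline(cmd; stdout = devnull, stderr = devnull))
end

# Check several packages one by one and collect the results by name.
check_packages(pkgs, project) =
    Dict(pkg => loads_in_fresh_process(pkg, project) for pkg in pkgs)
```

For example, check_packages(["Zygote", "CUDA", "NNlib"], env) could be run against a temporary environment env in which those packages were added beforehand. The three names are only examples picked from a Flux manifest, not a complete list of its dependencies. A package that reports false is a candidate for a smaller reproduction.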

DeanLym commented 3 years ago

@DhairyaLGandhi

versioninfo()

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9920X CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)
Environment:
  JULIA_EDITOR = "C:\Users\yimin\AppData\Local\atom\app-1.57.0\atom.exe"  -a
  JULIA_NUM_THREADS = 12

] st -m

  [621f4979] AbstractFFTs v1.0.1
  [1520ce14] AbstractTrees v0.3.4
  [79e6a3ab] Adapt v3.3.1
  [ab4f0b2a] BFloat16s v0.1.0
  [a74b3585] Blosc v0.7.0
  [fa961155] CEnum v0.4.1
  [052768ef] CUDA v3.3.1
  [082447d4] ChainRules v0.8.15
  [d360d2e6] ChainRulesCore v0.10.9
  [944b1d66] CodecZlib v0.7.0
  [3da002f7] ColorTypes v0.11.0
  [5ae59095] Colors v0.12.8
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v3.31.0
  [9a962f9c] DataAPI v1.7.0
  [864edb3b] DataStructures v0.18.9
  [163ba53b] DiffResults v1.0.3
  [b552c78f] DiffRules v1.0.2
  [ffbed154] DocStringExtensions v0.8.5
  [e2ba6199] ExprTools v0.1.3
  [1a297f60] FillArrays v0.11.7
  [53c48c17] FixedPointNumbers v0.8.4
  [587475ba] Flux v0.12.4
  [f6369f11] ForwardDiff v0.10.18
  [d9f16b24] Functors v0.2.1
  [0c68f7d7] GPUArrays v7.0.1
  [61eb1bfa] GPUCompiler v0.12.3
  [f67ccb44] HDF5 v0.15.5
  [7869d1d1] IRTools v0.4.3
  [692b3bcd] JLLWrappers v1.3.0
  [e5e0dc1b] Juno v0.8.4
  [929cbde3] LLVM v3.9.0
  [2ab3a3ac] LogExpFunctions v0.2.4
  [1914dd2f] MacroTools v0.5.6
  [e89f7d12] Media v0.5.0
  [e1d29d7a] Missings v1.0.0
  [872c559c] NNlib v0.7.22
  [a00861dc] NNlibCUDA v0.1.4
  [77ba4419] NaNMath v0.3.5
  [bac558e1] OrderedCollections v1.4.1
  [d96e819e] Parameters v0.12.2
  [21216c6a] Preferences v1.2.2
  [74087812] Random123 v1.4.2
  [e6cf234a] RandomNumbers v1.4.0
  [189a3867] Reexport v1.1.0
  [ae029012] Requires v1.1.3
  [a2af1166] SortingAlgorithms v1.0.0
  [276daf66] SpecialFunctions v1.5.1
  [90137ffa] StaticArrays v1.2.4
  [82ae8749] StatsAPI v1.0.0
  [2913bbd2] StatsBase v0.33.8
  [a759f4b9] TimerOutputs v0.5.10
  [3bb67fe8] TranscodingStreams v0.9.5
  [3a884ed6] UnPack v1.0.2
  [a5390f91] ZipFile v0.9.3
  [e88e6eb3] Zygote v0.6.14
  [700de1a5] ZygoteRules v0.2.1
  [0b7ba130] Blosc_jll v1.21.0+0
  [0234f1f7] HDF5_jll v1.12.0+1
  [5ced341a] Lz4_jll v1.9.3+0
  [458c3c95] OpenSSL_jll v1.1.10+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [3161d3a3] Zstd_jll v1.5.0+0
  [0dad84c5] ArgTools
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8bb1440f] DelimitedFiles
  [8ba89e20] Distributed
  [f43a241f] Downloads
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions
  [44cfe95a] Pkg
  [de0858da] Printf
  [9abbd945] Profile
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [fa267f1f] TOML
  [a4e569a6] Tar
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll
  [deac9b47] LibCURL_jll
  [29816b5a] LibSSH2_jll
  [c8ffd9c3] MbedTLS_jll
  [14a3606d] MozillaCACerts_jll
  [83775a58] Zlib_jll
  [8e850ede] nghttp2_jll
  [3f19e933] p7zip_jll
DeanLym commented 3 years ago

@everywhere using Flux works for me on Julia 1.5.1

julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9920X CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)
Environment:
  JULIA_EDITOR = "C:\Users\yimin\AppData\Local\atom\app-1.57.0\atom.exe"  -a
  JULIA_NUM_THREADS = 12