JuliaRobotics / IncrementalInference.jl

Clique recycling non-Gaussian (multi-modal) factor graph solver; also see Caesar.jl.
MIT License
72 stars 21 forks source link

unexpected/sporadic testDefaultDeconv test error? #1482

Closed dehann closed 2 years ago

dehann commented 2 years ago

during development of #1481 , the follow test failure happened in test, but not during separate REPL testing:

Test Summary:            | Pass  Total
deconv through multihypo |    7      7
[ Info: Ensure variables are all initialized (graphinit)
[ Info: x0 is not initialized, and will do so now...
[ Info: try doautoinit! of x0
[ Info: init with useinitfct [:x0f1]
[ Info: do init of x0
[ Info: x1 is not initialized, and will do so now...
[ Info: try doautoinit! of x1
[ Info: init with useinitfct [:x0x1f1]
[ Info: do init of x1
[ Info: x2 is not initialized, and will do so now...
[ Info: try doautoinit! of x2
[ Info: init with useinitfct [:x1x2f1]
[ Info: do init of x2
[ Info: x3 is not initialized, and will do so now...
[ Info: try doautoinit! of x3
[ Info: init with useinitfct [:x2x3f1]
[ Info: do init of x3
[ Info: x4 is not initialized, and will do so now...
[ Info: try doautoinit! of x4
[ Info: init with useinitfct [:x3x4f1]
[ Info: do init of x4
[ Info: x5 is not initialized, and will do so now...
[ Info: try doautoinit! of x5
[ Info: init with useinitfct [:x4x5f1]
[ Info: do init of x5
[ Info: x6 is not initialized, and will do so now...
[ Info: try doautoinit! of x6
[ Info: init with useinitfct [:x5x6f1]
[ Info: do init of x6
[ Info: l1 is not initialized, and will do so now...
[ Info: try doautoinit! of l1
[ Info: init with useinitfct [:x0l1f1, :x6l1f1]
[ Info: do init of l1
[ Info: Solving over the Bayes (Junction) tree.
Building Bayes net...
Find potential functions for each clique
[ Info: Bayes Tree Complete
[ Info: Do tree based init-ference
FSM cliq1, iter=1 -- setCliqueRecycling_StateMachine, NULL
FSM cliq1, iter=2 -- buildCliqSubgraph_StateMachine, NULL
FSM cliq1, iter=3 -- presolveChecklist_StateMachine, NULL
FSM cliq1, iter=4 -- waitForUp_StateMachine, NULL
FSM cliq2, iter=1 -- setCliqueRecycling_StateMachine, NULL
FSM cliq2, iter=2 -- buildCliqSubgraph_StateMachine, NULL
FSM cliq2, iter=3 -- presolveChecklist_StateMachine, NULL
FSM cliq2, iter=4 -- waitForUp_StateMachine, NULL
FSM cliq3, iter=1 -- setCliqueRecycling_StateMachine, NULL
FSM cliq3, iter=2 -- buildCliqSubgraph_StateMachine, NULL
FSM cliq3, iter=3 -- presolveChecklist_StateMachine, NULL
FSM cliq3, iter=4 -- waitForUp_StateMachine, NULL
FSM cliq3, iter=5 -- preUpSolve_StateMachine, NULL
FSM cliq3, iter=6 -- solveUp_StateMachine, NULL
FSM cliq3, iter=7 -- postUpSolve_StateMachine, UPSOLVED
FSM cliq4, iter=1 -- setCliqueRecycling_StateMachine, NULL
FSM cliq4, iter=2 -- buildCliqSubgraph_StateMachine, NULL
FSM cliq4, iter=3 -- presolveChecklist_StateMachine, NULL
FSM cliq4, iter=4 -- waitForUp_StateMachine, NULL
FSM cliq4, iter=5 -- preUpSolve_StateMachine, NULL
FSM cliq4, iter=6 -- solveUp_StateMachine, NULL
FSM cliq4, iter=7 -- postUpSolve_StateMachine, UPSOLVED
FSM cliq5, iter=1 -- setCliqueRecycling_StateMachine, NULL
FSM cliq5, iter=2 -- buildCliqSubgraph_StateMachine, NULL
FSM cliq5, iter=3 -- presolveChecklist_StateMachine, NULL
FSM cliq5, iter=4 -- waitForUp_StateMachine, NULL
FSM cliq5, iter=5 -- preUpSolve_StateMachine, NULL
FSM cliq5, iter=6 -- solveUp_StateMachine, NULL
FSM cliq5, iter=7 -- postUpSolve_StateMachine, UPSOLVED
FSM cliq6, iter=1 -- setCliqueRecycling_StateMachine, NULL
FSM cliq6, iter=2 -- buildCliqSubgraph_StateMachine, NULL
FSM cliq6, iter=3 -- presolveChecklist_StateMachine, NULL
FSM cliq6, iter=4 -- waitForUp_StateMachine, NULL
FSM cliq6, iter=5 -- preUpSolve_StateMachine, NULL
FSM cliq6, iter=6 -- solveUp_StateMachine, NULL
FSM cliq6, iter=7 -- postUpSolve_StateMachine, UPSOLVED
FSM cliq3, iter=8 -- waitForDown_StateMachine, UPSOLVED
FSM cliq4, iter=8 -- waitForDown_StateMachine, UPSOLVED
FSM cliq5, iter=8 -- waitForDown_StateMachine, UPSOLVED
FSM cliq6, iter=8 -- waitForDown_StateMachine, UPSOLVED
FSM cliq2, iter=5 -- preUpSolve_StateMachine, NULL
FSM cliq2, iter=6 -- solveUp_StateMachine, NULL
FSM cliq2, iter=7 -- postUpSolve_StateMachine, UPSOLVED

FSM cliq2, iter=8 -- waitForDown_StateMachine, UPSOLVEDInterruptException:

Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:777
  [2] wait()
    @ Base ./task.jl:837
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
    @ Base ./condition.jl:123
  [4] _wait(t::Task)
    @ Base ./task.jl:293
  [5] sync_end(c::Channel{Any})
    @ Base ./task.jl:361
  [6] macro expansion
    @ ./task.jl:400 [inlined]
  [7] waitForUp_StateMachine(csmc::CliqStateMachineContainer{BayesTreeNodeData, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, MetaBayesTree})
    @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqueStateMachine.jl:181
  [8] (::StateMachine{CliqStateMachineContainer})(userdata::CliqStateMachineContainer{BayesTreeNodeData, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, MetaBayesTree}, timeout::Int64; pollinterval::Float64, breakafter::Function, verbose::Bool, verbosefid::Base.TTY, verboseXtra::IncrementalInference.CliqStatus, iterlimit::Int64, injectDelayBefore::Nothing, recordhistory::Bool, housekeeping_cb::IncrementalInference.var"#475#477"{IncrementalInference.TreeClique, CliqStateMachineContainer{BayesTreeNodeData, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, MetaBayesTree}})
    @ FunctionalStateMachine ~/.julia/packages/FunctionalStateMachine/2JZFG/src/StateMachine.jl:94
  [9] initStartCliqStateMachine!(dfg::GraphsDFG{SolverParams, DFGVariable, DFGFactor}, tree::MetaBayesTree, cliq::IncrementalInference.TreeClique, timeout::Int64; oldcliqdata::BayesTreeNodeData, verbose::Bool, verbosefid::Base.TTY, drawtree::Bool, show::Bool, incremental::Bool, limititers::Int64, upsolve::Bool, downsolve::Bool, recordhistory::Bool, delay::Bool, logger::Base.CoreLogging.SimpleLogger, solve_progressbar::Nothing, algorithm::Symbol, solveKey::Symbol)
    @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqueStateMachine.jl:64
 [10] tryCliqStateMachineSolve!(dfg::GraphsDFG{SolverParams, DFGVariable, DFGFactor}, treel::MetaBayesTree, cliqKey::Int64, timeout::Int64; oldtree::MetaBayesTree, verbose::Bool, verbosefid::Base.TTY, drawtree::Bool, limititers::Int64, downsolve::Bool, incremental::Bool, delaycliqs::Vector{Symbol}, recordcliqs::Vector{Symbol}, solve_progressbar::Nothing, algorithm::Symbol, solveKey::Symbol)
    @ IncrementalInference ~/.julia/dev/IncrementalInference/src/SolverAPI.jl:112
 [11] (::IncrementalInference.var"#526#529"{MetaBayesTree, Bool, Bool, Base.TTY, Bool, Bool, Vector{Symbol}, Vector{Symbol}, Symbol, Symbol, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, MetaBayesTree, Int64, Nothing, Int64, Int64})()
    @ IncrementalInference ./task.jl:423┌ Warning: printCliqHistorySummary -- No CSM history found.
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/TreeDebugTools.jl:191
┌ Warning: printCliqHistorySummary -- No CSM history found.
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/TreeDebugTools.jl:191
┌ Error: Task 1 failed, sending error to all cliques
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqStateMachineUtils.jl:185
[ Info: All cliques should have exited
┌ Error: CSM clique 5: Exit with error state
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqStateMachineUtils.jl:152
┌ Error: CSM clique 4: Exit with error state
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqStateMachineUtils.jl:152
┌ Error: CSM clique 6: Exit with error state
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqStateMachineUtils.jl:152
┌ Error: CSM clique 3: Exit with error state
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqStateMachineUtils.jl:152
FSM cliq5, iter=9 -- exitStateMachine, UPSOLVED
FSM cliq4, iter=9 -- exitStateMachine, UPSOLVED
FSM cliq6, iter=9 -- exitStateMachine, UPSOLVED
FSM cliq3, iter=9 -- exitStateMachine, UPSOLVED
┌ Error: Task 1 failed, sending error to all cliques
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqStateMachineUtils.jl:185
┌ Error: CSM clique 2: Exit with error state
└ @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqStateMachineUtils.jl:152
FSM cliq2, iter=9 -- exitStateMachine, UPSOLVED
deconvolution tools via differential factors: Error During Test at /home/dehann/.julia/dev/IncrementalInference/test/testDefaultDeconv.jl:89
[ Info: All cliques should have exited
  Got exception outside of a @test
  TaskFailedException

      nested task error: InterruptException:
      Stacktrace:
        [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
          @ Base ./task.jl:777
        [2] wait()
          @ Base ./task.jl:837
        [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
          @ Base ./condition.jl:123
        [4] _wait(t::Task)
          @ Base ./task.jl:293
        [5] sync_end(c::Channel{Any})
          @ Base ./task.jl:361
        [6] macro expansion
          @ ./task.jl:400 [inlined]
        [7] waitForUp_StateMachine(csmc::CliqStateMachineContainer{BayesTreeNodeData, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, MetaBayesTree})
          @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqueStateMachine.jl:181
        [8] (::StateMachine{CliqStateMachineContainer})(userdata::CliqStateMachineContainer{BayesTreeNodeData, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, MetaBayesTree}, timeout::Int64; pollinterval::Float64, breakafter::Function, verbose::Bool, verbosefid::Base.TTY, verboseXtra::IncrementalInference.CliqStatus, iterlimit::Int64, injectDelayBefore::Nothing, recordhistory::Bool, housekeeping_cb::IncrementalInference.var"#475#477"{IncrementalInference.TreeClique, CliqStateMachineContainer{BayesTreeNodeData, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, MetaBayesTree}})
          @ FunctionalStateMachine ~/.julia/packages/FunctionalStateMachine/2JZFG/src/StateMachine.jl:94
        [9] initStartCliqStateMachine!(dfg::GraphsDFG{SolverParams, DFGVariable, DFGFactor}, tree::MetaBayesTree, cliq::IncrementalInference.TreeClique, timeout::Int64; oldcliqdata::BayesTreeNodeData, verbose::Bool, verbosefid::Base.TTY, drawtree::Bool, show::Bool, incremental::Bool, limititers::Int64, upsolve::Bool, downsolve::Bool, recordhistory::Bool, delay::Bool, logger::Base.CoreLogging.SimpleLogger, solve_progressbar::Nothing, algorithm::Symbol, solveKey::Symbol)
          @ IncrementalInference ~/.julia/dev/IncrementalInference/src/CliqueStateMachine.jl:64
       [10] tryCliqStateMachineSolve!(dfg::GraphsDFG{SolverParams, DFGVariable, DFGFactor}, treel::MetaBayesTree, cliqKey::Int64, timeout::Int64; oldtree::MetaBayesTree, verbose::Bool, verbosefid::Base.TTY, drawtree::Bool, limititers::Int64, downsolve::Bool, incremental::Bool, delaycliqs::Vector{Symbol}, recordcliqs::Vector{Symbol}, solve_progressbar::Nothing, algorithm::Symbol, solveKey::Symbol)
          @ IncrementalInference ~/.julia/dev/IncrementalInference/src/SolverAPI.jl:112
       [11] (::IncrementalInference.var"#526#529"{MetaBayesTree, Bool, Bool, Base.TTY, Bool, Bool, Vector{Symbol}, Vector{Symbol}, Symbol, Symbol, GraphsDFG{SolverParams, DFGVariable, DFGFactor}, MetaBayesTree, Int64, Nothing, Int64, Int64})()
          @ IncrementalInference ./task.jl:423
  Stacktrace:
    [1] sync_end(c::Channel{Any})
      @ Base ./task.jl:381
    [2] macro expansion
      @ ./task.jl:400 [inlined]
    [3] taskSolveTree!(dfg::GraphsDFG{SolverParams, DFGVariable, DFGFactor}, treel::MetaBayesTree, timeout::Int64; oldtree::MetaBayesTree, drawtree::Bool, verbose::Bool, verbosefid::Base.TTY, limititers::Int64, limititercliqs::Vector{Pair{Symbol, Int64}}, downsolve::Bool, incremental::Bool, multithread::Bool, skipcliqids::Vector{Symbol}, recordcliqs::Vector{Symbol}, delaycliqs::Vector{Symbol}, smtasks::Vector{Task}, algorithm::Symbol, solveKey::Symbol)
      @ IncrementalInference ~/.julia/dev/IncrementalInference/src/SolverAPI.jl:50
    [4] solveTree!(dfgl::GraphsDFG{SolverParams, DFGVariable, DFGFactor}, oldtree::MetaBayesTree; timeout::Int64, storeOld::Bool, verbose::Bool, verbosefid::Base.TTY, delaycliqs::Vector{Symbol}, recordcliqs::Vector{Symbol}, limititercliqs::Vector{Pair{Symbol, Int64}}, injectDelayBefore::Nothing, skipcliqids::Vector{Symbol}, eliminationOrder::Vector{Symbol}, eliminationConstraints::Vector{Symbol}, smtasks::Vector{Task}, dotreedraw::Vector{Int64}, runtaskmonitor::Bool, algorithm::Symbol, solveKey::Symbol, multithread::Bool)
      @ IncrementalInference ~/.julia/dev/IncrementalInference/src/SolverAPI.jl:356
    [5] macro expansion
      @ ~/.julia/dev/IncrementalInference/test/testDefaultDeconv.jl:102 [inlined]
    [6] macro expansion
      @ /usr/local/share/julia-1.7.2/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
    [7] top-level scope
      @ ~/.julia/dev/IncrementalInference/test/testDefaultDeconv.jl:93
    [8] include(fname::String)
      @ Base.MainInclude ./client.jl:451
    [9] top-level scope
      @ ~/.julia/dev/IncrementalInference/test/runtests.jl:45
   [10] include(fname::String)
      @ Base.MainInclude ./client.jl:451
   [11] top-level scope
      @ none:6
   [12] eval
      @ ./boot.jl:373 [inlined]
   [13] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:268
   [14] _start()
      @ Base ./client.jl:495
Test Summary:                                | Error  Total
deconvolution tools via differential factors |     1      1
ERROR: [ Info: monitorCSMs: all tasks done
LoadError: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/dehann/.julia/dev/IncrementalInference/test/testDefaultDeconv.jl:89
in expression starting at /home/dehann/.julia/dev/IncrementalInference/test/runtests.jl:45
ERROR: Package IncrementalInference errored during testing
dehann commented 2 years ago

Happens on both JL v1.6.5 and v1.7.2. Only happens with user REPL to Pkg, call on test IncrementalInference. Does not seem to happen on CI calls.

dehann commented 2 years ago

Ah, figured it out. There was a FSM timeout, and my machine was throttled down. So the FSM "watchdog" uses Interrupt Exception to stop CSMs: https://github.com/JuliaRobotics/IncrementalInference.jl/blob/9ac43d03831dbd409166c2fc5f301e67979d6283/test/testDefaultDeconv.jl#L102

See