Open nalimilan opened 3 years ago
This is inconvenient indeed. My only question would be if:
try
t = Threads.@spawn f(1)
fetch(t)
catch
throw(t.exception)
end
is guaranteed to throw the TaskFailedException
or we have to put an additional check for it in the catch
clause.
I see the issue but I'm not sure we want libraries making piece-meal decisions about what exception information to hide. I would prefer something systematic, e.g. having @sync
or @threads
propagate the inner exception only.
I would prefer something systematic, e.g. having
@sync
or@threads
propagate the inner exception only.
This is what we would prefer also :smile:.
Yes that would be even nicer for most cases. Though maybe being able to do that when you call fetch
would also be useful -- not sure.
bumping this issue. Just to show you a stack trace I got today:
julia> combine(d -> d.x == [1] ? d[1, [1, 2]] : d[1, [2, 1]], gdf)
ERROR: ArgumentError: return value must have the same column names for all groups (got (:x, :y) and [:y, :x])
Stacktrace:
[1] _combine(gd::GroupedDataFrame{DataFrame}, cs_norm::Vector{Any}, optional_transform::Vector{Bool}, copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:573
[2] _combine_prepare(gd::GroupedDataFrame{DataFrame}, cs::Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector{T} where T, Type, All, Between, Cols, InvertedIndex}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:66
[3] #combine#621
@ ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:674 [inlined]
[4] #combine#619
@ ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:663 [inlined]
[5] combine(f::Function, gd::GroupedDataFrame{DataFrame})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:660
[6] top-level scope
@ REPL[8]:1
caused by: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:317 [inlined]
[2] _combine(gd::GroupedDataFrame{DataFrame}, cs_norm::Vector{Any}, optional_transform::Vector{Bool}, copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:569
[3] _combine_prepare(gd::GroupedDataFrame{DataFrame}, cs::Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector{T} where T, Type, All, Between, Cols, InvertedIndex}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:66
[4] #combine#621
@ ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:674 [inlined]
[5] #combine#619
@ ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:663 [inlined]
[6] combine(f::Function, gd::GroupedDataFrame{DataFrame})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:660
[7] top-level scope
@ REPL[8]:1
nested task error: ArgumentError: return value must have the same column names for all groups (got (:x, :y) and [:y, :x])
Stacktrace:
[1] _combine_rows_with_first!(firstrow::DataFrameRow{DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}}, outcols::Tuple{Vector{Int64}, Vector{Int64}}, f::Function, gd::GroupedDataFrame{DataFrame}, incols::Nothing, colnames::Tuple{Symbol, Symbol}, firstmulticol::Val{true})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:267
[2] _combine_with_first(first::DataFrameRow{DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}}, f::Function, gd::GroupedDataFrame{DataFrame}, incols::Nothing, firstmulticol::Val{true}, idx_agg::Vector{Int64})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:63
[3] _combine_multicol(firstres::DataFrameRow{DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}}, fun::Function, gd::GroupedDataFrame{DataFrame}, incols::Nothing)
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:20
[4] _combine_process_callable(cs_i::Union{Function, Type}, optional_i::Bool, parentdf::DataFrame, gd::GroupedDataFrame{DataFrame}, seen_cols::Dict{Symbol, Tuple{Bool, Int64}}, trans_res::Vector{DataFrames.TransformationResult}, idx_agg::Base.RefValue{Union{Nothing, Vector{Int64}}})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:268
[5] macro expansion
@ ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:558 [inlined]
[6] (::DataFrames.var"#609#615"{GroupedDataFrame{DataFrame}, Bool, Bool, DataFrame, Dict{Symbol, Tuple{Bool, Int64}}, Vector{DataFrames.TransformationResult}, Base.RefValue{Union{Nothing, Vector{Int64}}}, Bool, var"#5#6"})()
@ DataFrames ./threadingconstructs.jl:169
caused by: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:317 [inlined]
[2] _combine_rows_with_first!(firstrow::DataFrameRow{DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}}, outcols::Tuple{Vector{Int64}, Vector{Int64}}, f::Function, gd::GroupedDataFrame{DataFrame}, incols::Nothing, colnames::Tuple{Symbol, Symbol}, firstmulticol::Val{true})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:265
[3] _combine_with_first(first::DataFrameRow{DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}}, f::Function, gd::GroupedDataFrame{DataFrame}, incols::Nothing, firstmulticol::Val{true}, idx_agg::Vector{Int64})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:63
[4] _combine_multicol(firstres::DataFrameRow{DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}}, fun::Function, gd::GroupedDataFrame{DataFrame}, incols::Nothing)
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:20
[5] _combine_process_callable(cs_i::Union{Function, Type}, optional_i::Bool, parentdf::DataFrame, gd::GroupedDataFrame{DataFrame}, seen_cols::Dict{Symbol, Tuple{Bool, Int64}}, trans_res::Vector{DataFrames.TransformationResult}, idx_agg::Base.RefValue{Union{Nothing, Vector{Int64}}})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:268
[6] macro expansion
@ ~/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:558 [inlined]
[7] (::DataFrames.var"#609#615"{GroupedDataFrame{DataFrame}, Bool, Bool, DataFrame, Dict{Symbol, Tuple{Bool, Int64}}, Vector{DataFrames.TransformationResult}, Base.RefValue{Union{Nothing, Vector{Int64}}}, Bool, var"#5#6"})()
@ DataFrames ./threadingconstructs.jl:169
nested task error: ArgumentError: return value must have the same column names for all groups (got (:x, :y) and [:y, :x])
Stacktrace:
[1] fill_row!(row::DataFrameRow{DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}}, outcols::Tuple{Vector{Int64}, Vector{Int64}}, i::Int64, colstart::Int64, colnames::Tuple{Symbol, Symbol})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:84
[2] _combine_rows_with_first_task!(tid::Int64, rowstart::Int64, rowend::Int64, rownext::Int64, outcols::Tuple{Vector{Int64}, Vector{Int64}}, outcolsref::Base.RefValue{Tuple{Vararg{AbstractVector{T} where T, var"#s280"}} where var"#s280"}, type_widened::Vector{Bool}, widen_type_lock::ReentrantLock, f::var"#5#6", gd::GroupedDataFrame{DataFrame}, starts::Vector{Int64}, ends::Vector{Int64}, incols::Nothing, colnames::Tuple{Symbol, Symbol}, firstmulticol::Val{true})
@ DataFrames ~/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:122
[3] (::DataFrames.var"#664#665"{var"#5#6", GroupedDataFrame{DataFrame}, Nothing, Tuple{Symbol, Symbol}, Val{true}, Vector{Bool}, Base.RefValue{Tuple{Vararg{AbstractVector{T} where T, var"#s280"}} where var"#s280"}, ReentrantLock, Vector{Int64}, Vector{Int64}, UnitRange{Int64}, Int64})()
@ DataFrames ./threadingconstructs.jl:169
Have there been any developments to this? Thanks
When an exception happens in a task, a task
TaskFailedException
is thrown. But packages that accept a user-provided function and call it in a task (like DataFrames) generally don't want to throw this exception type, but rethrow the original exception that happened in user code instead. Otherwise, changing the implementation (e.g. adding multithreading support) would change the user-visible behavior and break the API.Currently there doesn't seem to be a good way to do this. The package ExceptionUnwrapping.jl has even been created to work around that.
The best mitigation I could find is to use
try... catch
to catch theTaskFailedException
, and then callthrow
on the task's exception. This throws the original exception type, which allows preserving the API. But the backtrace refers to the place where the exception was rethrown rather than to the original place where the error happened, and users have to look at the third nested exception to see the most useful backtrace.Would there be a way to preserve the original backtrace? If not, wouldn't it be useful to allow this kind of pattern?