JuliaAI / DecisionTree.jl

Julia implementation of Decision Tree (CART) and Random Forest algorithms
Other
342 stars 101 forks source link

Multithreaded support for apply_forest_proba (Issue #209) #210

Open salbert83 opened 1 year ago

salbert83 commented 1 year ago

I think regression uses the functions in ../classification/main.jl for applying forests to a set of features, so no new development required for this.

Fixes #209

codecov-commenter commented 1 year ago

Codecov Report

Merging #210 (686a44b) into dev (835f3cd) will increase coverage by 0.03%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #210      +/-   ##
==========================================
+ Coverage   87.99%   88.02%   +0.03%     
==========================================
  Files          10       10              
  Lines        1249     1253       +4     
==========================================
+ Hits         1099     1103       +4     
  Misses        150      150              
Impacted Files Coverage Δ
src/classification/main.jl 96.16% <100.00%> (+0.05%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

ablaom commented 1 year ago

@salbert83 Would you have some time soon to respond to the review?

ablaom commented 1 year ago

@rikhuijzer I'm not getting a response here. Are you willing and able to fishish this?

rikhuijzer commented 1 year ago

It looks like this would result in a nested @threads call. One time in stack_function_results and one time inside the row_fun that is passed into stack_function_results. Nested @threads should be possible with the :dynamic scheduler, which appears to have only been added in Julia 1.8 (https://github.com/JuliaLang/julia/blob/master/HISTORY.md). Also, I don't know how to benchmark whether adding multithreading actually saves time or not.

So, let's leave this open until the lower bound is set to Julia 1.8 or until someone who really needs this implements and shows benchmarks?

ablaom commented 1 year ago

Thanks @rikhuijzer for looking into this.

It looks like this would result in a nested @threads call.

So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?

I also notice that the existing implementation is buy-in (use_multithreading=false is the default) whereas the present addition is buy-out.

rikhuijzer commented 1 year ago

So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?

Maybe that explains https://github.com/JuliaAI/DecisionTree.jl/pull/188#issuecomment-1202086361. I'm afraid, I don't know and also I never need multithreading so I'm not the right person to ask unfortunately.

Maybe figuring out the right multithreading for this package is something you would like, @ExpandingMan?

ablaom commented 1 year ago

@OkonSamuel Do you see obvious issues with the way multithreading is currently implemented in prediction? It's here: https://github.com/JuliaAI/DecisionTree.jl/blob/f57a15633f5aadadfc408a8d5e42836e1f011c3f/src/classification/main.jl#L468

ablaom commented 1 year ago

@rikhuijzer I don't believe nested multithreading is an issue. This has been tested before in MLJTuning where optimization multithreading has within it resampling multithreading. My interpretation of the 1.9 changes cited is only that nested threading will typically be more efficient with new default settings for the scheduler.

I don't see anything obvious wrong about the proposed implementation (or the existing one), provided user must buy-in, but will wait for the pinged experts to hopefully weigh in.