JuliaML / MLUtils.jl

Utilities and abstractions for Machine Learning tasks
MIT License

Change default parallel executor to `ThreadedEx` #143

Closed: RomeoV closed this 1 year ago

RomeoV commented 1 year ago

Closes #142

codecov-commenter commented 1 year ago

Codecov Report

Merging #143 (cc508f6) into main (ff2fcc1) will increase coverage by 0.21%. The diff coverage is 68.42%.

Note: The current head cc508f6 differs from the pull request's most recent head 112315d. Consider uploading reports for commit 112315d to get more accurate results.


@@            Coverage Diff             @@
##             main     #143      +/-   ##
==========================================
+ Coverage   88.28%   88.50%   +0.21%     
==========================================
  Files          15       15              
  Lines         589      600      +11     
==========================================
+ Hits          520      531      +11     
  Misses         69       69              
Impacted Files          Coverage Δ
src/MLUtils.jl          100.00% <ø> (+100.00%) ↑
src/obstransform.jl      82.69% <57.14%> (-1.40%) ↓
src/parallel.jl          94.82% <100.00%> (ø)
src/utils.jl             90.27% <100.00%> (+0.20%) ↑

... and 1 file with indirect coverage changes

ToucheSir commented 1 year ago

We would need to make sure this doesn't reintroduce the problem that https://github.com/JuliaML/MLUtils.jl/pull/80 was trying to address.

lorenzoh commented 1 year ago

As mentioned in #142, let's do this.

Before merging, I'd like to add the following comments:

In any case, some up-to-date benchmarks comparing the two would be nice, but seeing the mounting issues with TaskPoolEx, we can leave these points to a future PR.

RomeoV commented 1 year ago

Sorry, didn't realize this was waiting for my input.

  1. I removed the dependency on FoldsThreads.jl
  2. I haven't played around with the basesize, but the default choice of num_elements / num_threads seems very reasonable for larger datasets where each workload is roughly equal.

    My experience was that not setting basesize leads to some threads being starved at the end, as the work is split up evenly between threads, but I am not sure how this applies to ThreadedEx.

I haven't seen that myself. For specific workloads a better-tuned basesize may exist, but I don't think we can pick a better general default than the current one.
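The basesize trade-off discussed above can be illustrated with a small sketch. It assumes, as stated in the comment, that the default chunk size is roughly num_elements / num_threads (rounded up here with `cld`). The helpers `chunk_ranges`, `default_basesize`, and `chunk_costs` are hypothetical and are not the Transducers.jl or MLUtils.jl implementation; the code only computes how much work each contiguous chunk would receive and does not run anything in parallel.

```julia
# Hypothetical helper (not part of MLUtils.jl or Transducers.jl):
# split 1:n into contiguous chunks of at most `basesize` elements.
function chunk_ranges(n::Integer, basesize::Integer)
    basesize >= 1 || throw(ArgumentError("basesize must be >= 1"))
    return [i:min(i + basesize - 1, n) for i in 1:basesize:n]
end

# Assumed default: roughly num_elements / num_threads, rounded up so that
# there are at most `nthreads` chunks, and never smaller than 1.
default_basesize(n::Integer, nthreads::Integer) = max(1, cld(n, nthreads))

# Total work assigned to each chunk, given a per-element cost vector.
chunk_costs(costs::AbstractVector, ranges) = [sum(view(costs, r)) for r in ranges]

n, nthreads = 8, 4
ranges = chunk_ranges(n, default_basesize(n, nthreads))  # 1:2, 3:4, 5:6, 7:8

# Equal per-element cost: every chunk carries the same amount of work.
uniform = ones(Int, n)
println(chunk_costs(uniform, ranges))  # [2, 2, 2, 2]

# Skewed per-element cost: the last chunk dominates.
skewed = [1, 1, 1, 1, 1, 1, 1, 10]
println(chunk_costs(skewed, ranges))   # [2, 2, 2, 11]
```

With uniform cost every chunk gets the same total, which is why an even split is a sensible default. With the skewed costs the last chunk carries 11 units against 2 for the others, so with one chunk per thread three threads would sit idle while the fourth finishes. A smaller basesize produces more chunks that a scheduler can balance, at the price of more task overhead.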

RomeoV commented 1 year ago

Some checks related to printing fail on julia-nightly on Ubuntu. I'm going to go out on a limb and say that's due to something in Julia and not in the PR.

RomeoV commented 1 year ago

Done. However, I'm a bit confused about why the git diff here on GitHub shows a bump from version 0.4.0 to 0.4.3, even though the main branch is already on 0.4.2 (?). Anyway, I think it should be correct now.

darsnack commented 1 year ago

Okay, now it is showing conflicts (presumably because you touched a file that was changed on main). Can you rebase?

RomeoV commented 1 year ago

Done. Thanks for bearing with me!

darsnack commented 1 year ago

Thank you!