JuliaML / MLUtils.jl

Utilities and abstractions for Machine Learning tasks
MIT License

MLUtils seems quite heavy #155

Open ablaom opened 1 year ago

ablaom commented 1 year ago

I am increasingly relying on the getobs/nobs interface in quite low-level packages I am working on. It's nice to be able to work generically with tables and arrays. But I only need this basic API and simple things like eachobs. I'm finding MLUtils.jl rather heavy for this purpose (46 s precompile/load on Julia 1.9).

Are there any plans for factoring out base functionality or moving stuff out to weak dependencies?
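To make the scope of "base functionality" concrete, here is a minimal dependency-free sketch of what such a core could contain. The module name `ObsCore` is hypothetical, and these definitions are illustrative assumptions rather than MLUtils' actual implementation (for instance, this sketch does not check that all containers in a tuple hold the same number of observations, and it does not cover tables).

```julia
module ObsCore  # hypothetical name, not an existing package

export numobs, getobs, eachobs

# Arrays: observations are indexed along the last dimension.
numobs(A::AbstractArray) = size(A, ndims(A))
getobs(A::AbstractArray, i) = A[ntuple(_ -> Colon(), ndims(A) - 1)..., i]

# Tuples and NamedTuples of containers: index every container alike.
# Assumption: all containers hold the same number of observations (unchecked).
numobs(t::Union{Tuple,NamedTuple}) = numobs(first(t))
getobs(t::Union{Tuple,NamedTuple}, i) = map(x -> getobs(x, i), t)

# Lazy iterator over single observations.
eachobs(data) = (getobs(data, i) for i in 1:numobs(data))

end # module

using .ObsCore

X = reshape(1:6, 2, 3)        # 3 observations with 2 features each
data = (x = X, y = [10, 20, 30])

numobs(data)                  # 3
getobs(data, 2)               # (x = [3, 4], y = 20)
collect(eachobs(data))        # one NamedTuple per observation
```

A core of this size pulls in nothing beyond Base, which is the property the question above is after.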

I see that StaticArrays contributes a lot to load times. The dependency here is NNlib -> KernelAbstractions -> StaticArrays. What's in NNlib that's needed here? (Maybe KernelAbstractions only needs StaticArraysCore?)

julia> @time_imports using MLUtils
      1.1 ms  Statistics
      7.3 ms  ShowCases
      0.3 ms  Compat
      0.5 ms  Compat → CompatLinearAlgebraExt
      1.2 ms  ConstructionBase
     10.7 ms  InitialValues
      0.4 ms  Requires
      0.5 ms  DataValueInterfaces
      1.2 ms  DataAPI
      0.5 ms  IteratorInterfaceExtensions
      0.5 ms  TableTraits
     32.2 ms  Tables
     10.6 ms  MacroTools
     27.5 ms  ChainRulesCore
      0.9 ms  ZygoteRules
      3.7 ms  StaticArraysCore
     17.8 ms  Setfield
     17.0 ms  BangBang
      0.9 ms  ContextVariablesX
      0.5 ms  FLoopsBase
      1.1 ms  PrettyPrint
      0.5 ms  NameResolution
    126.0 ms  MLStyle
      3.0 ms  JuliaVariables
      0.4 ms  Adapt
      0.5 ms  ArgCheck
     14.1 ms  Baselet
      0.6 ms  CompositionsBase
      0.5 ms  DefineSingletons
      9.8 ms  MicroCollections
     14.6 ms  SplittablesBase
     34.1 ms  Transducers
      4.2 ms  FLoops
      1.1 ms  InverseFunctions
     18.8 ms  Accessors
     18.5 ms  FunctionWrappers
    235.6 ms  FoldsThreads 309.83% compilation time
     60.5 ms  DataStructures
      0.6 ms  SortingAlgorithms
      9.3 ms  Missings
      1.0 ms  DocStringExtensions
      4.7 ms  IrrationalConstants
      0.4 ms  LogExpFunctions
      0.6 ms  LogExpFunctions → LogExpFunctionsChainRulesCoreExt
      0.4 ms  LogExpFunctions → LogExpFunctionsInverseFunctionsExt
      0.4 ms  StatsAPI
     17.3 ms  StatsBase
      2.7 ms  SimpleTraits
      6.0 ms  UnsafeAtomics
     12.9 ms  Atomix
      2.2 ms  GPUArraysCore
     13.8 ms  Preferences
      0.4 ms  PrecompileTools
    435.4 ms  StaticArrays
      1.1 ms  ConstructionBase → ConstructionBaseStaticArraysExt
      0.5 ms  Adapt → AdaptStaticArraysExt
      0.5 ms  Accessors → AccessorsStaticArraysExt
      3.7 ms  CEnum
      0.4 ms  JLLWrappers
    242.0 ms  LLVMExtra_jll 98.67% compilation time (98% recompilation)
     42.7 ms  LLVM
      4.7 ms  UnsafeAtomicsLLVM
     27.9 ms  KernelAbstractions
     30.3 ms  NNlib 57.78% compilation time
      1.4 ms  DelimitedFiles
      7.0 ms  MLUtils

ToucheSir commented 1 year ago

NNlib is used in a couple of places in https://github.com/JuliaML/MLUtils.jl/blob/main/src/utils.jl, but I don't think those would be too difficult to change or vendor the functions used.

CarloLucibello commented 1 year ago

Yes, it would be nice to excise the NNlib dependency. Its functionality is used in

so we could move those functions to NNlib.

ablaom commented 10 months ago

Anyone have some time to revisit this?

ToucheSir commented 10 months ago

The biggest blocker is still what to use in place of NNlib.scatter for https://github.com/JuliaML/MLUtils.jl/blob/09c87f7097536384cea0a132aa0012679df18175/src/utils.jl#L201. Vendoring scatter won't help since it depends on KernelAbstractions.
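For readers unfamiliar with the operation: scatter accumulates source values into destination slots chosen by an index array. Below is a minimal CPU-only sketch of the additive, one-dimensional case. The function name `scatter_add` is hypothetical, and this is not NNlib's implementation: it is a plain sequential loop that would not work efficiently on GPU arrays and defines no differentiation rules, so it only illustrates the semantics and is not a drop-in replacement for the call at the linked line.

```julia
# Hypothetical helper: dst[idx[k]] accumulates src[k] for every k.
# Assumes idx is non-empty and contains positive integers.
function scatter_add(src::AbstractVector, idx::AbstractVector{<:Integer})
    length(src) == length(idx) ||
        throw(DimensionMismatch("src and idx must have equal length"))
    dst = zeros(eltype(src), maximum(idx))
    for (s, i) in zip(src, idx)
        dst[i] += s
    end
    return dst
end

# Entries 1 and 3 share index 1, so they are summed.
scatter_add([1, 2, 3, 4], [1, 3, 1, 2])  # [4, 4, 2]
```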

It'd also be worth redoing the import timings since the JuliaFolds packages have changed ownership and received some bugfixes since this issue was originally opened.

ablaom commented 9 months ago

Related (duplication?): https://github.com/JuliaML/MLUtils.jl/issues/90