JuliaAI / DecisionTree.jl

Julia implementation of Decision Tree (CART) and Random Forest algorithms
Other
356 stars 102 forks source link

Allow features to be `AbstractVector` (in addition to `AbstractMatrix`) #150

Open sanderbboisen opened 2 years ago

sanderbboisen commented 2 years ago

I was playing around with DecisionTree.jl for a potential project and found that I could not produce a regression tree from a single feature. I have made changes to allow features to be a vector in the build_tree functions.

I have tested the changes using RunTest.jl with Julia 1.6.4 .

ablaom commented 2 years ago

I don't see any harm in this enhancement, apart from a mild complication in the codebase.

Anyone else have objections?

@sanderbboisen Would you be willing to add a test of native and ScikitLearn API with a vector X ?

sanderbboisen commented 2 years ago

I don't see any harm in this enhancement, apart from a mild complication in the codebase.

Anyone else have objections?

@sanderbboisen Would you be willing to add a test of native and ScikitLearn API with a vector X ?

Sure, I will get on it. Incoming update.

codecov-commenter commented 2 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (dev@17cb46d). Click here to learn what that means. The diff coverage is n/a.

@@          Coverage Diff           @@
##             dev     #150   +/-   ##
======================================
  Coverage       ?   88.84%           
======================================
  Files          ?        9           
  Lines          ?      986           
  Branches       ?        0           
======================================
  Hits           ?      876           
  Misses         ?      110           
  Partials       ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 17cb46d...070e86e. Read the comment docs.

sanderbboisen commented 2 years ago

I ran into an issue as several functions like apply_something were overloaded so that any Vector inputs worked differently. These are essentially only called by theapply_something functions that users are supposed to call. I have taken the path of least modification, and named them _apply_something. This way users only have access to the functions they are intended to use, and using Vectors as inputs work.

I have added tests to tests/scikitlearn.jl and everything seems to work.

I am sorry for the confusing commit structure, things got a little messy at some point.

I think there is a general issue with the way that the module is constructed, and I have been refactoring it in a separate branch on my fork. For one, the structure does not allow multiple usage of utils.jl and there is generally some issues with code repetition iirc. I will open another PR on this when I am done refactoring.

ablaom commented 2 years ago

I am happy with this now.

I suspect this smaller PR may create a few merge conflicts for the more complicated #166, so let's wait for:

before merging this one.

@sanderbboisen Thanks for your contribution and patience.

ablaom commented 2 years ago

@sanderbboisen Would you now be happy to rebase your PR?