Open xuyxu opened 3 years ago
I will work on the #4 regressor task
That would be really nice, @tczhao! Adding the regressor takes a lot of effort. Could you open a draft pull request and upload what you have done so far? I am willing to take part in the development of this feature and to have deeper discussions there.
In addition, here are some things that may be helpful to you:
- For regression, the augmented features are the out-of-bag predicted values from the cascade layer, which are unbounded. (In contrast, the augmented features for classification, i.e., the class vectors, are bounded.) This poses a problem if we want to use binning for acceleration, because the binned values of unbounded features will be very sensitive to the boundary values.
- Use the RandomForestRegressor and ExtraTreeRegressor from Scikit-Learn first. The current version only includes the reduced implementation of classification trees. I am willing to optimize that for regression trees after we have quickly verified the effectiveness on regression.
I'm working on #13
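For anyone following along, the out-of-bag augmentation described in the first bullet above can be sketched with plain Scikit-Learn forests. This is only an illustration under assumptions: the synthetic data, the choice of the ensemble classes `RandomForestRegressor` and `ExtraTreesRegressor`, and all variable names are mine, and it is not the deep forest implementation.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic regression data (assumption, for illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 5))
y = 10.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# One "cascade layer" made of two bootstrapped forests. Bootstrapping is
# required so that every sample has out-of-bag (OOB) trees.
forests = [
    RandomForestRegressor(n_estimators=50, oob_score=True, random_state=0),
    ExtraTreesRegressor(
        n_estimators=50, bootstrap=True, oob_score=True, random_state=1
    ),
]

oob_columns = []
for forest in forests:
    forest.fit(X, y)
    # oob_prediction_ holds, for each training sample, the mean prediction
    # of the trees that did not see that sample during fitting.
    oob_columns.append(forest.oob_prediction_)

# Augmented features: one unbounded column per forest.
X_aug = np.column_stack(oob_columns)

# Input of the next layer: raw features plus augmented features.
X_next = np.hstack([X, X_aug])
print(X_next.shape)  # (200, 7)
print(X_aug.min(), X_aug.max())  # the range follows y, it is not limited to [0, 1]
```

Unlike class probabilities, the two extra columns take whatever range the target has, which is why fixed bin edges computed on one layer may fit the next layer poorly.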
Thanks, will have a draft ready in 2 days
Maybe we can skip the Build wheels for Python 2.7 step, since Python 2.7 has been unmaintained since 2020-01.
Wheels for Python 2.7 are not included in the build-wheels CI; I have created a separate branch for those who are interested ;-)
EDIT: This is actually a feature request from several users in industry, who told me that version 2.7 is still the most frequently used Python version in their environment.
Hi,
Thanks @tczhao for the hard work!
I would just like to understand whether it would be sufficient to supply a custom loss via predictor_kwargs (in other words, is there any other part of CascadeForestRegressor that uses MSE by default?).
Thanks David
I think it is relatively easy to add the Mean Absolute Error (MAE), which is also available in Scikit-Learn. For custom loss functions, a new splitting criterion should be implemented for decision trees.
Maybe we can add another parameter to CascadeForestClassifier and CascadeForestRegressor (e.g., criterion), which specifies the splitting criterion for decision trees in the model.
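To make the idea of a shared criterion parameter concrete, here is a rough sketch that forwards one criterion value to every forest in a layer. The helper `make_layer_estimators` is hypothetical, and the Scikit-Learn ensemble classes are only stand-ins for the trees used inside deep forest.

```python
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor


def make_layer_estimators(criterion, n_estimators=100, random_state=None):
    """Hypothetical helper: build the forests of one layer with a shared
    splitting criterion, the way a model-level `criterion` argument might
    be passed down to each estimator."""
    shared = dict(
        criterion=criterion,
        n_estimators=n_estimators,
        random_state=random_state,
    )
    return [RandomForestRegressor(**shared), ExtraTreesRegressor(**shared)]


# "absolute_error" is the MAE criterion name in scikit-learn >= 1.0;
# older releases called it "mae". The value is only validated at fit time.
estimators = make_layer_estimators("absolute_error", n_estimators=10, random_state=0)
for est in estimators:
    print(type(est).__name__, est.get_params()["criterion"])
```

A built-in criterion such as MAE only needs this kind of plumbing, whereas a fully custom loss would still need a new splitting criterion in the tree code, as noted above.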
I will work on the package for Mac-OS (#6, #32)
Thanks ;-). You may find the documentation on cibuildwheel helpful when working on the CI: build-wheels.
Hi @xuyxu, I found that in the current master branch, the input y values are checked by "deepforest.cascade._check_target_values". But when I pass a sequence of integers as y, it is classified as "multiclass" instead of "continuous". In my view, y values in a regression problem can be floats or integers, so this check may cause serious errors in the future. The image shows the example from the type_of_target function in sklearn.utils.multiclass.
Hi @chendingyan, I agree with you on this point, the current check may be too strict. Any idea on how to improve this?
Hi @xuyxu, if you use "type_of_target" to check the input y values, I would also accept multiclass and multiclass-multioutput for univariate and multivariate regression, respectively, and additionally check that the values in the numpy array are numeric.
That's a nice idea, and this should be easy to implement. I will appreciate it very much if you could contribute a PR for this enhancement ;-)
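For reference, the relaxed check proposed above could look roughly like this. It is a sketch under assumptions: `check_regression_target` and `ACCEPTED_TARGET_TYPES` are hypothetical names, not the actual `_check_target_values` in deepforest; only `type_of_target` is a real Scikit-Learn function.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Target types accepted for regression in this sketch: the two continuous
# types plus the two integer-valued types mentioned in the proposal.
ACCEPTED_TARGET_TYPES = {
    "continuous",
    "continuous-multioutput",
    "multiclass",
    "multiclass-multioutput",
}


def check_regression_target(y):
    """Hypothetical check: accept numeric targets whether they are stored
    as floats or as integers, and reject everything else."""
    y = np.asarray(y)
    # Reject strings, objects, and booleans before asking for the type.
    if not np.issubdtype(y.dtype, np.number):
        raise ValueError("Regression targets must be numeric, got dtype %s." % y.dtype)
    target_type = type_of_target(y)
    if target_type not in ACCEPTED_TARGET_TYPES:
        raise ValueError("Unsupported target type for regression: %s." % target_type)
    return target_type


print(type_of_target(np.array([1, 2, 3, 4])))      # multiclass
print(type_of_target(np.array([0.5, 1.5, 2.5])))   # continuous
print(check_regression_target([1, 2, 3, 4]))       # multiclass
```

Note that an integer target with only two distinct values is reported as "binary" by `type_of_target` and would still be rejected here, since the sketch follows the proposal literally.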
Submit a PR~
Hi @xuyxu, can you help me check my PR? How can I pass the code quality check?
Thanks for the PR @chendingyan, I will fix the code quality problem.
This issue collects all feature requests. Anyone is welcome to work on the issues listed below; do not forget to include your contributions and name in CHANGELOG.rst.
If you want to work on a requested feature, please re-open the linked issue and leave a comment below to let us know that you want to work on it.
New features
- CascadeForestRegressor class for regression problems (#4)
- export_graphviz method for visualizing decision trees in deep forest (#12)
- CascadeForestSurvAnalyzer class for survival analysis (#71)
Python package
New language wrappers:
Fix