Closed petrovicboban closed 11 months ago
@petrovicboban The issue was find_match()
wasn't identifying matching values for different np.int
and np.float64
types. I added a cast, but it's possible this could be handled in a cleaner manner. The specific test appears case to be passing now.
@edwardwliu thanks! I thought the issue was in calling function, because the test was failing when arr_b
was numpy array. In all other tests, it was list.
@edwardwliu
it seems that fit()
should not change parameters which are set during __init__()
, and that's exactly what we do.
Here is test failure: https://github.com/forestry-labs/Rforestry/actions/runs/4774656551/jobs/8488479086
I guess we should skip setting those parameters in __init__()
and set them in fit()
only, but I'm not sure about implications of that.
@petrovicboban I see, let's only use the following parameters in fit()
and remove them from __init__()
if overlapping:
feature_weights
deep_feature_weights
observation_weights
lin_feats
monotonic_constraints
groups
@edwardwliu it looks like we need to move these too:
if self.max_depth is None:
self.max_depth = round(nrow / 2) + 1
if self.interaction_depth is None:
self.interaction_depth = self.max_depth
if self.max_obs is None:
self.max_obs = y.size
because they depend on argument of fit()
I see, yes let's move those parameters to fit()
for now as well. Conceptually, the params max_depth
and interaction_depth
could be agnostic to the data provided in fit()
, but the implementation in this package requires this change.
@edwardwliu can you check current test failures? One is related to you recent change (groups) and one is from check_estimator()
@petrovicboban Both of these test cases should be passing be now. The estimator test was because we do not currently support a data type scipy.sparse.csc_matrix
. In addition, it may be useful to parameterize checks while I debug.
@edwardwliu this looks good for now. It fails only on estimator's pickle check. Probably because saved_forest_
attribute is expected during the process but is added only after call to translate_tree
. I've tried several solutions, but failed.
After we solve that, I need to improve __init__
parameters validation.
current status: ✅
Overall Coverage
Statements Covered Coverage Threshold Status 814 646 79% 60% 🟢 New Files
No new covered files...
Modified Files
File Coverage Status Python/random_forestry/forestry.py 79% 🟢 Python/random_forestry/preprocessing.py 72% 🟢 Python/random_forestry/validators/init.py 100% 🟢 Python/random_forestry/validators/base_validator.py 91% 🟢 Python/random_forestry/validators/fit_validator.py 87% 🟢 Python/random_forestry/validators/predict_validator.py 89% 🟢 TOTAL 86% 🟢 **updated for commit:
dc2031a
@petrovicboban Both of these test cases should be passing be now. The estimator test was because we do not currently support a data type
scipy.sparse.csc_matrix
. In addition, it may be useful to parameterize checks while I debug.
It's passing with:
def _more_tags(self):
return {
"_xfail_checks": {
"check_estimators_pickle": "To be fixed later",
"check_n_features_in": "To be fixed later",
"check_estimators_nan_inf": "To be fixed later",
"check_dtype_object": "To be fixed later",
},
}
New PR based on this one was created https://github.com/forestry-labs/Rforestry/pull/146
@edwardwliu can you figure out why test is failing here? I've tried to do it, but it goes deep into logic (categorical feature mapping).