Open edwardwliu opened 1 year ago
I have no Data Science skills and experience, so I need more details here. It's very likely I would need serious DS training to get into it, and I can't be productive in this domain yet.
I see your concerns. I shared some additional context via Slack, but IIUC you don't have access or to email for the next two weeks, is that correct? This request stems from an exploration of sklearn compatibility by others that I've uploaded here. I think running check_estimator()
in the example then addressing pure syntax changes could be an appropriate starting point. Let me know if you run into specific issues or continue to have concerns.
I still have no idea of what's going on in that code and what the goal is. No idea what sklearn/ray is too, I have no exposure to ds frameworks at all. DS is not something I can jump in just like that, without any theoretical knowledge. I'm sorry but you'll have to reassign this task to someone else.
Thanks for sharing the context. I shared some details earlier over Slack, but realize you may not have access. Let's chat tomorrow about the best path forward!
Chatted about this, appreciate you taking the time to sync. Documenting some of the additional context discussed here.
The Python package was originally developed with the intention of natively following an interface called "scikit-learn", but deltas have been introduced over time. The interface defines some requirements for class initialization, function naming, and parameter. Adhering to this interface allows easy integration with many other Python libraries that can parallelize model training or improve performance. The fixes should be syntactical and be limited to appropriately extending class requirements. However, if any errors touch model logic, requires data science context, or if any questions arise at all, please tag me and I'll resolve those myself as soon as I can.
The starting step is to run and eventually pass the following check:
from random_forestry import RandomForest
from sklearn.utils.estimator_checks import check_estimator
check_estimator(RandomForest())
The checks can be found in estimator_checks.py() with details outlined in the documentation starting from this section and below. The bulk of the change should be the following:
get_params
and set_params
__init__()
that aligns with paramsnewdata
should be renamed X
Other changes may be class specific so as mentioned above, let's work closely on this if questions arise!
We're interested in compatibility with sklearn estimators (see instructions) for integration with other libraries like Ray. The goal would be to pass the
check_estimator()
check. A few specific items to respect requirements include: class__init__
parameters should be explicit and changing thenewdata
arg tox
, adhered by subsequent methods.