du-phan closed this 5 years ago
I think we are saying the same thing! We usually won't have a new_test_df as large as the original test_df, but to learn the drift model we need both sets to be of similar size. So we'd indeed need to downsample the original test_df... That's what I did in the original code, but maybe you've discarded it entirely?
And for a v2, we could use bootstrap sampling to get a more robust estimate of the drift score.
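A minimal sketch of the bootstrap idea, assuming the drift score is the accuracy of the drift classifier on held-out rows (each row scored 1 when the classifier correctly tells whether it comes from the original or the new test set). The function name `bootstrap_drift_score` and its parameters are hypothetical. It resamples the per-row outcomes with replacement and reports a percentile interval alongside the point estimate.

```python
import random


def bootstrap_drift_score(correct, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap of a drift classifier's held-out accuracy.

    Hypothetical helper: `correct` is a list of 0/1 flags, 1 when the drift
    model predicted a row's origin (original vs new test set) correctly.
    Returns (point_estimate, (lower, upper)).
    """
    if not correct:
        raise ValueError("need at least one prediction")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(correct)
    estimates = []
    for _ in range(n_boot):
        # Draw n rows with replacement and record the resampled accuracy.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return sum(correct) / n, (lower, upper)


score, (low, high) = bootstrap_drift_score([1] * 80 + [0] * 20)
print(score)  # 0.8
```

With balanced sets, an interval sitting around 0.5 would suggest the drift model cannot separate them. Resampling only the evaluation outcomes keeps this cheap; resampling the training data as well would be more thorough but would mean retraining the drift model on every draw.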
On 27 Jul 2019 at 12:18 +0200, Du Phan notifications@github.com wrote:
@du-phan commented on this pull request. In python-lib/dku_drifter/drifter.py:
```diff
+
+        self.model_handler = self._get_model_handler()
+        self.drift_clf = None
+        self.train_X = None
+        self.train_Y = None
+        self.test_X = None
+        self.test_Y = None
+
+    def _get_model_handler(self):
+        my_data_dir = os.environ['DIP_HOME']
+        saved_model_version_id = get_saved_model_version_id(self.model_id)
+        model_handler = get_model_info_handler(saved_model_version_id, my_data_dir)
+        return model_handler
+
+    def concatenate_new_and_original_data(self):
+        # One option is to force to sample original_test_df as many rows as new_test_df
```

This seems like a very strong constraint to me, and in practice I don't think we can have a new_test_df as big as original_test_df. We can implement a downsampling mechanism in which we downsample the test set that has more rows.
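The downsampling mechanism proposed here could look like the following pandas sketch. It assumes both inputs are DataFrames, keeps only the columns they share (the new test set may be unlabeled, so the target would be missing from it), and tags each row with a hypothetical `__dataset_origin` column that a drift model could learn to predict. The function name and the fixed seed are illustrative, not taken from the plugin.

```python
import pandas as pd

# Hypothetical name for the label a drift model would learn to predict.
ORIGIN_COLUMN = "__dataset_origin"


def downsample_and_concatenate(original_df, new_df, seed=1337):
    """Downsample the larger test set to the size of the smaller one,
    then stack both with a column recording where each row came from."""
    # Keep only columns present in both sets (e.g. drops a target the new set lacks).
    common_columns = [c for c in original_df.columns if c in new_df.columns]
    n_rows = min(len(original_df), len(new_df))

    # Sampling is without replacement; the smaller set is only shuffled.
    original_sample = original_df[common_columns].sample(n=n_rows, random_state=seed)
    new_sample = new_df[common_columns].sample(n=n_rows, random_state=seed)

    original_sample = original_sample.assign(**{ORIGIN_COLUMN: "original"})
    new_sample = new_sample.assign(**{ORIGIN_COLUMN: "new"})
    return pd.concat([original_sample, new_sample], ignore_index=True)


original = pd.DataFrame({"age": range(10), "target": [0, 1] * 5})
new = pd.DataFrame({"age": range(100, 104)})
combined = downsample_and_concatenate(original, new)
print(len(combined))  # 8
```

Whichever set is larger gets downsampled, so neither side needs to be forced to a fixed size, and a fixed `random_state` selects the same rows on every run.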
Ah, I just saw that; I missed it the first time I read your code.
New version that takes into account the feedback from Joachim and Léo.
A new object, `ModelAccessor`, is added to decouple the logic of the dku `model_handler` from the `DriftAnalyzer`.
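To illustrate the decoupling, here is a hypothetical sketch of such an accessor: `DriftAnalyzer` would call only the accessor's methods, so the handler behind it can be swapped for a stub in tests. The handler method names (`get_target_variable`, `get_test_df`, `get_per_feature`) and the `FakeModelHandler` stub are assumptions for illustration, not the plugin's confirmed API.

```python
class ModelAccessor:
    """Exposes only what the drift analysis needs from a model handler.

    Hypothetical sketch: the handler methods called here are assumed names.
    """

    def __init__(self, model_handler):
        self.model_handler = model_handler

    def get_target_variable(self):
        return self.model_handler.get_target_variable()

    def get_original_test_df(self, limit=None):
        test_df = self.model_handler.get_test_df()
        # Slicing keeps the first `limit` rows for both lists and pandas DataFrames.
        return test_df if limit is None else test_df[:limit]

    def get_per_feature(self):
        return self.model_handler.get_per_feature()


class FakeModelHandler:
    """Hypothetical stand-in for the dku model handler, e.g. for tests outside DSS."""

    def get_target_variable(self):
        return "churn"

    def get_test_df(self):
        return [{"age": 31, "churn": 0}, {"age": 45, "churn": 1}, {"age": 27, "churn": 0}]

    def get_per_feature(self):
        # Illustrative feature metadata, not the real DSS format.
        return {"age": "numeric", "churn": "target"}


accessor = ModelAccessor(FakeModelHandler())
print(accessor.get_target_variable())  # churn
print(len(accessor.get_original_test_df(limit=2)))  # 2
```

Because the analyzer depends only on this narrow interface, changes in how the model handler is obtained stay confined to the accessor.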
The new API is as follows:

```python
import dataiku

dataiku.use_plugin_libs('model-drift')
from dku_drifter import DriftAnalyzer, ModelAccessor
from commons import get_model_handler

model_id = '5HExUjQ1'
test_set = 'unlabeled_customers_within_segments_prepared'

# New data to compare against the model's original test set
new_test_df = dataiku.Dataset(test_set).get_dataframe()

# Wrap the saved model's handler so DriftAnalyzer does not depend on it directly
model = dataiku.Model(model_id)
model_handler = get_model_handler(model)
model_accessor = ModelAccessor(model_handler)

# Train the drift model, then compute the drift metrics from it
drifter = DriftAnalyzer(model_accessor)
drift_features, drift_clf = drifter.train_drift_model(new_test_df)
drift_metrics = drifter.generate_drift_metrics(new_test_df, drift_features, drift_clf)
```
Backend implementation with 3 components:
- `Preprocessor`: object that mimics doctor's behaviour.
- `DriftAnalyzer`: object that trains a drift model and returns a list of metrics. Example: