Closed ch3njust1n closed 5 years ago
metric_tracking
to host all the codeceph
module
stableness
and report it to the mlservers2i
as described in the READMEAnd don't over-engineer 😄 let's start with hard coded defaults
@ch3njust1n and please follow along the doc in https://github.com/AICoE/experiment-tracking and https://github.com/AICoE/experiment-tracking-template and create PRs in those repos as well if you find inconsistencies or a lack of documentation. If we do the same for the next model it should be easier for the people to come
* add branch out to [main](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/app.py#L32) depending on the presence of [MLFLOW_TRACKING_URI](https://github.com/AICoE/experiment-tracking-template/blob/master/app.py#L62) env variable
Could you clarify what you mean by add branch out to main. Not really sure what that means. Do you mean place an if statement to check if the MLFLOW_TRACKING_URI is set e.g. if len(MLFLOW_TRACKING_URI) == 0
* create a new module called `metric_tracking` to host all the code
Do you mean create a subdirectory or just a new file? Michael already has a file called
stability_score.py
that computes the cluster metrics. Or do you mean create a new filemetric_tracking.py
that does something else?* use the [local](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/storage/local.py) storage module to test the code and later also make it work with the `ceph` module
What's the
ceph
module?* you might need to make adjustments to the storage module to support the use case
Could you clarify what changes may need to be made? I'm still not very familiar with the overall system and code base.
* run the [clustering](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/clustering/__init__.py#L4) twice, I think if using the storage module it should re-use the cached clusters. * compute the `stableness` and report it to the mlserver
I thought we concluded that stableness does not make sense in this context. However, Michael did contrive a metric to measure consistency across runs. Although, this doesn't really mean much since we would like to see that machines move out of their previous clusters because they should be fixed over time.
* built the container image locally with `s2i` as described in the README
And don't over-engineer smile let's start with hard coded defaults
Also should the environment variable be called MLFLOW_TRACKING_URI
or MLFLOW_TRACKING_UI
? Michael named it the latter.
* add branch out to [main](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/app.py#L32) depending on the presence of [MLFLOW_TRACKING_URI](https://github.com/AICoE/experiment-tracking-template/blob/master/app.py#L62) env variable
Could you clarify what you mean by add branch out to main. Not really sure what that means. Do you mean place an if statement to check if the MLFLOW_TRACKING_URI is set e.g.
if len(MLFLOW_TRACKING_URI) == 0
👍 Let's start with something simple as this. We can always refine the logic. Another common approach is laid out in the storage module. Not sure which is more python idiomatic.
* create a new module called `metric_tracking` to host all the code
Do you mean create a subdirectory or just a new file? Michael already has a file called
stability_score.py
that computes the cluster metrics. Or do you mean create a new filemetric_tracking.py
that does something else?
You can call it different than metric_tracking
. Naming is hard. But use a directory containing the new module for organizational purposes.
* use the [local](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/storage/local.py) storage module to test the code and later also make it work with the `ceph` module
What's the
ceph
module?
It's in the storage module.
* you might need to make adjustments to the storage module to support the use case
Could you clarify what changes may need to be made? I'm still not very familiar with the overall system and code base.
I don't know either. You'll find out while exploring the code. It's not very much.
* run the [clustering](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/clustering/__init__.py#L4) twice, I think if using the storage module it should re-use the cached clusters. * compute the `stableness` and report it to the mlserver
I thought we concluded that stableness does not make sense in this context. However, Michael did contrive a metric to measure consistency across runs. Although, this doesn't really mean much since we would like to see that machines move out of their previous clusters because they should be fixed over time.
For the tag suggestion use case, we'd want stableness.
* built the container image locally with `s2i` as described in the README
And don't over-engineer smile let's start with hard coded defaults
Also should the environment variable be called
MLFLOW_TRACKING_URI
orMLFLOW_TRACKING_UI
? Michael named it the latter.
It's MLFLOW_TRACKING_URI
For the MLFLOW_TRACKING_URI
check, what should be call or executed if true
? Still not sure what you meant.
For the
MLFLOW_TRACKING_URI
check, what should be call or executed iftrue
? Still not sure what you meant.
If the environment variable is present, then the running container or app should run the experiment tracking branch and report the parameters and metrics to the mlflow server
Ok. I just added the conditional branch to execute Michael's clustering code and created a new directory to contain the metric tracking code. Committed the code to my local dev branch here: https://github.com/ch3njust1n/aicoe-insights-clustering/tree/dev
use the local storage module to test the code and later also make it work with the ceph module you might need to make adjustments to the storage module to support the use case
What do you mean by make it work with ceph? What should be stored in ceph? Should I stored the centroid or the stability metric? What else should be stored? By use case, you mean the clustering code, correct?
Committed the code to my local dev branch here: https://github.com/ch3njust1n/aicoe-insights-clustering/tree/dev
Please reset this PR to be made off of that branch, then we'll see the diff here
use the local storage module to test the code and later also make it work with the ceph module you might need to make adjustments to the storage module to support the use case
What do you mean by make it work with ceph? What should be stored in ceph? Should I stored the centroid or the stability metric? What else should be stored? By use case, you mean the clustering code, correct?
the metric is stored in mlflow. The centroid is not stored at all. The cluster output is stored on ceph. And the input is read from ceph. This is what we use in the datahub environment
Sorry, how do I reset the PR? I found these instructions, but I don't see that option on my fork. I instead created a new PR.
Sorry, how do I reset the PR? I found these instructions, but I don't see that option on my fork. I instead created a new PR.
https://help.github.com/articles/changing-the-base-branch-of-a-pull-request/
I'm only giving the master branch in the base branch drop-down menu. I tried searching with the commit hash, but does not work.
closing in favor of https://github.com/RedHatInsights/aicoe-insights-clustering/pull/12
Still working on this. Please make suggestions for how I should integrate Michael's code.