RedHatInsights / aiops-insights-clustering

Clustering of systems
GNU General Public License v3.0
1 stars 14 forks source link

[WIP] integrating stability score code #10

Closed ch3njust1n closed 5 years ago

ch3njust1n commented 5 years ago

Still working on this. Please make suggestions for how I should integrate Michael's code.

durandom commented 5 years ago

And don't over-engineer 😄 let's start with hard coded defaults

durandom commented 5 years ago

for reference: https://gitlab.cee.redhat.com/mcliffor/insights_clustering/blob/master/stability_score.py

durandom commented 5 years ago

@ch3njust1n and please follow along the doc in https://github.com/AICoE/experiment-tracking and https://github.com/AICoE/experiment-tracking-template and create PRs in those repos as well if you find inconsistencies or a lack of documentation. If we do the same for the next model it should be easier for the people to come

ch3njust1n commented 5 years ago
* add branch out to [main](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/app.py#L32) depending on the presence of [MLFLOW_TRACKING_URI](https://github.com/AICoE/experiment-tracking-template/blob/master/app.py#L62) env variable

Could you clarify what you mean by add branch out to main. Not really sure what that means. Do you mean place an if statement to check if the MLFLOW_TRACKING_URI is set e.g. if len(MLFLOW_TRACKING_URI) == 0

* create a new module called `metric_tracking` to host all the code

Do you mean create a subdirectory or just a new file? Michael already has a file called stability_score.py that computes the cluster metrics. Or do you mean create a new file metric_tracking.py that does something else?

* use the [local](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/storage/local.py) storage module to test the code and later also make it work with the `ceph` module

What's the ceph module?

  * you might need to make adjustments to the storage module to support the use case

Could you clarify what changes may need to be made? I'm still not very familiar with the overall system and code base.

* run the [clustering](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/clustering/__init__.py#L4) twice, I think if using the storage module it should re-use the cached clusters.

* compute the `stableness` and report it to the mlserver

I thought we concluded that stableness does not make sense in this context. However, Michael did contrive a metric to measure consistency across runs. Although, this doesn't really mean much since we would like to see that machines move out of their previous clusters because they should be fixed over time.

* built the container image locally with `s2i` as described in the README

And don't over-engineer smile let's start with hard coded defaults

ch3njust1n commented 5 years ago

Also should the environment variable be called MLFLOW_TRACKING_URI or MLFLOW_TRACKING_UI? Michael named it the latter.

durandom commented 5 years ago
* add branch out to [main](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/app.py#L32) depending on the presence of [MLFLOW_TRACKING_URI](https://github.com/AICoE/experiment-tracking-template/blob/master/app.py#L62) env variable

Could you clarify what you mean by add branch out to main. Not really sure what that means. Do you mean place an if statement to check if the MLFLOW_TRACKING_URI is set e.g. if len(MLFLOW_TRACKING_URI) == 0

👍 Let's start with something simple as this. We can always refine the logic. Another common approach is laid out in the storage module. Not sure which is more python idiomatic.

* create a new module called `metric_tracking` to host all the code

Do you mean create a subdirectory or just a new file? Michael already has a file called stability_score.py that computes the cluster metrics. Or do you mean create a new file metric_tracking.py that does something else?

You can call it different than metric_tracking. Naming is hard. But use a directory containing the new module for organizational purposes.

* use the [local](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/storage/local.py) storage module to test the code and later also make it work with the `ceph` module

What's the ceph module?

It's in the storage module.

  * you might need to make adjustments to the storage module to support the use case

Could you clarify what changes may need to be made? I'm still not very familiar with the overall system and code base.

I don't know either. You'll find out while exploring the code. It's not very much.

* run the [clustering](https://github.com/RedHatInsights/aicoe-insights-clustering/blob/master/clustering/__init__.py#L4) twice, I think if using the storage module it should re-use the cached clusters.

* compute the `stableness` and report it to the mlserver

I thought we concluded that stableness does not make sense in this context. However, Michael did contrive a metric to measure consistency across runs. Although, this doesn't really mean much since we would like to see that machines move out of their previous clusters because they should be fixed over time.

For the tag suggestion use case, we'd want stableness.

* built the container image locally with `s2i` as described in the README

And don't over-engineer smile let's start with hard coded defaults

durandom commented 5 years ago

Also should the environment variable be called MLFLOW_TRACKING_URI or MLFLOW_TRACKING_UI? Michael named it the latter.

It's MLFLOW_TRACKING_URI

ch3njust1n commented 5 years ago

For the MLFLOW_TRACKING_URI check, what should be call or executed if true? Still not sure what you meant.

durandom commented 5 years ago

For the MLFLOW_TRACKING_URI check, what should be call or executed if true? Still not sure what you meant.

If the environment variable is present, then the running container or app should run the experiment tracking branch and report the parameters and metrics to the mlflow server

ch3njust1n commented 5 years ago

Ok. I just added the conditional branch to execute Michael's clustering code and created a new directory to contain the metric tracking code. Committed the code to my local dev branch here: https://github.com/ch3njust1n/aicoe-insights-clustering/tree/dev

ch3njust1n commented 5 years ago

use the local storage module to test the code and later also make it work with the ceph module you might need to make adjustments to the storage module to support the use case

What do you mean by make it work with ceph? What should be stored in ceph? Should I stored the centroid or the stability metric? What else should be stored? By use case, you mean the clustering code, correct?

durandom commented 5 years ago

Committed the code to my local dev branch here: https://github.com/ch3njust1n/aicoe-insights-clustering/tree/dev

Please reset this PR to be made off of that branch, then we'll see the diff here

durandom commented 5 years ago

use the local storage module to test the code and later also make it work with the ceph module you might need to make adjustments to the storage module to support the use case

What do you mean by make it work with ceph? What should be stored in ceph? Should I stored the centroid or the stability metric? What else should be stored? By use case, you mean the clustering code, correct?

the metric is stored in mlflow. The centroid is not stored at all. The cluster output is stored on ceph. And the input is read from ceph. This is what we use in the datahub environment

ch3njust1n commented 5 years ago

Sorry, how do I reset the PR? I found these instructions, but I don't see that option on my fork. I instead created a new PR.

durandom commented 5 years ago

Sorry, how do I reset the PR? I found these instructions, but I don't see that option on my fork. I instead created a new PR.

https://help.github.com/articles/changing-the-base-branch-of-a-pull-request/

ch3njust1n commented 5 years ago

I'm only giving the master branch in the base branch drop-down menu. I tried searching with the commit hash, but does not work.

durandom commented 5 years ago

closing in favor of https://github.com/RedHatInsights/aicoe-insights-clustering/pull/12