RedHatInsights / aiops-insights-clustering

Clustering of systems
GNU General Public License v3.0

Cluster200 #20

Closed TreeinRandomForest closed 5 years ago

TreeinRandomForest commented 5 years ago

Adding framework for training and predicting for clustering service

durandom commented 5 years ago

off topic: @tumido @bronaghs should we move this repo over to the other aiops repos (manageiq org) as well, so that we can set up the same CI as for the other repos?

durandom commented 5 years ago

@TreeinRandomForest I think as a good practice you should prefix the PR title with [WIP] if the PR is not ready to merge, and maybe add a checklist of missing things in the description.

I guess you'll still want to integrate into the current persistence workflow? I.e. implement the cluster method?

tumido commented 5 years ago

@durandom We can set up the pipeline over here without any problem - the buildfactory is org/RedHatInsights based. It should be fine to leave it here. I'm already preparing the templates. Once we're ready we can just add webhooks to this repo (with your help) and let it integrate as continuously as it gets...

Moving it over to the ManageIQ org is also an option; I just wanted to state that both are possible and that it's really more of a cosmetic/political change. :stuck_out_tongue_winking_eye:

durandom commented 5 years ago

I would keep all AIOps repos in the same org. Less confusion. @bronaghs I guess this is your call?

bronaghs commented 5 years ago

I would like this repo to exist alongside the other AI Ops repos, it will be more seamless that way.

durandom commented 5 years ago

@tumido @bronaghs ok, how would we proceed with this? You create another repo in MIQ and we push all commits over there?

TreeinRandomForest commented 5 years ago

Folks, this is now ready for the first round of integration. @tumido, what would the next step be? I am sure there'll be some API changes on our end to dock consistently with what you have.

As an example, test.py showcases the full workflow:

  1. first argument = train_flag (1 = run training, 0 = ignore training)
  2. second argument = inference_flag (1 = run inference, 0 = ignore inference)

python test.py 1 1  # run everything
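
For illustration only, a minimal sketch of what such a flag-driven driver script could look like (the run_training/run_inference helpers are placeholders, not the repo's actual API):

import sys

def run_training():
    # Placeholder for the real training step (fit and persist the clustering model)
    print("training the clustering model...")

def run_inference():
    # Placeholder for the real prediction step (load the stored model and predict)
    print("running inference with the stored model...")

if __name__ == "__main__":
    train_flag = int(sys.argv[1])      # first argument: 1 = run training, 0 = ignore
    inference_flag = int(sys.argv[2])  # second argument: 1 = run inference, 0 = ignore

    if train_flag:
        run_training()
    if inference_flag:
        run_inference()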

If we can set up some sort of CI, that would be great, because I still need to clean up the internal APIs on my side and there'll be more commits.

tumido commented 5 years ago

@TreeinRandomForest sure! :wink:

I'll adjust the API in a follow-up. Not a problem. What I do expect from the service is that we are able to call it like this (simplified):

from flask import Flask, request

app = Flask(__name__)

# Pre-trained clustering model, loaded once when the service starts
model = Inference(...)

@app.route('/inference')
def inference_endpoint():
    # Collect data for prediction
    input_data = request.get_json(force=True)

    # Run the prediction
    return model.predict(input_data['data'])

Would that be possible?

TreeinRandomForest commented 5 years ago

@tumido of course! :smile: I can wrap everything up in a single function call on the inference side. Would it be helpful if I had a config file that defines various parameters like:

  1. which columns are categorical and which are to be dropped
  2. where the pre-trained models are stored
  3. any other data required during the inference process

The parameters would be read from the config file, and model.predict would then take care of all the details: preprocessing, loading the models, and making the predictions.
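
Purely as an illustration (the key names, example values, and file name below are made up, not the repo's actual config format), such a shared config could look like:

# cluster_config.py -- hypothetical shared configuration for training and inference
CONFIG = {
    "categorical_columns": ["os", "arch"],         # columns treated as categorical (example values)
    "drop_columns": ["system_id", "timestamp"],    # columns excluded before clustering (example values)
    "model_path": "models/clustering_latest.pkl",  # where the pre-trained models are stored
}

Both the training code and model.predict would read from this one place, so the preprocessing stays consistent.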

The only pain point is that one would have to ensure the same config file is used during training and inference, and we would need to find some way of (hopefully) automatically enforcing that.
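
One possible way to enforce that automatically (a sketch assuming models are pickled to disk; save_model, load_model, and the bundle layout are hypothetical, not the repo's current persistence code) is to store a fingerprint of the config next to the trained model and check it at inference time:

import hashlib
import json
import pickle

def config_fingerprint(config):
    # Stable hash of the config contents, used to tag a trained model
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

def save_model(model, config, path):
    # Persist the model together with the fingerprint of the config it was trained with
    with open(path, "wb") as f:
        pickle.dump({"model": model, "config_hash": config_fingerprint(config)}, f)

def load_model(path, config):
    # Refuse to run inference if the supplied config differs from the training config
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    if bundle["config_hash"] != config_fingerprint(config):
        raise ValueError("inference config does not match the config used for training")
    return bundle["model"]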