
Multi-Model Support - Add documentation/example #56

Closed parano closed 5 years ago

kivo360 commented 5 years ago

Is this a thing? I'm very much interested in using BentoML to train and predict on multiple related data sets at a time (same features, but different histories), yet I can't tell if this is supported yet.

parano commented 5 years ago

hey @kivo360, yes, BentoML already supports packaging multiple models together in one bundle and serving multiple models from one REST server (even though we haven't added an example yet), but BentoML doesn't manage how you train multiple models. This issue is meant to add support and examples that make the workflow of packaging multiple models easier with BentoML.

I think for your use case, you can definitely produce multiple model artifacts from multiple jobs with a different data set, and use BentoML to bundle those artifacts together for serving/hosting.
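
For illustration, a rough sketch of bundling two models in one BentoService, based on the 0.x-era API (the artifact names here are made up, and the pack/save details may differ between versions):

from bentoml import BentoService, api, artifacts
from bentoml.artifact import PickleArtifact
from bentoml.handlers import DataframeHandler

# Two model artifacts packaged into a single bundle, served by one REST server
@artifacts([PickleArtifact('model_btc'), PickleArtifact('model_eth')])
class MultiModelService(BentoService):

    @api(DataframeHandler)
    def predict_btc(self, df):
        return self.artifacts.model_btc.predict(df)

    @api(DataframeHandler)
    def predict_eth(self, df):
        return self.artifacts.model_eth.predict(df)

# Pack the separately trained models and save the bundle for serving
svc = MultiModelService.pack(model_btc=btc_model, model_eth=eth_model)
saved_path = svc.save('./saved_bento')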

Do you mind sharing a bit more about your use case here? It might be helpful for us to add more support for this type of use case in BentoML.

kivo360 commented 5 years ago

I'm creating a trading bot. I'm only doing a partial_fit on PassiveAggressiveRegressor for 4-10 stochastically generated bars. The idea is that I need to account for training on non-stationary data in a backtest to see how my bot will operate live. The plan is to train every time I make a decision, because the market changes whenever I actually place an order. I need to ensure I account for that within emulation (for an RL agent).
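
As a concrete sketch of that incremental loop (the bar data below is random placeholder data, not a real feed):

import numpy as np
from sklearn.linear_model import PassiveAggressiveRegressor

model = PassiveAggressiveRegressor(max_iter=100, random_state=0, tol=1e-3)

# Each decision step: update the model on the latest handful of bars,
# then predict on the newest features.
for step in range(10):
    X_bars = np.random.randn(6, 4)   # e.g. 6 bars x 4 features
    y_bars = np.random.randn(6)
    model.partial_fit(X_bars, y_bars)
    next_bar = np.random.randn(1, 4)
    prediction = model.predict(next_bar)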

I was thinking of creating a hacky version of this idea using your code as a reference, then throwing it up in a second repo. I can't store models quickly during a backtest because storage is slow, so I was thinking of letting a daemon thread run in the background to periodically save what's in memory.

kivo360 commented 5 years ago

I wrote requirements for an online learning addition. Imagine what kind of clients you could get from hedge funds and banks by including online learning in your platform.

Background Training and Saving

The objective of this document is to explain the requirements for a class that creates, trains, and saves models quickly and dynamically. The end goal is to be able to deploy the generated models into production almost immediately in an online learning setup.

Background Trainer

First, we'll show what a simple background trainer will look like for various pieces of functionality.

Creating models with query dicts

from bg_trainer import BackgroundTrainer
from sklearn.linear_model import PassiveAggressiveRegressor

def get_pa_models(query_list):
    assert isinstance(query_list, list)
    r_values = []
    for ql in query_list:
        # Each entry in query_list should be a query dict identifying one model
        assert isinstance(ql, dict)
        model = PassiveAggressiveRegressor(max_iter=100, random_state=0, tol=1e-3)
        r_values.append((ql, model))

    return r_values

# model_list is a temporary list of models; only use it for testing
model_list = get_pa_models(
[
    {
        "episode_id": "dhosdisldn", "coin": "BTC"
    }, 
    {
        "episode_id": "yqbysudjvdg", "coin": "BTC"
    }, 
    {
        "episode_id": "yqbysudjvdg", "coin": "ETH"
    }
]) 

bg = BackgroundTrainer()

for model_query, actual_model in model_list:
    bg.add_model(model_query, actual_model)

Dynamically train models

bg = BackgroundTrainer()

try:
    # X, y: the latest batch of training features and targets
    bg.get_model(model_query).partial_fit(X, y)
except ModelNotFoundException:
    print("Model was not found")

The dynamically trained models should save automatically according to the query parameters they originally had, so if we stopped the background process and then started it back up, we'd be able to grab the same models by their query parameters. We'll know that model saving and pruning is working when we can do the following:

  1. We stop the program.
  2. After restarting, we can use the same query dicts as before to load the old models back into memory.

This lets us train models across breaks in a running program, and also move them into production quickly.

The model should be saved from memory into storage periodically using a daemon thread. We locate models in memory using a base64 string.
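
One way to picture the periodic save pass (the directory layout and pickle format here are assumptions, not a spec):

import os
import pickle

MODEL_DIR = "saved_models"

def bg_save_all(model_dict):
    # Persist every in-memory model under its base64 key as the filename
    os.makedirs(MODEL_DIR, exist_ok=True)
    for key, model in model_dict.items():
        with open(os.path.join(MODEL_DIR, key), "wb") as f:
            pickle.dump(model, f)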

To base64 string:

get_query_dict --> encode_query_str --> encode_base_64 --> base_64_str
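
A minimal sketch of that pipeline (the function name query_dict_to_key is hypothetical; sorting the keys is what makes the encoding deterministic):

import base64
import json

def query_dict_to_key(query_dict):
    # Serialize with sorted keys so the same dict always yields the same string
    query_str = json.dumps(query_dict, sort_keys=True)
    # URL-safe base64 avoids '/' in keys, which matters if keys become filenames
    return base64.urlsafe_b64encode(query_str.encode("utf-8")).decode("ascii")

key = query_dict_to_key({"episode_id": "dhosdisldn", "coin": "BTC"})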

We'd save the model using the base64 string. When we look for a model, all we'll have to do is search by dict.

model = get_model({"episode_name": eid, "name": "sample_model_name"})

Prior to searching for the model in storage, we look inside an in-memory dict to determine whether the model we're looking for already exists there.

model_dict = {}
model_dict['baseencoded'] = sample_model  # keyed by the base64-encoded query string

def get_model(model_loc_dict):
    # Convert the query dict into a base64 string
    base64_model_str = encode_base_64(model_loc_dict)
    model = model_dict.get(base64_model_str)

    # If the model is not in memory, check whether it exists in storage
    if model is None:
        model = get_model_from_storage(model_loc_dict)
    return model

Background Process

We save to storage and prune unused models using a background thread. Look at how loguru's enqueue=True option works to see this pattern in action.


def run(self):
    while True:
        self.bg_save_all()
        self.prune_models()
        # Assumes a configurable save_interval; sleeping avoids a busy loop
        time.sleep(self.save_interval)
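
Wiring this up as a daemon thread, assuming the BackgroundTrainer interface sketched above:

import threading

bg = BackgroundTrainer()

# Daemon threads don't block interpreter shutdown, so the saver dies with the program
saver = threading.Thread(target=bg.run, daemon=True)
saver.start()
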
kivo360 commented 5 years ago

I plan to create a separate project to do this for me. When it's done, I'm happy to share. I'd say it would be cool to try integrating something like this into your project.

parano commented 5 years ago

@kivo360 definitely would love to see the solution you are working on. It is indeed something we would like to integrate into BentoML; we've discussed internally supporting dynamic loading of updated artifacts for use cases like this. Do ping me if you are interested in contributing your work to BentoML!

kivo360 commented 5 years ago

I just created something. It's in the repo funguauniverse. The link is here. Look at the examples folder inside the repo. It's a server version of a PassiveAggressiveRegressor. It runs using SimpleHTTPServer with the ThreadingMixIn, which makes the service a multi-threaded socket server. It does exactly the functionality I just mentioned.

I used some examples from Ray to get an idea of what I wanted to do. I ran a sample service for 4 hours, periodically training on new data, then returning the score and prediction. This means it should work to an extent in the real world.
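
For reference, the standard-library pattern behind that (in Python 3 terms) looks roughly like this; the request handler below is a placeholder, not the funguauniverse code:

from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

class Handler(BaseHTTPRequestHandler):
    # Placeholder handler: a real service would route train/predict requests here
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    # Mixing in ThreadingMixIn handles each request in its own thread
    daemon_threads = True

ThreadedHTTPServer(("0.0.0.0", 8000), Handler).serve_forever()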

kivo360 commented 5 years ago

You'll need to change the storage model to something that works within your systems.

yubozhao commented 5 years ago

Oh nice! I will check this repo out.