UBC-MDS / majacloud

DSCI 525 Group III - Web and Cloud Computing: Development of rainfall predictor via AWS cloud services.
MIT License

Milestone 4 checklist #40

Open jianructose opened 3 years ago

jianructose commented 3 years ago

Milestone 4: In this milestone, you will deploy the machine learning model you trained in milestone 3.

Milestone 4 checklist:

shoebillm commented 3 years ago
(Screenshot attached: Screen Shot 2021-04-26 at 4.50.23 PM)
shoebillm commented 3 years ago
from flask import Flask, request, jsonify
import joblib
import pandas as pd

app = Flask(__name__)

# 1. Load your model here
model = joblib.load("model.joblib")

# 2. Define a prediction function
def return_prediction(content):
    # format input_data here so that you can pass it to model.predict()

    test_data = pd.DataFrame(content).T

    return model.predict(test_data)[0]

# 3. Set up home page using basic html
@app.route("/")
def index():
    # feel free to customize this if you like
    return """
    <h1>Welcome to our rain prediction service</h1>
    To use this service, make a JSON post request to the /predict url with 5 climate model outputs.
    """

# 4. define a new route which will accept POST requests and return model predictions
@app.route('/predict', methods=['POST'])
def rainfall_prediction():
    content = request.json  # extract the JSON content of the request
    # cast to float so the numpy scalar returned by model.predict()
    # is JSON-serializable by jsonify
    prediction = float(return_prediction(content))
    # return whatever data you wish; it can be just the prediction,
    # or the prediction plus the input data, it's up to you
    results = {"input": content["data"], "prediction": prediction}
    return jsonify(results)
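For reference, a minimal sketch of the payload handling in `return_prediction()`: assuming the client posts a JSON body like `{"data": [...]}` with the 5 climate model outputs (the key name and values below are made up for illustration), `pd.DataFrame(content).T` reshapes it into the single-row frame that `model.predict()` expects.

```python
import pandas as pd

# Hypothetical request body: 5 climate model outputs under a "data" key
# (key name and feature values are assumptions for this example).
content = {"data": [1.2, 0.8, 2.5, 0.0, 3.1]}

# Same reshaping as return_prediction(): a dict holding one list becomes
# a 5x1 frame, and .T flips it into a single row of 5 feature columns.
test_data = pd.DataFrame(content).T
print(test_data.shape)  # one row, five feature columns
```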
shoebillm commented 3 years ago

In Milestone 1, we compared different data-loading methods and file formats. The efficiency of loading the observed_rainfall data depended largely on OS type and RAM; Dask loaded the data more efficiently than pandas. As for file formats, Arrow was the fastest to read thanks to its columnar in-memory layout, but it was less efficient for searching than Parquet or Feather, which also used less storage space.
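As a small illustration of the loading comparison (not the milestone code itself), pandas can also process a large CSV in fixed-size chunks so the whole file never sits in memory at once, which is the same idea Dask automates across partitions; the column names and chunk size below are invented for the example.

```python
from io import StringIO

import pandas as pd

# A small in-memory CSV standing in for the much larger rainfall data;
# the column names are made up for this sketch.
csv_text = "time,rain_mm\n" + "\n".join(f"{i},{i % 7}" for i in range(1000))

# Read the data 250 rows at a time and aggregate incrementally, keeping
# peak memory bounded by the chunk size rather than the file size.
total = 0.0
for chunk in pd.read_csv(StringIO(csv_text), chunksize=250):
    total += chunk["rain_mm"].sum()

print(total)
```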

In Milestone 2, we set up an EC2 instance and an S3 bucket using AWS services, giving us a more powerful server for faster data processing and easier data sharing. This way, multiple users can work on the same project in the same environment, making data wrangling more efficient.

In Milestone 3, we went further and set up EMR on our EC2 instance to train the machine learning model on the observed_rainfall data. Once we had the trained model with optimized hyperparameters, we saved it to S3 for later use.
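The save/load step can be sketched like this, with a toy regressor standing in for the tuned EMR model; the file name mirrors the `model.joblib` the Flask app loads, and the actual S3 upload (boto3 or the AWS CLI) is omitted from the sketch.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LinearRegression

# Toy training data standing in for the observed_rainfall features.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 2.0, 4.0, 6.0]

model = LinearRegression().fit(X, y)

# Persist the fitted model exactly as the Flask app expects to load it.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)

# Reloading gives back an estimator that predicts identically.
reloaded = joblib.load(path)
print(reloaded.predict([[4.0]])[0])  # ~8.0 for this toy y = 2x data
```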

In Milestone 4, we built a simple web API with Flask to deploy the trained model saved in Milestone 3, keeping it running on the EC2 instance inside a `screen` session so that it stays available online for others to use.
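Before exposing such a service, the route logic can be exercised locally with Flask's built-in test client. The sketch below swaps the real model for a stub that averages the five inputs, so it runs without `model.joblib` or a live server; the payload shape matches the `/predict` route above.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stub "model": average the five inputs. The real service would call
# model.predict() on the reshaped DataFrame instead.
def return_prediction(content):
    data = content["data"]
    return sum(data) / len(data)

@app.route("/predict", methods=["POST"])
def rainfall_prediction():
    content = request.json
    prediction = return_prediction(content)
    return jsonify({"input": content["data"], "prediction": prediction})

# Flask's test client issues requests in-process; no server is needed.
client = app.test_client()
resp = client.post("/predict", json={"data": [1.0, 2.0, 3.0, 4.0, 5.0]})
print(resp.get_json())  # prediction is the mean of the inputs, 3.0
```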