exasol / bucketfs-python

BucketFS utilities for the Python programming language
https://exasol.github.io/bucketfs-python
MIT License

🐞 Uploading pickled model to BucketFS does not work #65

Closed Nicoretti closed 1 year ago

Nicoretti commented 1 year ago

Summary

Uploading a pickled ... model to BucketFS fails with an exception.

Reproducing the Issue

Product ML file

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create a dummy dataset with 10,000 rows
np.random.seed(42)
data = {'X1': np.random.rand(10000),
        'X2': np.random.rand(10000),
        'X3': np.random.rand(10000),
        'y': np.random.rand(10000)}
df = pd.DataFrame(data)

# Split the data into features (X) and target variable (y)
X = df[['X1', 'X2', 'X3']]
y = df['y']

# Create a linear regression model and fit it to the data
model = LinearRegression().fit(X, y)

# Generate 2,000 rows for X_new
np.random.seed(42)
X_new = pd.DataFrame({'X1': np.random.rand(2000),
                      'X2': np.random.rand(2000),
                      'X3': np.random.rand(2000)})

# Predict on new data
y_pred = model.predict(X_new)

# Print the predicted values
print("Predicted values:")
print(y_pred)

# Calculate the mean squared error
y_pred_train = model.predict(X)
mse = mean_squared_error(y, y_pred_train)
print("Mean Squared Error:", mse)

Pickled model

import pickle

# Save the fitted model (from the script above) to a file
filename = 'dummy_linear_regression_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(model, f)

print("Model saved successfully.")

Failing code

import io
import pickle
from exasol.bucketfs import Service

URL = "http://localhost:2581"
CREDENTAILS = {"default": {"username": "w", "password": "BBiSzwGaD6X7zLcjfpcP0OdGA317JABg"}}

bucketfs = Service(URL, CREDENTAILS)
bucket = bucketfs["default"]

filename = 'dummy_linear_regression_model.sav'
loaded_model = pickle.load(open(filename, 'rb'))

# Upload bytes
data = loaded_model
bucket["dummy/dummy_linear_regression_model.sav"] = data

# Upload file like object
# file_like = io.BytesIO(loaded_model)
# bucket.upload("dummy/dummy_linear_regression_model.sav", file_like)

# bucket.upload("dummy/dummy_linear_regression_model.sav", loaded_model)

Expected Behavior

Uploading model is successful.

Actual Behavior

Uploading model fails with exception

Root Cause (optional)

unknown

Reported by: @exa-eswar

Nicoretti commented 1 year ago

Hi @exa-eswar,

I assume the problem is a misuse of the API: uploading a live in-memory object is not supported at the moment. I think what you actually want to do is the following:

import io
from exasol.bucketfs import (
    Service,
    MappedBucket,
)

URL = "http://localhost:2581"
CREDENTAILS = {"default": {"username": "w", "password": "BBiSzwGaD6X7zLcjfpcP0OdGA317JABg"}}

bucketfs = Service(URL, CREDENTAILS)
bucket = MappedBucket(bucketfs["default"])

filename = 'dummy_linear_regression_model.sav'
with open(filename, 'rb') as data:
    # Upload bytes
    bucket["dummy/dummy_linear_regression_model.sav"] = data

Looking forward to getting your feedback on this.
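The difference between the failing and the working variant can be seen without a BucketFS instance. The following is only a sketch: a plain dict stands in for the fitted `LinearRegression`, and the "uploadable" check (bytes or an object with `read()`) is an assumption used for illustration, not the library's actual validation logic.

```python
import io
import os
import pickle
import tempfile

# Stand-in for the fitted model (assumption: any picklable object behaves the same here)
model = {"coef": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dummy_linear_regression_model.sav")
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # pickle.load() rebuilds the live Python object: it is neither bytes nor file-like
    with open(path, "rb") as f:
        loaded_model = pickle.load(f)
    live_is_uploadable = (
        isinstance(loaded_model, (bytes, bytearray)) or hasattr(loaded_model, "read")
    )

    # Wrapping the live object in BytesIO does not help, it expects bytes-like input
    try:
        io.BytesIO(loaded_model)
        wraps_in_bytesio = True
    except TypeError:
        wraps_in_bytesio = False

    # Opening the file in binary mode yields a file-like object over the pickled bytes
    with open(path, "rb") as f:
        file_is_uploadable = hasattr(f, "read")
        payload = f.read()
```

In other words, the original snippet handed the unpickled object to the bucket, whereas the version above hands over the still-pickled file contents.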

tkilias commented 1 year ago

I think what we might want is something like the following:

import pickle
from sklearn.linear_model import LinearRegression
model = LinearRegression()

...

URL = "http://localhost:2581"
CREDENTAILS = {"default": {"username": "w", "password": "BBiSzwGaD6X7zLcjfpcP0OdGA317JABg"}}

bucketfs = Service(URL, CREDENTAILS)
bucket = MappedBucket(bucketfs["default"])

bucket["dummy/dummy_linear_regression_model.sav"] = object_to_stream(model)

loaded_model = object_from_stream(bucket["dummy/dummy_linear_regression_model.sav"])

Nicoretti commented 1 year ago

@tkilias good point, but that is a feature rather than a bug ;). Giving it some more thought, I am not 100% sure this functionality should be part of the bucketfs library. I think it belongs in the specific client code, since pickling is a very specific use case.
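For illustration, client code could provide such helpers itself with a few lines on top of `pickle` and `io.BytesIO`. This is a minimal sketch: the names `object_to_stream` and `object_from_stream` come from the proposal above, but their implementation here is an assumption, and it is also assumed that a download may arrive as bytes, as a file-like object, or as an iterable of byte chunks.

```python
import io
import pickle
from typing import Any, BinaryIO, Iterable, Union


def object_to_stream(obj: Any) -> BinaryIO:
    """Pickle obj into an in-memory, file-like byte stream (hypothetical helper)."""
    return io.BytesIO(pickle.dumps(obj))


def object_from_stream(data: Union[bytes, BinaryIO, Iterable[bytes]]) -> Any:
    """Rebuild an object from pickled bytes, a file-like object, or byte chunks."""
    if isinstance(data, (bytes, bytearray)):
        raw = bytes(data)
    elif hasattr(data, "read"):
        raw = data.read()
    else:
        # Assumption: anything else is an iterable of byte chunks
        raw = b"".join(data)
    # Only unpickle data from a trusted source: pickle can execute arbitrary code
    return pickle.loads(raw)


# Round trip without BucketFS, using a plain dict as a stand-in for the model
original = {"coef": [1.0, 2.0]}
restored = object_from_stream(object_to_stream(original))

# The same payload delivered in chunks, as a download might provide it
payload = pickle.dumps(original)
restored_from_chunks = object_from_stream([payload[:5], payload[5:]])
```

Keeping the helpers outside the library means the bucket API only ever sees bytes or file-like objects, and the choice of serialization format stays with the client.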

exa-eswar commented 1 year ago

@Nicoretti, thanks for the feedback and the fix. I can confirm that this works well.

tkilias commented 1 year ago

@Nicoretti yeah, it is a feature and probably not directly related to buckets. It is something we will probably need, though maybe not as part of this repo but in one built on top of it.

Nicoretti commented 1 year ago

Thanks @exa-eswar for the feedback.