Azure / azure-storage-python

Microsoft Azure Storage Library for Python
https://azure-storage.readthedocs.io
MIT License

Unable to load serialized object persisted on Azure Blob storage #678

Open demihuman2020 opened 3 years ago

demihuman2020 commented 3 years ago

We serialize the 'pipeline' object and store it on Azure Blob Storage.

import pickle
import dill
from azure.storage.blob import BlobServiceClient, PublicAccess

with open("feg_pipeline.pkl", 'wb') as f:
    dill.dump(pipeline, f, protocol=pickle.HIGHEST_PROTOCOL)
blob_service_client = BlobServiceClient.from_connection_string(conn_str='XXX')
container_name = 'YYY'
blob_service_client.create_container(container_name, public_access=PublicAccess.Container)
blob_client = blob_service_client.get_blob_client(
    container=container_name, blob='feg_pipeline.pkl')
blob_client.upload_blob('feg_pipeline.pkl')

Later, in a different function, we read it back from Blob Storage as follows:


import dill
from azure.storage.blob import BlobClient

blob = BlobClient(account_url='https://fhghjhjh',
                  container_name='YYY',
                  blob_name='feg_pipeline.pkl',
                  credential='XXX')
feg_from_blob = None
with open("feg_pipeline.pkl", "wb") as f:
    data = blob.download_blob()
    data.readinto(f)
with open("feg_pipeline.pkl", "rb") as f:
    feg_from_blob = dill.load(f)

The load step then fails with:

UnpicklingError: invalid load key, 'f'.
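
One possible reading of this message, offered as an assumption rather than a confirmed diagnosis: `upload_blob('feg_pipeline.pkl')` is given the file name as a string, so the blob may contain the characters of the name rather than the pickled bytes, and `'f'` would then be the first byte the unpickler sees. The sketch below reproduces the same message locally with the standard `pickle` module only; no Azure calls are involved, and `try_unpickle` is a hypothetical helper introduced for illustration.

```python
import pickle


def try_unpickle(raw: bytes) -> str:
    """Return 'ok' if raw unpickles, otherwise the UnpicklingError text."""
    try:
        pickle.loads(raw)
        return "ok"
    except pickle.UnpicklingError as exc:
        return str(exc)


# Bytes of the file *name*: what the blob would hold if a string,
# not the file contents, had been uploaded (assumption).
name_bytes = "feg_pipeline.pkl".encode("utf-8")
message = try_unpickle(name_bytes)
print(message)

# Genuine pickle bytes load without error.
print(try_unpickle(pickle.dumps({"step": "scaler"})))
```

If the downloaded file is only 16 bytes long and reads `feg_pipeline.pkl` when opened as text, that would support this reading.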

We have tried:

dill, joblib, cPickle, cloudpickle and pickle for serializing and deserializing. All of them gave a key error when loading the object from the file downloaded from Blob Storage.
Base64 encoding (while serializing) and decoding (while deserializing). This gives a padding error while loading.

What is the best way to persist and reuse such objects in Azure?
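
As a starting point, here is a minimal sketch of a round trip that uploads the serialized bytes themselves rather than a file name. It assumes `blob_client` is an `azure.storage.blob.BlobClient` (v12 SDK), which exposes `upload_blob(data, overwrite=...)` and `download_blob().readall()`. The helper names `upload_object` and `download_object` are hypothetical, and the standard `pickle` module stands in for dill here; `dill.dumps` and `dill.loads` could be swapped in if the pipeline needs them.

```python
import pickle
from typing import Any


def upload_object(blob_client: Any, obj: Any) -> None:
    """Serialize obj in memory and upload the resulting bytes."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    # Pass the bytes (or an open binary file handle), not a path string.
    blob_client.upload_blob(payload, overwrite=True)


def download_object(blob_client: Any) -> Any:
    """Download the blob's bytes and deserialize them."""
    payload = blob_client.download_blob().readall()
    return pickle.loads(payload)
```

Under the same assumptions, an already written file could instead be uploaded by opening it in binary mode and passing the handle, for example `with open("feg_pipeline.pkl", "rb") as f: blob_client.upload_blob(f, overwrite=True)`.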