This PR adds Azure Blob Storage support to `tensorizer`. Both serialization and deserialization work, closely mirroring the existing S3 support:
For serialization of models to Azure, it presently writes to a temporary file, which is then uploaded to the Azure Blob Store. The upload is triggered either by a GC cleanup or by an explicit `close()`.
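The write-to-temp-file-then-upload-on-close pattern described above can be sketched in isolation. This is a hedged illustration only, not tensorizer's actual implementation: `TempFileUploader` and its `upload_fn` callback are hypothetical names standing in for the real Azure upload call.

```python
# Minimal sketch of the pattern: buffer writes in a temp file, upload the
# whole file when close() runs (explicitly or via garbage collection).
import os
import tempfile


class TempFileUploader:
    def __init__(self, upload_fn):
        self._upload_fn = upload_fn  # called with the temp file's path on close
        self._file = tempfile.NamedTemporaryFile(delete=False)
        self._closed = False

    def write(self, data: bytes) -> None:
        self._file.write(data)

    def close(self) -> None:
        # Idempotent: upload exactly once, whether called explicitly
        # or from __del__ during GC cleanup.
        if self._closed:
            return
        self._closed = True
        self._file.close()
        try:
            self._upload_fn(self._file.name)
        finally:
            os.unlink(self._file.name)

    def __del__(self):
        self.close()


uploaded = []
f = TempFileUploader(lambda path: uploaded.append(open(path, "rb").read()))
f.write(b"tensor data")
f.close()
```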
For deserialization of models from Azure, it generates a SAS token and passes it to the `CURLStreamFile` class, which performs an ordinary HTTP pull.
Authentication is handled by the `DefaultAzureCredential` class, which picks up credentials through its chain of automated mechanisms; the one tested so far is `EnvironmentCredential`.
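For reference, `EnvironmentCredential` reads a service principal from environment variables, so testing against it looks roughly like this (placeholder values shown, not real credentials):

```shell
# Service-principal credentials picked up by EnvironmentCredential:
export AZURE_TENANT_ID="00000000-0000-0000-0000-000000000000"
export AZURE_CLIENT_ID="my-app-registration-client-id"
export AZURE_CLIENT_SECRET="my-client-secret"
```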
Usage is straightforward -- provide it with an `azure://` URI in the form `azure://account/container/blob` and it will take it from there.
```python
import azure.core.exceptions
from transformers import AutoModelForCausalLM

import tensorizer.serialization as serialization

model = AutoModelForCausalLM.from_pretrained("eleutherai/gpt-neo-125m")
print("Model loaded.")

serializer = serialization.TensorSerializer(
    "azure://test/data/gpt-neo-125m",
)
try:
    serializer.write_module(model)
    serializer.close()
    print("Done serializing to Azure!")
except azure.core.exceptions.ResourceExistsError:
    print("Resource already exists.")

deserializer = serialization.TensorDeserializer(
    "azure://test/data/gpt-neo-125m",
    verify_hash=True,
)
deserializer.load_into_module(model)
print("Model deserialized from Azure!")
```
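The `azure://account/container/blob` URI form can be split into its components with the standard library. This is a hypothetical sketch for illustration, and `parse_azure_uri` is not tensorizer's actual parser:

```python
# Split an azure://account/container/blob URI into its three parts.
from urllib.parse import urlparse


def parse_azure_uri(uri: str) -> tuple[str, str, str]:
    parsed = urlparse(uri)
    if parsed.scheme != "azure":
        raise ValueError(f"not an azure:// URI: {uri}")
    account = parsed.netloc
    # The first path segment is the container; the rest is the blob name,
    # which may itself contain slashes.
    container, _, blob = parsed.path.lstrip("/").partition("/")
    return account, container, blob


print(parse_azure_uri("azure://test/data/gpt-neo-125m"))
# → ('test', 'data', 'gpt-neo-125m')
```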
This PR is not complete -- test cases still need to be written, which is vastly complicated by the lack of a library similar to AWS' `moto` for mocking up Azure interfaces.