DLHub SDK contains a Python interface to the Data and Learning Hub for Science (DLHub). The interface includes functions for quickly describing a model in the correct schema for DLHub and for discovering or using models that other scientists have published.
DLHub SDK is on PyPI and can be installed using pip:
pip install dlhub-sdk
The full documentation for dlhub_sdk is available on Read the Docs.
The following sections are short introductions to using the DLHub SDK.
Users interact with DLHub by submitting HTTP requests to a REST API. In an effort to make using this API simple, the DLHub SDK contains a client that provides a Python API to these requests and hides the tedious operations involved in making an HTTP call from Python.
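To make concrete what the client hides, the sketch below builds (but does not send) a JSON POST request using only the standard library. The base URL, the endpoint path, and the payload layout are placeholders invented for illustration; they are not the actual DLHub REST API.

```python
import json
import urllib.request

# Hypothetical service address: NOT the real DLHub endpoint.
BASE_URL = "https://dlhub.example.org/api"


def build_run_request(servable_name, inputs):
    """Assemble a JSON POST request for a hypothetical 'run' endpoint."""
    # Serialize the inputs to JSON and encode them as bytes for the body.
    body = json.dumps({"data": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/servables/{servable_name}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build the request object only; nothing is sent over the network.
request = build_run_request("ryan_globusid/noop", "my data")
print(request.get_method(), request.full_url)
```

Even this small example has to handle serialization, encoding, headers, and URL construction by hand, which is the kind of bookkeeping a client object can take care of.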
To create the client, call:
from dlhub_sdk.client import DLHubClient
client = DLHubClient()
The client makes it simple to find interesting machine learning models. For example, you can list all of the models on DLHub with:
d = client.get_servables()
That command will return a Pandas DataFrame of models, which looks something like:
| | name | description | id |
|---|---|---|---|
| 0 | mnist_tiny_example | MNIST Digit Classifier with a small NN | 123 |
| 1 | mnist | CNN achieving 99.25% on the MNIST test data | 111 |
| 2 | formation_energy | Predict the formation enthalpy of a material given its composition | 112 |
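Because the result is tabular, it is easy to search for models by keyword. The sketch below is a minimal, standard-library illustration that assumes the rows were first converted to dictionaries, for example with pandas' `d.to_dict("records")`. The `find_models` helper and the hard-coded rows (copied from the table above) are illustrative and not part of the SDK.

```python
# Rows copied from the example table; with a real DataFrame `d`, an
# equivalent list could be obtained with d.to_dict("records").
records = [
    {"name": "mnist_tiny_example",
     "description": "MNIST Digit Classifier with a small NN",
     "id": 123},
    {"name": "mnist",
     "description": "CNN achieving 99.25% on the MNIST test data",
     "id": 111},
    {"name": "formation_energy",
     "description": "Predict the formation enthalpy of a material given its composition",
     "id": 112},
]


def find_models(rows, keyword):
    """Return the names of models whose description mentions the keyword."""
    keyword = keyword.lower()
    return [row["name"] for row in rows if keyword in row["description"].lower()]


print(find_models(records, "mnist"))
```

The match is case-insensitive, so searching for "mnist" finds both MNIST entries in the sample rows.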
Once you know the name of a model, you can run it through the client as well:
client.run('ryan_globusid/noop', inputs='my data')
As a simple example, we will show how to submit a machine learning model created based on the Iris Dataset. Full scripts for this example model are in /examples/iris.
For brevity, we will provide only a small part of the metadata that can be attached to a model created using Scikit-Learn.
We simply load in a Scikit-Learn model from a pickle file, and then provide a minimal amount of information about it.
from dlhub_sdk.models.servables.sklearn import ScikitLearnModel
# `data` is assumed to be a DataFrame of the Iris dataset with a 'species' column
model_info = ScikitLearnModel.create_model('model.pkl', n_input_columns=len(data.columns) - 1,
                                           classes=data['species'].unique())
# Describe the model
model_info.set_title("Example Scikit-Learn Model")
model_info.set_name("iris_svm")
model_info.set_domains(["biology"])
The SDK will inspect the pickle file to determine the type of the model and the version of scikit-learn that was used to create it. The resulting metadata looks like this:
{
  "datacite": {
    "creators": [],
    "titles": [
      {
        "title": "Example Scikit-Learn Model"
      }
    ],
    "publisher": "DLHub",
    "publicationYear": "2018",
    "identifier": {
      "identifier": "10.YET/UNASSIGNED",
      "identifierType": "DOI"
    },
    "resourceType": {
      "resourceTypeGeneral": "InteractiveResource"
    }
  },
  "dlhub": {
    "version": "0.1",
    "domains": ["biology"],
    "visible_to": [
      "public"
    ],
    "id": null,
    "name": "iris_svm",
    "files": {
      "model": "model.pkl",
      "other": []
    },
    "dependencies": {
      "python": {
        "scikit-learn": "0.19.1"
      }
    }
  },
  "servable": {
    "methods": {
      "run": {
        "input": {
          "type": "ndarray",
          "description": "List of records to evaluate with model. Each record is a list of 4 variables.",
          "shape": [
            null,
            4
          ],
          "item_type": {
            "type": "float"
          }
        },
        "output": {
          "type": "ndarray",
          "description": "Probabilities for membership in each of 3 classes",
          "shape": [
            null,
            3
          ],
          "item_type": {
            "type": "float"
          }
        },
        "parameters": {},
        "method_details": {
          "method_name": "_predict_proba"
        }
      }
    },
    "shim": "sklearn.ScikitLearnServable",
    "type": "Scikit-learn estimator",
    "model_type": "SVC",
    "model_summary": "SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,\n decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',\n max_iter=-1, probability=True, random_state=None, shrinking=True,\n tol=0.001, verbose=False)",
    "options": {
      "serialization_method": "pickle",
      "is_classifier": true,
      "classes": [
        "setosa",
        "versicolor",
        "virginica"
      ]
    }
  }
}
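The `shape` entries in this metadata describe what the `run` method expects and returns. As a rough illustration, the sketch below checks a nested list against the declared input shape before it would be sent to a model. The `matches_shape` helper is hypothetical and not part of the SDK, and reading `null` as "any length" is an assumption based on the accompanying description ("List of records ... Each record is a list of 4 variables").

```python
import json

# Fragment of the metadata shown above, reduced to the input description.
metadata = json.loads("""
{"servable": {"methods": {"run": {"input": {
    "type": "ndarray", "shape": [null, 4], "item_type": {"type": "float"}
}}}}}
""")


def matches_shape(data, shape):
    """Check a nested list against a shape; None matches any length."""
    if not shape:
        # No dimensions left: we should have reached a scalar value.
        return not isinstance(data, list)
    if not isinstance(data, list):
        return False
    expected = shape[0]
    if expected is not None and len(data) != expected:
        return False
    # Every element must match the remaining dimensions.
    return all(matches_shape(item, shape[1:]) for item in data)


input_shape = metadata["servable"]["methods"]["run"]["input"]["shape"]
good = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]
bad = [[5.1, 3.5, 1.4]]  # only 3 values per record

print(matches_shape(good, input_shape))
print(matches_shape(bad, input_shape))
```

A check like this can catch a record with the wrong number of features locally, before any request is made.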
At this point, we are ready to publish both the model and dataset on DLHub.
You can publish a model to DLHub by first reading in the metadata from file and then calling the client:
from dlhub_sdk.models import BaseMetadataModel
from dlhub_sdk.client import DLHubClient
import json
# Read the model description
with open('model.json') as fp:
    model = BaseMetadataModel.from_dict(json.load(fp))
# Publish the model to DLHub
client = DLHubClient()
client.publish_servable(model)
print('Model published to DLHub. ID:', model.dlhub_id)
When you run this script, the DLHub client will assign your model a unique identifier, and the model will soon be available for you to use via DLHub.
This material is based upon work supported by Laboratory Directed Research and Development (LDRD) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357.