Open ZonghaoWang opened 3 years ago
@galeone
I read through your guide and figured out how to save a model after training in golang.
Note that I am using python TF 2.4.1 and github.com/tensorflow/tensorflow v2.0.3+incompatible. It may work on other TF versions and also with tfgo.
When I saved a model tensorflow python it had an additional two StatefulPartitionCall
outputs and a saver_filename
input. These are not visible using saved_model_cli
but you can see them by looping through .Operations()
on a loaded model in golang.
The first of the two extra outputs, when given a string value E.G. tf.NewTensor("save_dir/variables/variables")
for saver_filename will save the variables (I.E. variables.index, variables.data-00000-of-00001) in save_dir/variables
.
You can just then copy the original saved_model.pb into save_dir
, and when you load from that dir it will have the golang trained weights.
Thanks for your article, it pointed me in the right direction.
@CodingBeard thanks for reading the article and for the feedback!
I'm not aware of that method of getting the operation name, but your experience can help other readers. Would you be so kind to open a merge request to the blog repo (https://github.com/galeone/galeone.github.io/blob/master/_posts/2020-11-27-deploy-train-tesorflow-models-in-go-human-activity-recognition.md) adding some lines on how to find the node names if you can't find them using saved_model_cli
?
It will be a great addition!
I'm not using tfgo in my project, but with an older version of tensorflow's golang package it looks like this:
model, e := tf.LoadSavedModel("model_dir", []string{"serve"}, nil)
if e != nil {
panic(e.Error())
}
for _, operation := range model.Graph.Operations() {
fmt.Println(operation.Name())
}
It seems the save node is always the first hidden StatefulPartitionCall
after the ones visible in saved_model_cli
. So for example in your article where the learn
signature is StatefulPartitionCall
and the predict
signature is StatefulPartitionCall_1
then the one to call for saving the variables will be StatefulPartitionCall_2
The full example using python TF 2.4.1 and golang tensorflow/tensorflow/go v2.0.3 is as follows:
Python:
class GolangModel(tf.Module):
def __init__(self):
super().__init__()
bool_input = k.layers.Input(
shape=(3,),
name='bool_input',
dtype='float32',
batch_size=10
)
output = k.layers.Dense(
1
)(bool_input)
self.model = Model(bool_input, output)
self._global_step = tf.Variable(0, dtype=tf.int32, trainable=False)
self._optimizer = k.optimizers.Adam()
self._loss = k.losses.binary_crossentropy
@tf.function(
input_signature=[
tf.TensorSpec(shape=(None, 3), dtype=tf.float32),
tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
]
)
def learn(self, data, labels):
self._global_step.assign_add(1)
with tf.GradientTape() as tape:
loss = self._loss(labels, self.model(data))
gradient = tape.gradient(loss, self.model.trainable_variables)
self._optimizer.apply_gradients(zip(gradient, self.model.trainable_variables))
return {"loss": loss}
@tf.function(input_signature=[tf.TensorSpec(shape=(None, 3), dtype=tf.float32)])
def predict(self, data):
prediction = self.model(data)
return {"prediction": prediction}
gm = GolangModel()
gm.learn(
tf.zeros([10, 3], dtype=tf.float32),
tf.zeros([10, 1], dtype=tf.float32),
)
gm.predict(tf.zeros((10, 3), dtype=tf.float32))
tf.saved_model.save(
gm,
"/data/models/gm",
signatures={
"learn": gm.learn,
"predict": gm.predict,
},
)
golang:
gm, e := tf.LoadSavedModel("/data/models/gm", []string{"serve"}, nil)
if e != nil {
errorHandler.Error(e)
return nil
}
boolInput, e := tf.NewTensor([][]float32{{0.5, 0.5, 0.5}, {0, 0, 0}})
result, e := gm.Session.Run(
map[tf.Output]*tf.Tensor{
gm.Graph.Operation("predict_data").Output(0): boolInput,
},
[]tf.Output{
gm.Graph.Operation("StatefulPartitionedCall_1").Output(0),
},
nil,
)
if e != nil {
errorHandler.Error(e)
return e
}
floatResults, ok := result[0].Value().([][]float32)
if !ok {
fmt.Println("No float results")
return nil
}
fmt.Println(floatResults)
trainData, e := tf.NewTensor([][]float32{
{0.5, 0.5, 0.5},
{0.5, 0.5, 0.5},
{0.5, 0.5, 0.5},
{0.5, 0.5, 0.5},
{0.5, 0.5, 0.5},
{0, 0, 0},
{0, 0, 0},
{0, 0, 0},
{0, 0, 0},
{0, 0, 0},
})
trainLabels, e := tf.NewTensor([][]float32{
{1},
{1},
{1},
{1},
{1},
{0},
{0},
{0},
{0},
{0},
})
for i := 0; i < 1000; i++ {
_, e := gm.Session.Run(
map[tf.Output]*tf.Tensor{
gm.Graph.Operation("learn_data").Output(0): trainData,
gm.Graph.Operation("learn_labels").Output(0): trainLabels,
},
[]tf.Output{
gm.Graph.Operation("StatefulPartitionedCall").Output(0),
},
nil,
)
if e != nil {
errorHandler.Error(e)
return e
}
}
boolTest, e := tf.NewTensor([][]float32{{0.5, 0.5, 0.5}, {0, 0, 0}})
test, e := gm.Session.Run(
map[tf.Output]*tf.Tensor{
gm.Graph.Operation("predict_data").Output(0): boolTest,
},
[]tf.Output{
gm.Graph.Operation("StatefulPartitionedCall_1").Output(0),
},
nil,
)
if e != nil {
errorHandler.Error(e)
return e
}
testResults, ok := test[0].Value().([][]float32)
if !ok {
fmt.Println("No post training float results")
return nil
}
fmt.Println(testResults)
os.RemoveAll("gm-trained")
os.MkdirAll("gm-trained/variables", os.ModePerm)
savedModel, e := ioutil.ReadFile("/data/models/gm/saved_model.pb")
if e != nil {
errorHandler.Error(e)
return e
}
e = ioutil.WriteFile("gm-trained/saved_model.pb", savedModel, os.ModePerm)
if e != nil {
errorHandler.Error(e)
return e
}
filenameInput, e := tf.NewTensor("gm-trained/variables/variables")
if e != nil {
errorHandler.Error(e)
return e
}
_, e = gm.Session.Run(
map[tf.Output]*tf.Tensor{
gm.Graph.Operation("saver_filename").Output(0): filenameInput,
},
[]tf.Output{
gm.Graph.Operation("StatefulPartitionedCall_2").Output(0),
},
nil,
)
if e != nil {
errorHandler.Error(e)
return e
}
gmTrained, e := tf.LoadSavedModel("gm-trained", []string{"serve"}, nil)
if e != nil {
errorHandler.Error(e)
return nil
}
boolTest, e = tf.NewTensor([][]float32{{0.5, 0.5, 0.5}, {0, 0, 0}})
test, e = gmTrained.Session.Run(
map[tf.Output]*tf.Tensor{
gmTrained.Graph.Operation("predict_data").Output(0): boolTest,
},
[]tf.Output{
gmTrained.Graph.Operation("StatefulPartitionedCall_1").Output(0),
},
nil,
)
if e != nil {
errorHandler.Error(e)
return e
}
testResults, ok = test[0].Value().([][]float32)
if !ok {
fmt.Println("No post training float results")
return nil
}
fmt.Println(testResults)
Hopefully that makes sense and you can use it for your article, and maybe even add a method to save the model in this repo.
woah, thanks for sharing! I guess I'm going to extract something from your code for tfgo :smile:
Hey, I've continued digging and managed to get the human readable graph def of that saved model. Under the hood the save method seems to be using op.SaveV2
to save the variables and you can see the function definition of StatefulPartitionedCall_2
by searching the graph def for: __inference__traced_save_719
Human readable graph: https://gist.github.com/CodingBeard/769a42d06a9b9d518e69f6c1ae41e45b
I got the graph in that format by making use of tensorflow/c/c_api_experimental.h:TF_GraphDebugString
by adding the following function (I'm aware of the lack of memory management) to github.com/galone/tensorflow/go/graph.go
func (g *Graph) GetDebugString() string {
tmp := C.ulong(1)
graphDebugChar := C.TF_GraphDebugString(g.c, (*C.ulong)(&tmp))
goString := C.GoString(graphDebugChar)
return goString
}
You can figure out what's going on under the hood of a StatefulPartitionedCall (which is a just the graph of a tf.function) using that debug string method.
How can i save a trained model using golang? With the help of tf-go and Mr.CodingBeard 's code, i can train simple machine leanring model in an online scenario and test them by evaluating related performance metric. Could you please give me some insights to save trained model in save_model or checkpoint format.
i tried to use privided "saver_filename" and "StatefulPartitionedCall_2" but did not work in other self-build models that are similar to Mr.CodingBeard 's gm
If you're interested in using a model for inference, it's better to train the model in Pyton, export it as a SavedModel and then use it from tfgo.
If instead you're interested in training a model with tfgo (but currently we don't support model saving, hence you have to keep it in memory after training and you can't save it on disk), you must export the training graph + model in Python as a SavedModel and use it from tfgo. I wrote an article that explains this process: https://pgaleone.eu/tensorflow/go/2020/11/27/deploy-train-tesorflow-models-in-go-human-activity-recognition/