iris-qq closed this issue 4 years ago.
Hi @atlantiswqq,
You may be interested in the discussion starting from this reply.
In short, it should be easy: export the SavedModel, then use saved_model_cli to get the names of the input and output nodes, and you should be OK.
Hi @galeone,
Even if I get the names of the input and output nodes through saved_model_cli, I still run into a problem. When training with tf.estimator, the input of input_fn is a dict. With tfgo, the function
func NewTensor(value interface{}) (*Tensor, error)
only accepts scalars, slices, and arrays ("Valid values are scalars, slices, and arrays"). The features in my dict are strings, integers, and floats, so there's no way to turn them into the corresponding tensor. I don't know if you understand what I mean; if you do, please tell me how to pass the dict features into the model to get predictions. Thank you very much!
Python training code:
def input_fn(self, x, y):
    return tf.estimator.inputs.pandas_input_fn(
        x,
        y,
        batch_size=512,
        num_epochs=1,
        shuffle=False,
        queue_capacity=1000,
        num_threads=5
    )

est = tf.estimator.LinearClassifier(
    feature_columns=one_hot_feature_columns + crossed_columns,
    n_classes=2,
    model_dir=self.modelDir,
    optimizer=tf.train.FtrlOptimizer(
        learning_rate=0.01,
        l2_regularization_strength=0.02)
)
est.train(steps=50000, input_fn=self.input_fn(trains[cols], trains.click_label))
Go prediction code:
model, err := tf.LoadSavedModel(modelPath, []string{"recomd"}, nil)
if err != nil {
	fmt.Println("load model error,", err.Error())
	return
}

type Input struct {
	Apl              string
	City             string
	Age              string
	TerminalType     string
	CityDivide       string
	SexUsable        string
	PhoneBind        string
	PhoneBrand       string
	CityByIP         string
	ProvinceByPhone  string
	ClickCntRatio    string
	ClickUserRatio   int
	ViFirstPvDiff    float64
	ViScore          float64
	HzPv30D          float64
	BillsMoney       string
	ViOrderNum       float64
	ViDonationAmount float64
	ViPvM            float64
	PvCnt            float64
	PvUser           int
	ClickUser        int
	ClickCnt         float64
}

// "未知" means "unknown" in the training data.
inputData := [1]Input{{"10201", "未知", "未知", "android", "未知", "U", "N", "PBBM30", "未知", "未知", "", 0, 0.00875, 0.03, 0.0, "", 0.03333333333333333, 0.02, 0.13333333333333333, 0.0, 0, 0, 0.0}}
tensor, err := tf.NewTensor(inputData)
if err != nil {
	log.Fatal(err)
}
sess := model.Session
result, err := sess.Run(
	map[tf.Output]*tf.Tensor{
		model.Graph.Operation("global_step").Output(0): tensor,
	},
	[]tf.Output{
		model.Graph.Operation("save/SaveV2").Output(0),
	},
	nil,
)
tf.NewTensor reports an error here, but I don't know the right way to use it.
You have to look inside the saved_model_cli output and get the list of the input tensors.
There is a high chance (but I'm not sure, since I haven't tested it) that every element of your input dictionary is translated to an input node with the correct type.
E.g. dict["apl"] is now something like input_tensor_string:0, and so on.
So, once you have the complete list of your inputs, you just have to create several single tensors with the native types, instead of a single giant tensor from a struct.
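A minimal sketch of that suggestion, for the case where a SavedModel really does expose one input node per feature (the saved_model_cli output below shows that this particular model does not: it has a single DT_STRING input). It uses reflection to split a batch of structs into one native-typed slice per field, which is the kind of value tf.NewTensor accepts, unlike the struct itself. The Row type is a hypothetical, trimmed-down version of Input, and the mapping from field name to input node would still have to come from saved_model_cli.

```go
package main

import (
	"fmt"
	"reflect"
)

// Row is a hypothetical, trimmed-down version of the Input struct above.
type Row struct {
	Apl     string
	PvUser  int
	ViScore float64
}

// splitByField turns a batch of rows into one slice per struct field.
// Each slice ([]string, []int64 or []float64, batch dimension first) is a
// plain value of the kind tf.NewTensor accepts.
func splitByField(rows []Row) (map[string]interface{}, error) {
	out := make(map[string]interface{})
	t := reflect.TypeOf(Row{})
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		switch f.Type.Kind() {
		case reflect.String:
			col := make([]string, len(rows))
			for r := range rows {
				col[r] = reflect.ValueOf(rows[r]).Field(i).String()
			}
			out[f.Name] = col
		case reflect.Int:
			col := make([]int64, len(rows))
			for r := range rows {
				col[r] = reflect.ValueOf(rows[r]).Field(i).Int()
			}
			out[f.Name] = col
		case reflect.Float64:
			col := make([]float64, len(rows))
			for r := range rows {
				col[r] = reflect.ValueOf(rows[r]).Field(i).Float()
			}
			out[f.Name] = col
		default:
			return nil, fmt.Errorf("field %s: unsupported kind %s", f.Name, f.Type.Kind())
		}
	}
	return out, nil
}

func main() {
	cols, err := splitByField([]Row{{"10201", 0, 0.03}, {"10202", 3, 0.5}})
	if err != nil {
		panic(err)
	}
	// Each entry would become its own tensor, fed to its own input node.
	fmt.Println(cols["Apl"], cols["PvUser"], cols["ViScore"])
}
```

Splitting by column keeps the batch dimension first in every slice, so all per-feature tensors would share the same leading size.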
If you post the output of saved_model_cli, executed on your SavedModel file as I described here (https://github.com/galeone/tfgo/issues/21#issuecomment-538626727), we can have a look at it together.
I ran saved_model_cli and got the following output:
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['classification']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 2)
        name: linear/head/Tile:0
    outputs['scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: linear/head/predictions/probabilities:0
  Method name is: tensorflow/serving/classify

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['examples'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['class_ids'] tensor_info:
        dtype: DT_INT64
        shape: (-1, 1)
        name: linear/head/predictions/classes:0
    outputs['classes'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 1)
        name: linear/head/predictions/str_classes:0
    outputs['logistic'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: linear/head/predictions/logistic:0
    outputs['logits'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: linear/head/predictions/logits:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: linear/head/predictions/probabilities:0
  Method name is: tensorflow/serving/predict

signature_def['regression']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['outputs'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: linear/head/predictions/logistic:0
  Method name is: tensorflow/serving/regress

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 2)
        name: linear/head/Tile:0
    outputs['scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: linear/head/predictions/probabilities:0
  Method name is: tensorflow/serving/classify
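The key point in this output is that every SignatureDef takes the same single input, input_tensor:0, of type DT_STRING with shape (-1): a batch of strings, not one node per feature. When a model has many signatures, the names can also be extracted programmatically. A minimal Go sketch (standard library only) that parses this text format; it assumes exactly the line prefixes shown above (signature_def[...], inputs[...], outputs[...], dtype:, name:), and the types and function names are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// TensorRef is one input or output entry of a SignatureDef.
type TensorRef struct {
	Key   string // e.g. "inputs" from inputs['inputs']
	DType string // e.g. "DT_STRING"
	Name  string // graph tensor name, e.g. "input_tensor:0"
}

// Signature holds the inputs and outputs of one SignatureDef.
type Signature struct {
	Inputs, Outputs []TensorRef
}

// between returns the text after the first left and before the next right.
func between(s, left, right string) string {
	i := strings.Index(s, left)
	if i < 0 {
		return ""
	}
	s = s[i+len(left):]
	if j := strings.Index(s, right); j >= 0 {
		return s[:j]
	}
	return s
}

// parseShowAll extracts signatures from `saved_model_cli show --all` text,
// keyed by signature name. It only relies on the line prefixes seen above.
func parseShowAll(text string) map[string]*Signature {
	sigs := make(map[string]*Signature)
	var cur *Signature
	var list *[]TensorRef // the Inputs or Outputs slice currently being filled
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "signature_def['"):
			cur = &Signature{}
			sigs[between(line, "['", "']")] = cur
			list = nil
		case cur != nil && strings.HasPrefix(line, "inputs['"):
			list = &cur.Inputs
			*list = append(*list, TensorRef{Key: between(line, "['", "']")})
		case cur != nil && strings.HasPrefix(line, "outputs['"):
			list = &cur.Outputs
			*list = append(*list, TensorRef{Key: between(line, "['", "']")})
		case list != nil && strings.HasPrefix(line, "dtype:"):
			(*list)[len(*list)-1].DType = strings.TrimSpace(strings.TrimPrefix(line, "dtype:"))
		case list != nil && strings.HasPrefix(line, "name:"):
			(*list)[len(*list)-1].Name = strings.TrimSpace(strings.TrimPrefix(line, "name:"))
		}
	}
	return sigs
}

func main() {
	out := `signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0`
	sig := parseShowAll(out)["serving_default"]
	fmt.Println(sig.Inputs[0].Name, sig.Inputs[0].DType)
}
```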
But I still don't quite understand what you mean: how do you use multiple tensors as input to make predictions? I'll try it out later; please wait for me. Thank you so much!
I'm a bit excited: my colleague successfully loaded the model in Java and got predictions. I'd like to share this with you, so that you can design this kind of feature transformation for Go in the future. Java code:
import com.google.protobuf.ByteString;
import java.util.Arrays;
import org.tensorflow.*;
import org.tensorflow.example.*;

public class Main {
  // Returns a Feature containing a BytesList, where each element of the list
  // is the UTF-8 encoded bytes of the Java string.
  public static Feature feature(String... strings) {
    BytesList.Builder b = BytesList.newBuilder();
    for (String s : strings) {
      b.addValue(ByteString.copyFromUtf8(s));
    }
    return Feature.newBuilder().setBytesList(b).build();
  }

  public static Feature feature(float... values) {
    FloatList.Builder b = FloatList.newBuilder();
    for (float v : values) {
      b.addValue(v);
    }
    return Feature.newBuilder().setFloatList(b).build();
  }

  public static void main(String[] args) throws Exception {
    Features features =
        Features.newBuilder()
            .putFeature("Attribute1", feature("A12"))
            .putFeature("Attribute2", feature(12))
            .putFeature("Attribute3", feature("A32"))
            .putFeature("Attribute4", feature("A40"))
            .putFeature("Attribute5", feature(7472))
            .putFeature("Attribute6", feature("A65"))
            .putFeature("Attribute7", feature("A71"))
            .putFeature("Attribute8", feature(1))
            .putFeature("Attribute9", feature("A92"))
            .putFeature("Attribute10", feature("A101"))
            .putFeature("Attribute11", feature(2))
            .putFeature("Attribute12", feature("A121"))
            .putFeature("Attribute13", feature(24))
            .putFeature("Attribute14", feature("A143"))
            .putFeature("Attribute15", feature("A151"))
            .putFeature("Attribute16", feature(1))
            .putFeature("Attribute17", feature("A171"))
            .putFeature("Attribute18", feature(1))
            .putFeature("Attribute19", feature("A191"))
            .putFeature("Attribute20", feature("A201"))
            .build();
    Example example = Example.newBuilder().setFeatures(features).build();

    String pfad = System.getProperty("user.dir") + "\\1511523781";
    try (SavedModelBundle model = SavedModelBundle.load(pfad, "serve")) {
      Session session = model.session();
      final String xName = "input_example_tensor";
      final String scoresName = "dnn/head/predictions/probabilities:0";

      try (Tensor<String> inputBatch = Tensors.create(new byte[][] {example.toByteArray()});
          Tensor<Float> output =
              session
                  .runner()
                  .feed(xName, inputBatch)
                  .fetch(scoresName)
                  .run()
                  .get(0)
                  .expect(Float.class)) {
        System.out.println(Arrays.deepToString(output.copyTo(new float[1][2])));
      }
    }
  }
}
Of course, this method was also copied from someone else; here is the link: https://stackoverflow.com/questions/47477314/tensorflow-java-api-set-placeholder-for-categorical-columns
Oh wow, that's great! My guess was that every single attribute of your input dictionary would have been converted to an input node, but instead, they converted the whole dictionary to a single input of type byte (if I'm understanding the Java code correctly).
Yup, I agree this kind of input handling should be provided by tfgo. I'll work on that today (I have 12 hours of flight, so now I have something fun to do :+1:).
I'm wondering: can you give me a little bit of help by sharing a simple Python script that uses the tf.estimator API (and trains on dummy data)?
Hi @atlantiswqq!
I had a look at your code, and while analyzing it to add this feature to tfgo I found something interesting (I think).
Does the input format depend on an arbitrary choice?
I mean, I see the line
featureValue = {k:tf.VarLenFeature(dtype=tf.string) for k in cols}
And looking at the deserialization your co-worker created in Java, it looks like it is just a conversion from a string/byte type to the correct type (and in Go you can do the same using the standard library).
Thus I can't just create a generic deserialization for tf.estimator input, since the input is not standard but changes depending on the Python script that created it.
Or am I missing something?
I apologize for the late reply. In fact, these are not ordinary bytes: if you look at how the model is loaded in Python, you'll see that under the hood TensorFlow converts the hashmap into a serialized protobuf (tf.train.Example).
Attached here is the code I used in Python to load the model and run predictions.
import pandas as pd
import tensorflow as tf  # TensorFlow 1.x: tf.contrib was removed in 2.x

model_path = "the model you saved"
predict_fn = tf.contrib.predictor.from_saved_model(model_path)
data = pd.read_csv("/users/wqq/desktop/temp.csv")
for row_index, row in data.iterrows():
    examples = []
    feature = {}
    for col, value in row.items():  # Series.iteritems() was removed in pandas 2.0
        # Skip the id and the label; every other column is sent as a UTF-8 string.
        if col not in ["user_id", "click_label"]:
            feature[col] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[bytes(str(value), "utf-8")]))
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    examples.append(example.SerializeToString())
    predictions = predict_fn({"inputs": examples})
tf.train.Feature supports three list types: float (FloatList), int64 (Int64List), and bytes (BytesList). Since my features include strings and similar values, I use bytes_list for all of them.
I am studying the TensorFlow source code to see how an Example converts a hashmap into a sequence of bytes; the underlying implementation passes a serialized protobuf byte sequence. I hope you can look at this file as well (map to bytes). I've used Go for RPC services before, so I know protobuf a little, and I think I should be able to do this in a short time.
Hi, I implemented the map-to-byte-sequence conversion here. It can be a bit of a hassle: you need to get the following seven proto files from TensorFlow's source code and compile them with this command:
protoc --proto_path=<your proto files' path> --go_out=plugins=grpc:. *.proto
Normally you should get the following file structure. You need to put the generated files in their respective paths under go/src/github.com.
If you did all this correctly, you can translate your map of features into a sequence of bytes with the following code, and then use tf.NewTensor to build the input tensor for the prediction.
package main

import (
	"fmt"
	"log"

	"github.com/golang/protobuf/proto"
	"github.com/tensorflow/tensorflow/tensorflow/go/core/example"
)

func main() {
	// These are your features in key-value format.
	data := map[string]string{"gender": "male", "age": "19"}

	// Wrap every value in a Feature holding a BytesList.
	feature := make(map[string]*example.Feature)
	for k, v := range data {
		feature[k] = StringToFeature(v)
	}

	features := example.Features{Feature: feature}
	myExample := example.Example{Features: &features}

	// Serialize the Example; these bytes are one element of the string batch.
	buf, err := proto.Marshal(&myExample)
	if err != nil {
		log.Fatal("marshal error: ", err)
	}
	fmt.Println(buf)
}

// StringToFeature wraps a single string in a Feature with a BytesList.
// A Feature can also hold a FloatList ([]float32) or an Int64List ([]int64),
// like tf.train.BytesList, tf.train.FloatList and tf.train.Int64List in Python.
func StringToFeature(value string) *example.Feature {
	bytesList := example.BytesList{Value: [][]byte{[]byte(value)}}
	return &example.Feature{Kind: &example.Feature_BytesList{BytesList: &bytesList}}
}
best wishes~
Hi @atlantiswqq! Sorry for the late reply, and thank you for finding out how to build the correct input for the tfgo.NewTensor call!
Do you think the proto files must stay in the TensorFlow repo (because they depend on other proto files available there), or can they just be copied into tfgo?
Because if there are no dependencies and we can copy the file here, then we can add this feature to tfgo without depending on the TensorFlow repository.
What do you think about this?
Also, if you want to create a pull request to add this feature feel free to open it.
Of course, we can copy these proto files into tfgo, compile an example package, and implement the map-to-byte-sequence function in that package. I will give you a branch soon, if you like.
Of course, yes please!
Hi @atlantiswqq - any news on the branch? I'd really like to embed this feature in tfgo :-)
Yes, just give me a couple of days, haha.
I am trying to run predictions with my .pb model loaded in Go, but I have hit a bug that I'm trying to solve, so the new branch may be a little late. The errors I've encountered are of the same nature as those in these links: "java.lang.IllegalArgumentException: Expects arg[0] to be float but string is provided (java)" and "Getting error - Expects arg[0] to be float but uint8 is provided while using bytebuffer for image data in tensorflow inference".
I'm a little excited: I solved all the problems. I'll clean up the code tomorrow and then push the branch. @galeone
Great! Thank you @atlantiswqq :+1: can't wait to have a look at the code
wqq@Atlantis:~/go/src/github.com/galeone/tfgo$git push origin estimator
Username for 'https://github.com': atlantiswqq
Password for 'https://atlantiswqq@github.com':
remote: Permission to galeone/tfgo.git denied to atlantiswqq.
fatal: unable to access 'https://github.com/galeone/tfgo/': The requested URL returned error: 403
I need your permission.
Can you open a pull request?
Working in a different branch in the same repo is fine; you just have to run:
git remote add fork https://github.com/atlantiswqq/tfgo
git push fork estimator
(now the branch estimator is online) and then open a pull request for estimator inside this repo (so I can have a look at the code and do the code review).
Thanks~
I am not a professional programmer. If there is something wrong with the code, you can email me directly at atlantis.wqq@gmail.com. I have created a pull request. Best wishes!
Thank you :+1: I'll review the pull request in the next few days, but it already looks OK!
I'll give you feedback as soon as I've tested it and, if needed, updated it.
Hi @atlantiswqq , I'm doing the code review for your pull request and I've refactored it a little bit - you can have a look at my changes in the branch https://github.com/galeone/tfgo/tree/atlantiswqq-estimator.
Please do not close your pull request, since I'm going to merge my changes into your merge request before merging it.
My changes are:
- a //go:generate directive to invoke protoc and compile the protobuf files into the .pb.go files;
- a new method of the *tfgo.Model type, named ExecEstimator; this method also accepts as input a "preprocessor" function, that is, one of the functions you've written to preprocess the data before feeding them into the model.
I'm also refactoring the code, the CI, and some other stuff a little before merging your request, but it's really good (I'm also creating some tests for the ExecEstimator method).
If you want, you can share your thoughts.
I will continue to update this new branch in the next few days - I hope to complete the tests and the refactor within the next week.
For example, when I trained a model with tf.estimator, input_fn was used for training and feature_columns were specified, so there were no input and output node names. Could you help me?