PAIR-code / what-if-tool

Source code/webpage/demos for the What-If Tool
https://pair-code.github.io/what-if-tool
Apache License 2.0

Integer values after using the tf.estimator.BoostedTreesClassifier generate errors in the WIT. #163

Open DebbyDebStar opened 3 years ago

DebbyDebStar commented 3 years ago

I create my dataset with tf.data.experimental.make_batched_features_dataset, where numerical data are passed as floats, categorical data as integers, and vocabulary features as strings. For serving, I create the feature spec manually, just as I did when creating the dataset. Then I create a list of my feature columns and pass them to the model. The model trains successfully, but the categorical integer features generate an error in the WIT.

I don't understand why the data is being passed as float according to the error message (at the end of this comment). The categorical columns of the data passed to the WIT are also integers...

Has anyone run into the same error? Or can someone give me a tip on where the problem could be?

The _input_fn function that returns my dataset:

def _input_fn(tf_transform_output,
              transformed_examples,
              batch_size: int = 16, num_e=None) -> tf.data.Dataset:
    print(num_e)
    transformed_feature_spec = (
        tf_transform_output.transformed_feature_spec().copy())

    dataset = tf.data.experimental.make_batched_features_dataset(
        file_pattern=transformed_examples,
        batch_size=batch_size,
        features=transformed_feature_spec,
        reader=None,
        label_key=label_column, num_epochs=num_e)

    return dataset

My function for Serving:

def _example_serving_receiver_fn(model, tf_transform_output):

    model.tft_layer = tf_transform_output.transform_features_layer()

    RAW_DATA_FEATURE_SPEC = dict(
        [(name, tf.io.FixedLenFeature([], tf.float32))
         for name in NUMERIC_FEATURE_KEYS]
        + [(name, tf.io.FixedLenFeature([], tf.int64))
           for name in CATEGORICAL_FEATURE_KEYS]
        + [(name, tf.io.FixedLenFeature([],
                                        tf.string))
           for name in VOCAB_FEATURE_KEYS]
        + [(LABEL_KEY,
            tf.io.FixedLenFeature([], tf.float32))])

    raw_feature_spec = RAW_DATA_FEATURE_SPEC

    raw_feature_spec.pop(LABEL_KEY)

    raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(  # noqa: E501
        raw_feature_spec, default_batch_size=None)
    serving_input_receiver = raw_input_fn()

    transformed_features = tf_transform_output.transform_raw_features(
        serving_input_receiver.features)

    return tf.estimator.export.ServingInputReceiver(
        transformed_features, serving_input_receiver.receiver_tensors)

Create the feature columns:

feature_cols = [
    tf.feature_column.numeric_column(key, shape=(1,), dtype=tf.float32)
    for key in NUMERIC_FEATURE_KEYS
]
for key, buckets in zip(
        CATEGORICAL_FEATURE_KEYS, MAX_CATEGORICAL_VALUES):
    categorical = tf.feature_column.categorical_column_with_identity(
        key, num_buckets=buckets, default_value=0)
    categorical = tf.feature_column.indicator_column(categorical)
    feature_cols.append(categorical)

for key in VOCAB_FEATURE_KEYS:
    vocab = tf.feature_column.categorical_column_with_identity(
        key, num_buckets=VOCAB_SIZE + OOV_SIZE, default_value=0)
    vocab = tf.feature_column.indicator_column(vocab)
    feature_cols.append(vocab)

model = tf.estimator.BoostedTreesClassifier(
    feature_cols, n_batches_per_layer=1, model_dir=save_path_training)

model.train(partial(_input_fn, tf_transform_output,
                    train_data_path_pattern), max_steps=100)

result_eval = model.evaluate(partial(_input_fn, tf_transform_output,
                                     eval_data_path_pattern, num_e=1))

result_train = model.evaluate(partial(_input_fn, tf_transform_output,
                                      train_data_path_pattern, num_e=1))

However, the WIT then throws this error:

2021-05-12 11:18:29.680711: I tensorflow_serving/model_servers/server.cc:391] Exporting HTTP/REST API at:localhost:8501 ...
2021-05-12 11:18:42.060964: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at example_parsing_ops.cc:92 : Invalid argument: Name: , Key: Siblings_Spouses, Index: 0. Data types don't match. Data type: float but expected type: int64
2021-05-12 11:18:42.213860: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at example_parsing_ops.cc:92 : Invalid argument: Name: , Key: Siblings_Spouses, Index: 0. Data types don't match. Data type: float but expected type: int64
2021-05-12 11:18:42.389265: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at example_parsing_ops.cc:92 : Invalid argument: Name: , Key: Siblings_Spouses, Index: 0. Data types don't match. Data type: float but expected type: int64
2021-05-12 11:18:42.557218: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at example_parsing_ops.cc:92 : Invalid argument: Name: , Key: Parents_Children, Index: 0. Data types don't match. Data type: float but expected type: int64
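
Editor's note: the message above says the tf.Example that reached the parser stored Siblings_Spouses (and Parents_Children) as float values, while the serving feature spec declares tf.int64 for them. A minimal sketch of how one might check for this kind of mismatch follows. The helper names and the "Fare" feature are hypothetical, not part of the original post; kinds_from_example assumes it is given a tf.train.Example proto and relies only on the standard protobuf WhichOneof method.

```python
from typing import Dict, Optional, Tuple

# Maps the dtypes used in the serving feature spec to the tf.train.Feature
# "kind" field that holds values of that dtype inside a tf.Example.
EXPECTED_KIND_BY_DTYPE = {
    "float32": "float_list",
    "int64": "int64_list",
    "string": "bytes_list",
}


def kinds_from_example(example) -> Dict[str, Optional[str]]:
    """Return {feature name: populated kind} for a tf.train.Example proto."""
    return {
        name: feature.WhichOneof("kind")
        for name, feature in example.features.feature.items()
    }


def find_kind_mismatches(
    actual_kinds: Dict[str, Optional[str]],
    expected_dtypes: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Return {feature name: (actual kind, expected kind)} for mismatches."""
    mismatches = {}
    for name, dtype in expected_dtypes.items():
        expected = EXPECTED_KIND_BY_DTYPE[dtype]
        actual = actual_kinds.get(name)
        # Features that are absent or empty are skipped, not reported.
        if actual is not None and actual != expected:
            mismatches[name] = (actual, expected)
    return mismatches


# Hand-written kinds mirroring the log above ("Fare" is a made-up feature).
actual = {
    "Siblings_Spouses": "float_list",
    "Parents_Children": "float_list",
    "Fare": "float_list",
}
expected = {
    "Siblings_Spouses": "int64",
    "Parents_Children": "int64",
    "Fare": "float32",
}
print(find_kind_mismatches(actual, expected))
```

Running kinds_from_example on one of the examples actually loaded into the WIT, and comparing the result against the serving spec, would show whether the integers were already stored as floats before the request reached TF Serving.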

jameswex commented 3 years ago

Thanks for reaching out.

Looks like you are using WIT with a model being served by TF serving, is that correct? It might be easier to debug by having the model run in the same process as WIT using custom prediction functions instead (https://pair-code.github.io/what-if-tool/learn/tutorials/custom-prediction/).

Does the data itself look correct when loaded into WIT (even though model predictions fail)? Are you able to share the dataset or does it contain sensitive information?
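
Editor's note: a rough sketch of the custom prediction function idea suggested above, under several assumptions. It assumes the WIT hands the function a list of tf.train.Example protos and expects one list of class scores per example for a classifier. make_custom_predict_fn, cast_float_features_to_int, and score_fn are hypothetical names; score_fn stands in for whatever code runs the trained model in-process. The cast also assumes the affected float values are whole numbers.

```python
import copy
from typing import Callable, List, Sequence


def cast_float_features_to_int(example, int_keys: Sequence[str]):
    """Return a copy of a tf.train.Example with the given features as int64.

    Only features currently stored in float_list are rewritten.
    """
    fixed = copy.deepcopy(example)
    for key in int_keys:
        if key not in fixed.features.feature:
            continue
        feature = fixed.features.feature[key]
        if feature.WhichOneof("kind") == "float_list":
            # int() truncates, so this assumes integral values such as 1.0.
            values = [int(v) for v in feature.float_list.value]
            feature.ClearField("float_list")
            feature.int64_list.value.extend(values)
    return fixed


def make_custom_predict_fn(
    score_fn: Callable[[list], List[List[float]]],
    int_keys: Sequence[str],
) -> Callable[[list], List[List[float]]]:
    """Wrap score_fn so integer features are repaired before prediction."""

    def custom_predict(examples_to_infer: list) -> List[List[float]]:
        fixed = [
            cast_float_features_to_int(ex, int_keys)
            for ex in examples_to_infer
        ]
        return score_fn(fixed)

    return custom_predict
```

In a notebook, a function built this way would typically be passed to WitConfigBuilder(examples).set_custom_predict_fn(...), with int_keys set to CATEGORICAL_FEATURE_KEYS; the linked tutorial describes the exact setup. This works around the type mismatch rather than explaining where the floats come from.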

DebbyDebStar commented 3 years ago

@jameswex Thank you for your fast reply.

I think the custom prediction function is a great idea. We will give it a try, and I will report back on whether and how we solve the problem.

The data looks correct when loaded into WIT.