jarednielsen opened this issue 4 years ago.
Running the following script with tensorflow==1.15.0:
```python
import tensorflow.compat.v2 as tf
import smdebug.tensorflow as smd
from tempfile import TemporaryDirectory

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255, x_test / 255

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])

with TemporaryDirectory() as dirpath:
    hook = smd.KerasHook(out_dir=dirpath)
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5, callbacks=[hook])
    # Read back what the hook wrote before the temporary directory is deleted.
    trial = smd.create_trial(path=dirpath)
    print(hook)
    print(trial)
```
gives the following output:
```
<smdebug.tensorflow.keras.KerasHook object at 0x1025aaed0>:( out_dir=/var/folders/r1/mgxfss8d45jbs_vl464bbsg906jznv/T/tmpdzybvlqg, tensorboard_dir=None, step=9374, mode=ModeKeys.TRAIN, mode_steps={<ModeKeys.GLOBAL: 4>: 9374, <ModeKeys.TRAIN: 1>: 9374}, include_collections=['metrics', 'losses', 'sm_metrics'], writer=None, save_config=<class SaveConfig: {<ModeKeys.TRAIN: 1>: <class SaveConfig: save_interval=500, save_steps=[], start_step=0, end_step=None>, <ModeKeys.EVAL: 2>: <class SaveConfig: save_interval=500, save_steps=[], sta ...>, reduction_config=<class ReductionConfig: reductions=[], abs_reductions=[], norms=[], abs_norms=[]>, save_all=False, dry_run=False, )
<smdebug.trials.local_trial.LocalTrial object at 0x1025b0f50>:( name=tmpdzybvlqg, path=/var/folders/r1/mgxfss8d45jbs_vl464bbsg906jznv/T/tmpdzybvlqg, steps=[0, 500, 1000, 1500, 1874, 2000, 2500, 3000, 3500, 3749, 4000, 4500, 5000, 5500, 5624, 6000, 6500, 7000, 7499, 7500, 8000, 8500, 9000, 9374], collections=['default', 'weights', 'biases', 'gradients', 'losses', 'metrics', 'inputs', 'outputs', 'all', 'sm_metrics'], tensor_names=['acc', 'batch', 'loss', 'size'], )
```
It appears to be saving at steps 1874, 3749, 5624, 7499, and 9374 (every 1875 steps starting at 1874), in addition to every 500th step. Is this the intended behavior?
Can you check the mode and mode step of the saved global steps?
These are probably the last steps of each epoch. At that point we save additional metrics, which Keras only gives us at the end of an epoch.
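The arithmetic behind that explanation can be checked with a small sketch. It assumes the 60,000-image MNIST training set and Keras's default `batch_size=32` for `model.fit` (the script does not set one), giving 1875 zero-indexed steps per epoch, and it models the save schedule as the union of the `save_interval=500` steps from the hook's `SaveConfig` and one extra save at the last step of each epoch. The function `expected_saved_steps` is hypothetical and only mirrors the observed behavior; it is not smdebug's implementation.

```python
import math

# Assumptions: MNIST has 60,000 training images and model.fit uses
# Keras's default batch_size of 32. SAVE_INTERVAL comes from the hook's
# SaveConfig shown in the output above.
NUM_EXAMPLES = 60_000
BATCH_SIZE = 32
EPOCHS = 5
SAVE_INTERVAL = 500


def expected_saved_steps(num_examples, batch_size, epochs, save_interval):
    """Hypothetical model: interval saves plus one save at each epoch's last step."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # 1875 here
    total_steps = steps_per_epoch * epochs                  # steps 0 .. total_steps - 1
    interval_steps = set(range(0, total_steps, save_interval))
    # Global steps are zero-indexed, so epoch e (0-based) ends at this step.
    epoch_end_steps = {steps_per_epoch * (e + 1) - 1 for e in range(epochs)}
    return sorted(interval_steps | epoch_end_steps)


steps = expected_saved_steps(NUM_EXAMPLES, BATCH_SIZE, EPOCHS, SAVE_INTERVAL)
print(steps)
```

Under these assumptions the result matches the `steps` list reported by the trial, and the "extra" steps are exactly the epoch boundaries (1874, 3749, 5624, 7499, 9374), which is why they appear to recur every 1875 steps rather than every 1874.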