DeepTrackAI / DeepTrack2

MIT License

Loading weights/model for LodeSTAR #195

Closed MadMax129 closed 1 year ago

MadMax129 commented 1 year ago

Hi,

I've had the following problem:

Versions:
Python    3.9.6
deeptrack 1.6.0

After training a LodeSTAR model with the following:

import deeptrack as dt
from tensorflow.keras.callbacks import ModelCheckpoint

model = dt.models.LodeSTAR(input_shape=(None, None, args.input_channels))
model_checkpoint = ModelCheckpoint(
    args.model_path + "/epoch_{epoch:02d}.h5",
    monitor='val_loss',
    verbose=1,
    save_best_only=False,
    save_weights_only=True,
    mode='auto',
    period=1
)
model.fit(
    train_set,  # valid training set
    epochs=args.epochs,
    batch_size=args.batch,
    callbacks=[model_checkpoint],
    generator_kwargs={"shuffle_batch": False}
)

I then attempt to load a weights .h5 file back in to run inference, like this:

model = dt.models.LodeSTAR(
    input_shape=(None, None, args.input_channels)
)
# model.compile(optimizer='adam', loss='mae')

model.load_weights(args.model_path)

However, I receive this error:

ValueError: Unable to load weights saved in HDF5 format into a subclassed Model which has not created its variables yet. Call the Model first, then load the weights.

So I experimented with adding the commented-out model.compile(...) line, as well as model.build(...), but with those the network returns nothing, as if its weights were never initialized. The strange part is that none of this happens if I call model.fit(...) first and then model.predict_and_detect(...), which leads me to believe there is something wrong with LodeSTAR.
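For context, the usual workaround for this Keras error is to call the model once on a dummy batch so its variables are actually created, and only then call load_weights. A minimal sketch of that pattern with a generic subclassed Keras model (TinyModel and the file name are hypothetical stand-ins, not DeepTrack internals; whether LodeSTAR needs this on the wrapper or its underlying base model may depend on the version):

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a subclassed model like LodeSTAR's base network.
class TinyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv = tf.keras.layers.Conv2D(4, 3, padding="same")

    def call(self, x):
        return self.conv(x)

model = TinyModel()
dummy = np.zeros((1, 32, 32, 1), dtype=np.float32)
model(dummy)                            # call once so variables are created
model.save_weights("tiny.weights.h5")

restored = TinyModel()
restored(dummy)                         # same trick BEFORE load_weights
restored.load_weights("tiny.weights.h5")

# The restored weights now match the originals.
weights_match = all(
    np.array_equal(a, b)
    for a, b in zip(model.get_weights(), restored.get_weights())
)
print(weights_match)
```

The key point is that a subclassed model has no weight tensors until its call() has run at least once, which is exactly what the "Call the Model first, then load the weights" message is asking for.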


I also attempted saving the entire Keras model and reloading it, like so:

model_checkpoint = ModelCheckpoint(
    args.model_path + "/epoch_{epoch:02d}.h5",
    monitor='val_loss',
    verbose=1,
    save_best_only=False,
    save_weights_only=False,
    mode='auto',
    period=1
)

Loading it back with model.load_model(...) worked, but when I then called model.predict_and_detect(...) on the following input, I got an error:

from PIL import Image
import numpy as np

frame = Image.open(args.data_path).convert('L')
frame = np.expand_dims(frame, axis=-1)
frame = np.array(frame) / np.max(frame)

batch = np.expand_dims(frame, axis=0)

det = model.predict_and_detect(
    batch,
    alpha=args.alpha,
    beta=1 - args.alpha,
    cutoff=args.cutoff,
    mode='quantile'
)[0]

ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
Positional arguments (2 total):
            * <tf.Tensor 'x:0' shape=(None, 980, 1302) dtype=float32>
            * False
          Keyword arguments: {}

         Expected these arguments to match one of the following 2 option(s):

        Option 1:
          Positional arguments (2 total):
            * TensorSpec(shape=(None, 93, 71), dtype=tf.float32, name='input_1')
            * True
          Keyword arguments: {}

        Option 2:
          Positional arguments (2 total):
            * TensorSpec(shape=(None, 93, 71), dtype=tf.float32, name='input_1')
            * False
          Keyword arguments: {}

    Call arguments received by layer 'lode_star_base_model' (type LodeSTARBaseModel):
      • x=tf.Tensor(shape=(None, 980, 1302), dtype=float32)
      • training=False

I've had no luck figuring out a way around this. I just need to save the model and load it back. The error only happens when training and inference are split into separate tasks; when I run inference right after training in the same file, everything works fine. Any help is greatly appreciated!
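The shape mismatch in the second traceback hints at what is going on: a full SavedModel stores concrete functions traced at the training-time input shape, here (None, 93, 71), so a call with (None, 980, 1302) frames cannot match any stored signature. If the network is fully convolutional, saving weights only and rebuilding the model at inference time sidesteps the traced shapes entirely. A sketch of that idea with a generic fully convolutional Keras model (FullyConvModel and the file name are hypothetical, not the LodeSTAR classes):

```python
import numpy as np
import tensorflow as tf

# Hypothetical fully convolutional stand-in: any spatial size works,
# because no layer depends on a fixed height or width.
class FullyConvModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv = tf.keras.layers.Conv2D(2, 3, padding="same")

    def call(self, x):
        return self.conv(x)

model = FullyConvModel()
model(np.zeros((1, 93, 71, 1), dtype=np.float32))   # build at training size
model.save_weights("fc.weights.h5")                 # weights only: no traced shapes

restored = FullyConvModel()
restored(np.zeros((1, 980, 1302, 1), dtype=np.float32))  # build at inference size
restored.load_weights("fc.weights.h5")

out = restored(np.zeros((1, 980, 1302, 1), dtype=np.float32))
print(out.shape)  # (1, 980, 1302, 2): spatial dims follow the input
```

With save_weights_only=True in the checkpoint (as in the first snippet) plus a dummy forward pass at the inference-time shape before load_weights, the fixed-signature problem of the full SavedModel never arises.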

MadMax129 commented 1 year ago

Resolved