Open · kuangzy2011 opened this issue 1 year ago
The implementation seems correct, but the validation loss is skyrocketing. I assume the model is producing extreme values and overfitting on the training dataset.
Can you remove the line: outputs = tfpl.OneHotCategorical(NUM_CLASSES)(dense)
and try working with the logits directly: outputs = tfpl.DistributionLambda(lambda x: tfd.OneHotCategorical(logits=0.01 * x))(dense)
We scale the logits to see whether the model is dealing with extreme values.
Update the model and train again.
import tensorflow_probability as tfp
from tensorflow import keras
from tensorflow.keras import layers

tfpl = tfp.layers
tfd = tfp.distributions

# tnet, conv_bn and dense_bn are the helpers from the Keras PointNet example;
# NUM_CLASSES and divergence_fn are defined elsewhere in the notebook.
def create_model_normal(num_points):
    inputs = keras.Input(shape=(num_points, 3))
    x = tnet(inputs, 3)
    x = conv_bn(x, 32)
    x = conv_bn(x, 32)
    x = tnet(x, 32)
    x = conv_bn(x, 32)
    x = conv_bn(x, 64)
    x = conv_bn(x, 512)
    x = layers.GlobalMaxPooling1D()(x)
    x = dense_bn(x, 256)
    x = layers.Dropout(0.25)(x)
    x = dense_bn(x, 128)
    x = layers.Dropout(0.25)(x)
    # Bayesian classification head: weights and biases are distributions.
    dense = tfpl.DenseReparameterization(
        units=tfpl.OneHotCategorical.params_size(NUM_CLASSES),
        activation=None,
        kernel_posterior_fn=tfpl.default_mean_field_normal_fn(is_singular=False),
        kernel_prior_fn=tfpl.default_multivariate_normal_fn,
        bias_prior_fn=tfpl.default_multivariate_normal_fn,
        bias_posterior_fn=tfpl.default_mean_field_normal_fn(is_singular=False),
        kernel_divergence_fn=divergence_fn,
        bias_divergence_fn=divergence_fn)(x)
    # Scaled logits, as suggested above, to tame extreme values.
    outputs = tfpl.DistributionLambda(
        lambda x: tfd.OneHotCategorical(logits=0.01 * x))(dense)
    model = keras.Model(inputs=inputs, outputs=outputs, name="pointnet")
    return model
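For completeness, divergence_fn and the loss are not shown above. A minimal sketch of the usual setup, assuming the common pattern of scaling each layer's KL term by the training-set size and training on the negative log-likelihood; N_TRAIN and the learning rate below are placeholders, not values from this thread:

import tensorflow_probability as tfp
from tensorflow import keras

tfd = tfp.distributions

N_TRAIN = 41920  # placeholder: set to the actual number of training examples

# Weight each layer's KL term by 1/N_TRAIN so the prior's pull is spread
# over the whole training set rather than applied in full at every batch.
divergence_fn = lambda q, p, _: tfd.kl_divergence(q, p) / N_TRAIN

# Negative log-likelihood of the one-hot labels under the predicted
# distribution; Keras adds the layers' KL losses on top automatically.
def nll(y_true, y_pred):
    return -y_pred.log_prob(y_true)

model = create_model_normal(231)  # 231 points per cloud, per the dataset shape
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # lr is a guess
              loss=nll, metrics=['accuracy'])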
train_dataset <BatchDataset shapes: ((None, 231, 3), (None, 10)), types: (tf.float64, tf.float64)>
test_dataset <BatchDataset shapes: ((None, 231, 3), (None, 10)), types: (tf.float64, tf.float64)>
2023-02-17 13:33:57.712078: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/30
2023-02-17 13:34:12.895018: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005
1310/1310 - 67s - loss: 22.3201 - accuracy: 0.1045 - val_loss: 757354368.0000 - val_accuracy: 0.1000
Epoch 2/30
1310/1310 - 44s - loss: 21.3708 - accuracy: 0.1198 - val_loss: 317258464.0000 - val_accuracy: 0.1000
Epoch 3/30
1310/1310 - 44s - loss: 20.4662 - accuracy: 0.1460 - val_loss: 50089232.0000 - val_accuracy: 0.1190
Epoch 4/30
1310/1310 - 44s - loss: 19.5705 - accuracy: 0.1738 - val_loss: 56752180.0000 - val_accuracy: 0.1167
Epoch 5/30
1310/1310 - 44s - loss: 18.6980 - accuracy: 0.2097 - val_loss: 31248118.0000 - val_accuracy: 0.1048
Epoch 6/30
1310/1310 - 44s - loss: 17.8547 - accuracy: 0.2476 - val_loss: 503188.3125 - val_accuracy: 0.1071
Epoch 7/30
1310/1310 - 44s - loss: 17.0483 - accuracy: 0.2886 - val_loss: 623102.0000 - val_accuracy: 0.1429
Epoch 8/30
1310/1310 - 44s - loss: 16.2785 - accuracy: 0.3381 - val_loss: 2514874.5000 - val_accuracy: 0.0905
Epoch 9/30
1310/1310 - 44s - loss: 15.5474 - accuracy: 0.3927 - val_loss: 303895.8438 - val_accuracy: 0.0976
Epoch 10/30
1310/1310 - 44s - loss: 14.8597 - accuracy: 0.4460 - val_loss: 725294.3125 - val_accuracy: 0.1024
Epoch 11/30
1310/1310 - 43s - loss: 14.2143 - accuracy: 0.5031 - val_loss: 1745893.1250 - val_accuracy: 0.0881
Epoch 12/30
1310/1310 - 45s - loss: 13.6149 - accuracy: 0.5614 - val_loss: 1540100.3750 - val_accuracy: 0.1286
Epoch 13/30
1310/1310 - 43s - loss: 13.0604 - accuracy: 0.6151 - val_loss: 4525690.0000 - val_accuracy: 0.1071
Epoch 14/30
1310/1310 - 44s - loss: 12.5467 - accuracy: 0.6683 - val_loss: 3689776.5000 - val_accuracy: 0.1119
Epoch 15/30
1310/1310 - 44s - loss: 12.0814 - accuracy: 0.7103 - val_loss: 6925624.0000 - val_accuracy: 0.1286
Epoch 16/30
1310/1310 - 44s - loss: 11.6500 - accuracy: 0.7498 - val_loss: 6733758.0000 - val_accuracy: 0.1190
Epoch 17/30
1310/1310 - 44s - loss: 11.2548 - accuracy: 0.7785 - val_loss: 99263632.0000 - val_accuracy: 0.0952
Epoch 18/30
1310/1310 - 44s - loss: 10.8870 - accuracy: 0.8032 - val_loss: 6472049.0000 - val_accuracy: 0.0714
Epoch 19/30
1310/1310 - 44s - loss: 10.5297 - accuracy: 0.8363 - val_loss: 19297026.0000 - val_accuracy: 0.0976
Epoch 20/30
1310/1310 - 43s - loss: 10.1925 - accuracy: 0.8700 - val_loss: 43124540.0000 - val_accuracy: 0.1167
Epoch 21/30
1310/1310 - 44s - loss: 9.8820 - accuracy: 0.8970 - val_loss: 124067400.0000 - val_accuracy: 0.0833
Epoch 22/30
1310/1310 - 44s - loss: 9.5937 - accuracy: 0.9173 - val_loss: 79799040.0000 - val_accuracy: 0.1000
Epoch 23/30
1310/1310 - 44s - loss: 9.3247 - accuracy: 0.9325 - val_loss: 446929312.0000 - val_accuracy: 0.1262
Epoch 24/30
1310/1310 - 44s - loss: 9.0735 - accuracy: 0.9442 - val_loss: 732394368.0000 - val_accuracy: 0.0976
Epoch 25/30
1310/1310 - 43s - loss: 8.8402 - accuracy: 0.9531 - val_loss: 776593152.0000 - val_accuracy: 0.0810
Epoch 26/30
1310/1310 - 44s - loss: 8.6207 - accuracy: 0.9588 - val_loss: 140933664.0000 - val_accuracy: 0.0905
Epoch 27/30
1310/1310 - 44s - loss: 8.4134 - accuracy: 0.9637 - val_loss: 246708416.0000 - val_accuracy: 0.1214
Epoch 28/30
1310/1310 - 44s - loss: 8.2224 - accuracy: 0.9681 - val_loss: 1348214400.0000 - val_accuracy: 0.1048
Epoch 29/30
1310/1310 - 44s - loss: 8.0444 - accuracy: 0.9707 - val_loss: 136303104.0000 - val_accuracy: 0.1143
Epoch 30/30
1310/1310 - 43s - loss: 7.8739 - accuracy: 0.9740 - val_loss: 709458240.0000 - val_accuracy: 0.0976
Hmm, this is strange. I checked the original Keras example and it shows extreme validation loss values too, so it will be hard for a Bayesian NN to converge here.
Can you try Flipout layers? If I am not mistaken, they should give lower-variance gradient estimates during training, and they have the same signatures as the Reparameterization layers.
For example, change Convolution1DReparameterization to Convolution1DFlipout, and do the same for the Dense layers; a sketch of the swap follows below. It is also worth experimenting with RMSprop.
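A minimal illustration of that swap for the classification head above, reusing NUM_CLASSES, divergence_fn, nll and the feature tensor x from the earlier code; it is a sketch, not the exact notebook code:

import tensorflow_probability as tfp
from tensorflow import keras

tfpl = tfp.layers
tfd = tfp.distributions

# Inside create_model_normal, replace the DenseReparameterization head.
# DenseFlipout takes the same arguments but decorrelates the per-example
# weight perturbations within a batch, reducing gradient variance.
dense = tfpl.DenseFlipout(
    units=tfpl.OneHotCategorical.params_size(NUM_CLASSES),
    activation=None,
    kernel_posterior_fn=tfpl.default_mean_field_normal_fn(is_singular=False),
    kernel_prior_fn=tfpl.default_multivariate_normal_fn,
    bias_posterior_fn=tfpl.default_mean_field_normal_fn(is_singular=False),
    bias_prior_fn=tfpl.default_multivariate_normal_fn,
    kernel_divergence_fn=divergence_fn,
    bias_divergence_fn=divergence_fn)(x)
outputs = tfpl.DistributionLambda(
    lambda t: tfd.OneHotCategorical(logits=0.01 * t))(dense)

# Any Convolution1DReparameterization inside conv_bn would change to
# tfpl.Convolution1DFlipout the same way, with unchanged arguments.

# RMSprop instead of Adam; the learning rate is only a starting guess.
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=nll, metrics=['accuracy'])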
Hi Frightera: I am running some tests with PointNet to classify point-cloud objects, and there may be unknown objects that are not in the training set. So I tried to convert the PointNet model (https://keras.io/examples/vision/pointnet/) into a TensorFlow Probability model by referring to your Simple Fully Probabilistic Bayesian CNN, hoping the model could say "I don't know" when the target object is not in the training set, but the training results were less than ideal. Is it possible to convert a point-cloud model into a TensorFlow Probability deep learning model?
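Not from the thread, but to sketch the "I don't know" part: with a probabilistic head you can average several stochastic forward passes and treat high predictive entropy as "unknown object". The function name and the entropy threshold here are illustrative assumptions:

import numpy as np

def predict_with_uncertainty(model, points, n_samples=50, threshold=1.5):
    # Each call re-samples the Bayesian weights, so class probabilities vary;
    # model(points) returns a OneHotCategorical, whose mean() is the probs.
    probs = np.stack([model(points).mean().numpy() for _ in range(n_samples)])
    mean_probs = probs.mean(axis=0)
    # Predictive entropy in nats; high entropy = uncertain prediction.
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=-1)
    labels = mean_probs.argmax(axis=-1)
    # threshold is a hypothetical cut-off to tune on held-out data;
    # -1 stands in for "I don't know".
    return np.where(entropy > threshold, -1, labels), entropy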