apple / tensorflow_macos

TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.

Model fit with Nadam results in a continuous RAM usage increase #200

Open matoha opened 3 years ago

matoha commented 3 years ago

Hi,

I've recently got my M1 MacBook Air and I'm working on running some NN training. The optimiser I would normally use is Nadam, but I've found that it causes RAM usage to grow steadily with each epoch, which looks to me like a memory leak.

This does not happen when running identical code on an Intel CPU or when using other optimisers, such as Adam.

TF version - 0.1alpha3, installed as per the instructions in this repo and issue 153
Python - 3.8.6 | packaged by conda-forge
Code to reproduce - I've put together a minimal example, below.

Any help would be appreciated!

Log when the optimiser is Nadam:

Epoch 1/1000
300/300 [==============================] - 2s 5ms/step - loss: 0.5382 - mse: 0.4815 - mae: 0.5382 - val_loss: 0.3210 - val_mse: 0.1556 - val_mae: 0.3210
Memory used: 1771.257856 MB
Epoch 2/1000
300/300 [==============================] - 3s 10ms/step - loss: 0.3587 - mse: 0.1982 - mae: 0.3587 - val_loss: 0.2778 - val_mse: 0.1112 - val_mae: 0.2778
Memory used: 2377.908224 MB
Epoch 3/1000
300/300 [==============================] - 4s 14ms/step - loss: 0.3182 - mse: 0.1545 - mae: 0.3182 - val_loss: 0.2630 - val_mse: 0.0962 - val_mae: 0.2630
Memory used: 2984.542208 MB
Epoch 4/1000
300/300 [==============================] - 6s 19ms/step - loss: 0.3007 - mse: 0.1358 - mae: 0.3007 - val_loss: 0.2584 - val_mse: 0.0918 - val_mae: 0.2584
Memory used: 3593.66656 MB
Epoch 5/1000
300/300 [==============================] - 7s 23ms/step - loss: 0.2901 - mse: 0.1248 - mae: 0.2901 - val_loss: 0.2544 - val_mse: 0.0876 - val_mae: 0.2544
Memory used: 4202.971136 MB
Epoch 6/1000
300/300 [==============================] - 8s 28ms/step - loss: 0.2815 - mse: 0.1160 - mae: 0.2815 - val_loss: 0.2557 - val_mse: 0.0890 - val_mae: 0.2557
Memory used: 4812.095488 MB

Output when the optimiser is Adam; all other settings are identical:

Epoch 1/1000
300/300 [==============================] - 1s 2ms/step - loss: 0.4589 - mse: 0.3466 - mae: 0.4589 - val_loss: 0.2572 - val_mse: 0.0905 - val_mae: 0.2572
Memory used: 368.574464 MB
Epoch 2/1000
300/300 [==============================] - 0s 936us/step - loss: 0.3248 - mse: 0.1602 - mae: 0.3248 - val_loss: 0.2556 - val_mse: 0.0889 - val_mae: 0.2556
Memory used: 368.836608 MB
Epoch 3/1000
300/300 [==============================] - 0s 934us/step - loss: 0.3061 - mse: 0.1404 - mae: 0.3061 - val_loss: 0.2542 - val_mse: 0.0875 - val_mae: 0.2542
Memory used: 369.639424 MB
Epoch 4/1000
300/300 [==============================] - 0s 934us/step - loss: 0.2974 - mse: 0.1308 - mae: 0.2974 - val_loss: 0.2536 - val_mse: 0.0869 - val_mae: 0.2536
Memory used: 369.672192 MB
Epoch 5/1000
300/300 [==============================] - 0s 940us/step - loss: 0.2954 - mse: 0.1294 - mae: 0.2954 - val_loss: 0.2545 - val_mse: 0.0877 - val_mae: 0.2545
Memory used: 369.852416 MB
Epoch 6/1000
300/300 [==============================] - 0s 939us/step - loss: 0.2915 - mse: 0.1250 - mae: 0.2915 - val_loss: 0.2540 - val_mse: 0.0874 - val_mae: 0.2540
Memory used: 370.03264 MB
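For reference, the leak rate can be read straight off the two logs above. A quick sketch computing the average per-epoch RSS growth from the printed values:

```python
# RSS values (MB) copied from the Nadam and Adam logs above.
nadam_rss = [1771.257856, 2377.908224, 2984.542208,
             3593.66656, 4202.971136, 4812.095488]
adam_rss = [368.574464, 368.836608, 369.639424,
            369.672192, 369.852416, 370.03264]

def growth_per_epoch(rss):
    """Average MB of RSS growth between consecutive epochs."""
    deltas = [b - a for a, b in zip(rss, rss[1:])]
    return sum(deltas) / len(deltas)

print(f"Nadam: {growth_per_epoch(nadam_rss):.1f} MB/epoch")  # → Nadam: 608.2 MB/epoch
print(f"Adam:  {growth_per_epoch(adam_rss):.2f} MB/epoch")   # → Adam:  0.29 MB/epoch
```

At roughly 600 MB per epoch, the 1000-epoch run above would exhaust memory within a few dozen epochs, while Adam's growth is negligible.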

Code:

import tensorflow as tf
import numpy as np
import os
import psutil

num_samples = 50000
num_points = 256
num_labels = 10

x = tf.random.uniform([num_samples, num_points], seed=1)
y = tf.random.uniform([num_samples, num_labels], seed=1)

x_train = x[0:30000]
y_train = y[0:30000]
x_validate = x[30000:50000]
y_validate = y[30000:50000]

inputs = tf.keras.Input(shape=(num_points,))
x = inputs

for a in range(3):
    x = tf.keras.layers.Dense(200)(x)
    x = tf.keras.layers.Dropout(0.1)(x)

outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Nadam(2e-4), loss='mae', metrics=['mse', 'mae'])

class CustomCallback(tf.keras.callbacks.Callback):
    """Print the process's resident set size (RSS) after every epoch."""
    def on_epoch_end(self, epoch, logs=None):
        process = psutil.Process(os.getpid())
        print(f"Memory used: {process.memory_info().rss/1e6} MB")

history = model.fit(x_train, y_train, epochs=1000, batch_size=100, validation_data=(x_validate, y_validate), callbacks=[CustomCallback()])
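In case it helps with debugging: one way to check whether the leaked memory is held by Python objects or by native allocations (I'm guessing somewhere inside ML Compute, but that's only an assumption) is to compare tracemalloc's view of the Python heap against the process memory. A minimal standard-library sketch that could be called from the epoch-end callback:

```python
import resource
import tracemalloc

tracemalloc.start()

def memory_report():
    """Return (python_heap_bytes, peak_rss) for the current process.

    tracemalloc only tracks allocations made through Python's own
    allocator, so if the process RSS climbs every epoch while the
    Python heap stays flat, the leak is most likely in native code
    rather than in Python-side objects.
    Note: ru_maxrss is reported in bytes on macOS but kilobytes on Linux.
    """
    heap_bytes, _peak = tracemalloc.get_traced_memory()
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return heap_bytes, peak_rss

heap, rss = memory_report()
print(f"Python heap: {heap/1e6:.3f} MB, peak RSS (raw ru_maxrss): {rss}")
```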