EnyaHermite / SPH3D-GCN

Spherical Kernel for Efficient Graph Convolution on 3D Point Clouds
MIT License

Memory issue in training code #6

Open thias15 opened 4 years ago

thias15 commented 4 years ago

Hello.

I have tried to run your code on the S3DIS dataset. However, even with 128 GB of memory, the process keeps getting killed while the shuffle buffer is being filled for the evaluation pass of the 2nd epoch. Monitoring memory usage, it looks like the buffer is never released; memory just keeps growing at the beginning of each epoch.
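To illustrate what I suspect is happening, here is a toy reproduction of the pattern in plain TF 1.x (just my assumption about the cause, not the actual S3DIS training script; the repo's `input_fn`, batch size, and 10000-element shuffle buffer are replaced by stand-ins):

```python
import tensorflow as tf  # TF 1.x

# If the dataset and its iterator are rebuilt inside the epoch loop, every epoch
# adds a new input pipeline -- including a fresh shuffle buffer -- to the same
# graph, so memory grows at the start of each epoch and is never released.
with tf.Session() as sess:
    for epoch in range(3):
        dataset = tf.data.Dataset.range(100000).shuffle(buffer_size=10000).batch(32)
        iterator = dataset.make_one_shot_iterator()   # new iterator every epoch
        next_element = iterator.get_next()
        try:
            while True:
                sess.run(next_element)                # fills a new shuffle buffer each epoch
        except tf.errors.OutOfRangeError:
            pass
```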

thias15 commented 4 years ago

The problem is resolved by building the dataset iterator and its initializer outside the training loop, and only re-running the initializer inside it.

EnyaHermite commented 4 years ago

Hi, I followed the official TensorFlow tf.data guide to set up the data feeding, and did not encounter this problem in my experiments. Just curious: when you move the initializer outside the training loop, is the dataset still shuffled differently in every epoch?
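In other words, does re-running the same initializer give a new order each epoch? A quick standalone check with a toy dataset (independent of the repo code) would be something like:

```python
import tensorflow as tf  # TF 1.x

# With Dataset.shuffle(...) and its default reshuffle_each_iteration=True,
# re-running the initializer should reshuffle the data, even though the
# iterator is built only once, outside the loop.
dataset = tf.data.Dataset.range(10).shuffle(buffer_size=10)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    for epoch in range(3):
        sess.run(iterator.initializer)   # one re-initialization per "epoch"
        order = []
        try:
            while True:
                order.append(sess.run(next_element))
        except tf.errors.OutOfRangeError:
            pass
        print('epoch %d order: %s' % (epoch, order))
```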

thias15 commented 4 years ago

Here is what I did and it seems to work fine. Outside the training loop:

```python
# Build the dataset, iterator, init op, and get_next tensor once, before the epoch loop.
trainset = input_fn(trainlist, BATCH_SIZE, 10000)
train_iterator = trainset.make_initializable_iterator()
init_train_iterator = train_iterator.make_initializer(trainset)
next_train_element = train_iterator.get_next()
```

Inside the training loop:

```python
# Re-run only the initializer each epoch; this restarts (and reshuffles) the
# dataset without creating a new pipeline and a new shuffle buffer.
sess.run(init_train_iterator)
train_one_epoch(sess, ops, next_train_element, train_writer)
```

Then you can do the same thing for the test set.
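For completeness, a sketch of the analogous test-set setup (it mirrors the names above; `testlist`, `test_writer`, and `eval_one_epoch` are placeholders for whatever the script actually uses):

```python
# Outside the training loop, built once:
testset = input_fn(testlist, BATCH_SIZE, 10000)
test_iterator = testset.make_initializable_iterator()
init_test_iterator = test_iterator.make_initializer(testset)
next_test_element = test_iterator.get_next()

# Inside the loop, once per evaluation pass:
sess.run(init_test_iterator)
eval_one_epoch(sess, ops, next_test_element, test_writer)
```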