In Colab, the Tesla T4 or P100 has 16 GB of GPU memory. I just checked my local code, and the batch size there was 64 as well. In any case, you can reduce the batch size to a lower value, say 48 or 32, and continue the training.
I use the slim512.ipynb notebook without any changes, but even if I set batch size = 2, memory usage keeps growing and the environment crashes. I use the portrait256 dataset with 32700 images.
It could be an issue with the data loader, I suppose.
Please modify the last few lines of the 'DataLoader' class as follows:
if shuffle:
    # Batch, repeat, then prefetch (the shuffle step is dropped)
    data = data.batch(batch_size).repeat().prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    #data = data.prefetch(tf.data.experimental.AUTOTUNE).shuffle(random.randint(0, len(self.image_paths))).repeat().batch(batch_size)
else:
    # Batch and prefetch
    data = data.repeat().batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)
return data
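A rough estimate of why the commented-out shuffle call could exhaust memory. This is only a sketch under assumptions not confirmed in this thread: that each element reaching the shuffle buffer is an already decoded 512x512 float32 RGB image plus a single-channel float32 mask, and that the buffer reaches its largest possible size (the number of images). The helper name `shuffle_buffer_bytes` is hypothetical.

```python
def shuffle_buffer_bytes(buffer_size, image_size=512, bytes_per_value=4):
    """Approximate RAM held by a shuffle buffer of decoded image/mask pairs.

    Assumes float32 values (4 bytes), a 3-channel image and a 1-channel mask.
    """
    image_bytes = image_size * image_size * 3 * bytes_per_value  # RGB image
    mask_bytes = image_size * image_size * 1 * bytes_per_value   # mask
    return buffer_size * (image_bytes + mask_bytes)


# Worst case for random.randint(0, len(image_paths)) with 32700 images.
worst_case_bytes = shuffle_buffer_bytes(32700)
worst_case_gib = worst_case_bytes / 1024**3
print(f"{worst_case_gib:.1f} GiB")  # 127.7 GiB
```

Under these assumptions the buffer alone would be far larger than the memory reported in this thread, regardless of batch size, which is consistent with the crash persisting at batch size 2.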
Training started, but loss and val_loss are always NaN.
Maybe something is wrong with the portrait256.zip dataset?
Training images and masks are .png files.
data_example.zip - 500 KB
To train the SlimNet model with the current code, you need to convert the mask images to raw segmentation masks, i.e. they should contain only two pixel values: 0 for background and 1 for foreground (currently foreground is 255 in portrait256).
For training with the AISegment dataset, I had already prepared such a dataset. So you need to perform some preprocessing, either before training or inside the data loader function:
Here is the basic idea:
# Convert to binary mask (zero the background first, so the new 1s are not reset to 0)
mask[mask<127]=0
mask[mask>=127]=1
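The thresholding idea above can also be written as a single comparison, which avoids depending on the order of in-place assignments (once foreground pixels are set to 1, a later `mask < 127` test would match them too). This is a minimal sketch using NumPy, assuming the mask is loaded as an integer array with values in 0..255; the helper name `binarize_mask` is hypothetical and not part of the repository.

```python
import numpy as np


def binarize_mask(mask, threshold=127):
    """Return a uint8 mask with 0 for background and 1 for foreground."""
    # The comparison yields a boolean array; casting maps False/True to 0/1.
    return (np.asarray(mask) >= threshold).astype(np.uint8)


mask = np.array([[0, 255], [126, 127]], dtype=np.uint8)
binary = binarize_mask(mask)
print(binary.tolist())  # [[0, 1], [0, 1]]
```

Because the result is built from the original values in one step, no pixel is thresholded twice.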
Refer: https://github.com/anilsathyan7/Portrait-Segmentation/issues/22#issuecomment-688373468
I give up. I tried all possible datasets and it is still the same. I'm not a Python programmer.
Just try modifying the data loader '_resize_data' function as follows:
# mask = tf.image.resize(mask, [self.image_size, self.image_size], method='nearest')
mask = tf.image.resize(mask, [self.image_size, self.image_size], method='nearest') // 255
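To see what the `// 255` step does to pixel values, here is a small NumPy stand-in for the tensor operation. It is only an illustration and assumes the mask holds integer values in 0..255; the sample values are made up.

```python
import numpy as np

# Sample mask values: background, two intermediate greys, foreground.
mask = np.array([0, 128, 254, 255], dtype=np.uint8)

# Floor division maps only the value 255 to 1; everything else becomes 0.
floor_divided = mask // 255
print(floor_divided.tolist())  # [0, 0, 0, 1]

# Thresholding at 127 instead treats the intermediate greys as foreground.
thresholded = (mask >= 127).astype(np.uint8)
print(thresholded.tolist())  # [0, 1, 1, 1]
```

For masks that contain only 0 and 255, as described for portrait256, both approaches give the same result. They differ only if the masks contain intermediate values, for example from anti-aliased edges.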
Yes it works, train and detect, Thanks a lot, you're awesome.
Hello, how did you train SlimNet with a GTX 1080 Ti? I tried to train the network in Google Colab (slim512.ipynb), but 13 GB of GPU memory is not enough and the environment crashes.