ayoolaolafenwa / PixelLib

Visit PixelLib's official documentation https://pixellib.readthedocs.io/en/latest/
MIT License

training time does not reduce after increasing batch_size on a 32 GB CPU instance #169

Open suyogkute opened 1 year ago

suyogkute commented 1 year ago

with batch_size = 1, ETA is ~1 hour; with batch_size = 16, ETA is ~10 hours

This behaviour is the same on a 32 GB RAM CPU instance and on a local 16 GB CPU system.
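For context, here is a rough back-of-the-envelope sketch of why the ETA can grow with batch_size on a CPU-bound setup; the dataset size and per-step timings below are hypothetical, only the arithmetic matters. Per-epoch time is steps_per_epoch × seconds_per_step, and steps_per_epoch = num_images / batch_size, so a larger batch only helps if the per-step time grows more slowly than the batch size.

```python
# Hypothetical illustration of how ETAs like those above can arise on a CPU.
# The dataset size and per-step timings are made up; only the arithmetic is the point.

num_images = 800  # hypothetical training-set size

def epoch_eta_minutes(batch_size, seconds_per_step):
    steps_per_epoch = num_images / batch_size
    return steps_per_epoch * seconds_per_step / 60

# ~4.5 s per single-image step -> roughly a 1 hour epoch at batch_size = 1
print(epoch_eta_minutes(batch_size=1, seconds_per_step=4.5))   # ~60 min

# On a CPU, a batch of 16 can take far more than 16x longer per step
# (memory pressure, swapping, cache misses), so the ETA rises instead of falling.
print(epoch_eta_minutes(batch_size=16, seconds_per_step=720))  # ~600 min (~10 h)
```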

ithllc commented 9 months ago

@elbruno @mmphego @fmorenovr @prateekralhan

Any answer from the team on this would be highly appreciated. I am currently training on A100s in Google Colab and I am seeing the same behaviour. I have 12 classes, including the background, for image classification. The screenshot is below:

[screenshot]

Output from the train_model method is:

```
Train 808 images
Validate 350 images
Applying augmentation on dataset
Checkpoint Path: /content/drive/MyDrive/computer_vision_model
Selecting layers to train
```

The code for training the model is just like what is discussed in the documentation, and I applied all the recommended software downgrades; it is currently running on TensorFlow 2.8.0.

Code below:

```python
from google.colab import drive
drive.mount('/content/drive')
```

```python
!pip3 uninstall tensorflow
!pip3 install tensorflow==2.8.0  # current version 2.14.0, reinstall it after this
!pip3 install tensorflow-gpu
```

```python
!pip install requests numpy pillow scipy scikit-image==0.18.3 imgaug matplotlib labelme2coco==0.1.0 pixellib==0.5.2
```

```python
import pixellib
from pixellib.custom_train import instance_custom_training
```

```python
import json
import numpy as np
import pandas as pd
import os
import tensorflow as tf

print(tf.__version__)
```

```python
# models_dir and exports_dir are assumed to be defined earlier in the notebook (paths not shown here)
train_maskrcnn = instance_custom_training()
train_maskrcnn.modelConfig(network_backbone='resnet101', num_classes=12, batch_size=4)
train_maskrcnn.load_pretrained_model(models_dir + '/mask_rcnn_coco.h5')
train_maskrcnn.load_dataset(exports_dir)
train_maskrcnn.train_model(num_epochs=100, augmentation=True, path_trained_models=models_dir)
```
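Not part of the original post, but since the training above is run on Colab A100s, one quick sanity check worth adding before training is confirming that TensorFlow actually sees the GPU; if it does not, training silently falls back to the CPU and a larger batch_size will not reduce the ETA.

```python
# Sanity check (an addition, not from the original post): confirm TensorFlow sees the GPU.
import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))  # an empty list means training will run on the CPU
```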