Open eklahari opened 1 week ago
You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Instructions To Reproduce the Issue and Full Logs";
Hi, this is usually caused by differences in how CUDA memory is managed across environments.
There isn't a specific fix for this, but in a Linux environment where you cannot train a model with a batch size of 28, you could call
torch.cuda.memory_allocated()
and torch.cuda.memory_reserved() (named torch.cuda.memory_cached() in older PyTorch releases, where that name is now deprecated)
to check GPU memory allocation. These aren't solutions, but possibilities that may still let you train your model in a Linux environment. I hope that explains the issue. If you have any more questions, please let me know.
Thank you
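As a rough sketch of the memory check mentioned above, the snippet below reports allocated, reserved, and peak allocated memory for one GPU. The helper names `format_mib` and `gpu_memory_report` are hypothetical and exist only for this illustration. The PyTorch import is guarded so that the snippet also runs (and returns `None`) on a machine without PyTorch or without a CUDA device.

```python
def format_mib(num_bytes):
    """Convert a byte count to a human-readable MiB string."""
    return f"{num_bytes / (1024 ** 2):.1f} MiB"


def gpu_memory_report(device_index=0):
    """Return a dict of GPU memory statistics, or None when CUDA is unavailable.

    Hypothetical helper for illustration; it only wraps existing torch.cuda calls.
    """
    try:
        import torch  # guarded so the sketch degrades gracefully without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return {
        # Memory currently occupied by live tensors.
        "allocated": format_mib(torch.cuda.memory_allocated(device_index)),
        # Memory held by PyTorch's caching allocator (includes free cached blocks).
        "reserved": format_mib(torch.cuda.memory_reserved(device_index)),
        # Highest value of allocated memory seen so far in this process.
        "peak_allocated": format_mib(torch.cuda.max_memory_allocated(device_index)),
    }


if __name__ == "__main__":
    print(gpu_memory_report())
```

Calling `gpu_memory_report()` on both the Linux and the Windows machine, just before `trainer.train()`, may help show whether the two environments really start with the same amount of free GPU memory.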
from register_dataset import *  # register the custom dataset
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
import os

# Set the environment variable; a bare Python assignment (CUDA_LAUNCH_BLOCKING=1.) has no effect on CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.MASK_ON = False
cfg.DATASETS.TRAIN = ("football_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 28
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 1000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # number of classes in the dataset
cfg.OUTPUT_DIR = "/output1"

# Save the resolved configuration next to the training outputs.
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
with open(os.path.join(cfg.OUTPUT_DIR, "config.yaml"), "w") as f:
    f.write(cfg.dump())

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

When I run this code with a batch size of 28, I get a CUDA error, but I am able to run the same file on Windows, which has the same configuration as the Linux machine. What is the issue, and how can I overcome it? Could you please provide some code that works with the increased batch size in a Linux environment?