Cannot train model on Windows on CUDA

GotRobbd commented 1 month ago

Training the model on a CPU works as expected. However, when I run the training process on CUDA, the script halts and throws this error:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Is there a way to fix this?

mojulian commented 1 month ago

Could you please provide more details about your issue? What's the exact error message thrown?

GotRobbd commented 1 month ago

This is the full terminal output I get while trying to run the code (Windows 11 Pro, Python 3.10.11 venv, with all dependencies installed):

PS D:\> & d:/.venv3/Scripts/Activate.ps1
(.venv3) PS D:\> cd .\TinyissimoYOLO\
(.venv3) PS D:\TinyissimoYOLO> python --version
Python 3.10.11
(.venv3) PS D:\TinyissimoYOLO> python a_train_export.py  
WARNING ⚠️ no model scale passed. Assuming scale='b'.
New https://pypi.org/project/ultralytics/8.2.36 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.1.29 🚀 Python-3.10.11 torch-2.0.0+cu117 CUDA:0 (NVIDIA GeForce GTX 1660 Ti, 6144MiB)
engine\trainer: task=detect, mode=train, model=./ultralytics/cfg/models/tinyissimo/tinyissimo-v5.yaml, data=../Data/data.yaml, epochs=50, time=None, patience=100, batch=64, imgsz=96, save=True, save_period=-1, cache=False, device=None, workers=8, project=results, name=exp17, exist_ok=False, pretrained=True, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=results\exp17
Overriding model.yaml nc=20 with nc=1
WARNING ⚠️ no model scale passed. Assuming scale='b'.

                   from  n    params  module                                       arguments
  0                  -1  1      1760  ultralytics.nn.modules.conv.Conv             [3, 16, 6, 2, 2]
  1                  -1  1      3504  ultralytics.nn.modules.conv.Conv             [16, 24, 3, 2]
  2                  -1  1      2736  ultralytics.nn.modules.block.C3              [24, 24, 1]
  3                  -1  1      8720  ultralytics.nn.modules.conv.Conv             [24, 40, 3, 2]
  4                  -1  2     11520  ultralytics.nn.modules.block.C3              [40, 40, 2]
  5                  -1  1     28960  ultralytics.nn.modules.conv.Conv             [40, 80, 3, 2]
  6                  -1  3     61600  ultralytics.nn.modules.block.C3              [80, 80, 3]
  7                  -1  1    115520  ultralytics.nn.modules.conv.Conv             [80, 160, 3, 2]
  8                  -1  1    116160  ultralytics.nn.modules.block.C3              [160, 160, 1]
  9                  -1  1     64480  ultralytics.nn.modules.block.SPPF            [160, 160, 5]
 10                  -1  1     12960  ultralytics.nn.modules.conv.Conv             [160, 80, 1, 1]
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 12             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 13                  -1  1     35680  ultralytics.nn.modules.block.C3              [160, 80, 1, False]
 14                  -1  1      3280  ultralytics.nn.modules.conv.Conv             [80, 40, 1, 1]
 15                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 16             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 17                  -1  1      9040  ultralytics.nn.modules.block.C3              [80, 40, 1, False]
 18                  -1  1     14480  ultralytics.nn.modules.conv.Conv             [40, 40, 3, 2]
 19            [-1, 14]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 20                  -1  1     29280  ultralytics.nn.modules.block.C3              [80, 80, 1, False]
 21                  -1  1     57760  ultralytics.nn.modules.conv.Conv             [80, 80, 3, 2]
 22            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 23                  -1  1    116160  ultralytics.nn.modules.block.C3              [160, 160, 1, False]
 24        [17, 20, 23]  1    429739  ultralytics.nn.modules.head.Detect           [1, [40, 80, 160]]
tinyissimo-v5 summary: 262 layers, 1123339 parameters, 1123323 gradients

TensorBoard: Start with 'tensorboard --logdir results\exp17', view at http://localhost:6006/
Freezing layer 'model.24.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
train: Scanning D:\SmartLight.ai\HEIMDALL_LOCAL_SETUP\Data\train\labels.cache... 936 images, 465 backgrounds, 0 corrupt: 100%|██████████| 936/936 [00:00<?, ?it/s]
WARNING ⚠️ no model scale passed. Assuming scale='b'.
New https://pypi.org/project/ultralytics/8.2.36 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.1.29 🚀 Python-3.10.11 torch-2.0.0+cu117 CUDA:0 (NVIDIA GeForce GTX 1660 Ti, 6144MiB)
engine\trainer: task=detect, mode=train, model=./ultralytics/cfg/models/tinyissimo/tinyissimo-v5.yaml, data=../Data/data.yaml, epochs=50, time=None, patience=100, batch=64, imgsz=96, save=True, save_period=-1, cache=False, device=None, workers=8, project=results, name=exp18, exist_ok=False, pretrained=True, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=results\exp18
Overriding model.yaml nc=20 with nc=1
WARNING ⚠️ no model scale passed. Assuming scale='b'.

                   from  n    params  module                                       arguments
  0                  -1  1      1760  ultralytics.nn.modules.conv.Conv             [3, 16, 6, 2, 2]
  1                  -1  1      3504  ultralytics.nn.modules.conv.Conv             [16, 24, 3, 2]
  2                  -1  1      2736  ultralytics.nn.modules.block.C3              [24, 24, 1]
  3                  -1  1      8720  ultralytics.nn.modules.conv.Conv             [24, 40, 3, 2]
  4                  -1  2     11520  ultralytics.nn.modules.block.C3              [40, 40, 2]
  5                  -1  1     28960  ultralytics.nn.modules.conv.Conv             [40, 80, 3, 2]
  6                  -1  3     61600  ultralytics.nn.modules.block.C3              [80, 80, 3]
  7                  -1  1    115520  ultralytics.nn.modules.conv.Conv             [80, 160, 3, 2]
  8                  -1  1    116160  ultralytics.nn.modules.block.C3              [160, 160, 1]
  9                  -1  1     64480  ultralytics.nn.modules.block.SPPF            [160, 160, 5]
 10                  -1  1     12960  ultralytics.nn.modules.conv.Conv             [160, 80, 1, 1]
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 12             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 13                  -1  1     35680  ultralytics.nn.modules.block.C3              [160, 80, 1, False]
 14                  -1  1      3280  ultralytics.nn.modules.conv.Conv             [80, 40, 1, 1]
 15                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 16             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 17                  -1  1      9040  ultralytics.nn.modules.block.C3              [80, 40, 1, False]
 18                  -1  1     14480  ultralytics.nn.modules.conv.Conv             [40, 40, 3, 2]
 19            [-1, 14]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 20                  -1  1     29280  ultralytics.nn.modules.block.C3              [80, 80, 1, False]
 21                  -1  1     57760  ultralytics.nn.modules.conv.Conv             [80, 80, 3, 2]
 22            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 23                  -1  1    116160  ultralytics.nn.modules.block.C3              [160, 160, 1, False]
 24        [17, 20, 23]  1    429739  ultralytics.nn.modules.head.Detect           [1, [40, 80, 160]]
tinyissimo-v5 summary: 262 layers, 1123339 parameters, 1123323 gradients

TensorBoard: Start with 'tensorboard --logdir results\exp18', view at http://localhost:6006/
Freezing layer 'model.24.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
train: Scanning D:\SmartLight.ai\HEIMDALL_LOCAL_SETUP\Data\train\labels.cache... 936 images, 465 backgrounds, 0 corrupt: 100%|██████████| 936/936 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\TinyissimoYOLO\a_train_export.py", line 12, in <module>
    model.train(data="../Data/data.yaml",  project="results", name="exp", optimizer='SGD',  imgsz=img_size,  epochs=50,  batch=64)
  File "D:\TinyissimoYOLO\ultralytics\engine\model.py", line 655, in train
    self.trainer.train()
  File "D:\TinyissimoYOLO\ultralytics\engine\trainer.py", line 213, in train
    self._do_train(world_size)
  File "D:\TinyissimoYOLO\ultralytics\engine\trainer.py", line 327, in _do_train
    self._setup_train(world_size)
  File "D:\TinyissimoYOLO\ultralytics\engine\trainer.py", line 291, in _setup_train
    self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=RANK, mode="train")
  File "D:\TinyissimoYOLO\ultralytics\models\yolo\detect\train.py", line 55, in get_dataloader
    return build_dataloader(dataset, batch_size, workers, shuffle, rank)  # return dataloader
  File "D:\TinyissimoYOLO\ultralytics\data\build.py", line 114, in build_dataloader
    return InfiniteDataLoader(
  File "D:\TinyissimoYOLO\ultralytics\data\build.py", line 40, in __init__
    self.iterator = super().__iter__()
  File "D:\.venv3\lib\site-packages\torch\utils\data\dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "D:\.venv3\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\.venv3\lib\site-packages\torch\utils\data\dataloader.py", line 1043, in __init__
    w.start()
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\Ana\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

It just hangs at this error; the only way to stop the training process is to close or kill the terminal.

mandulaj commented 4 weeks ago

Hey @GotRobbd,

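This seems to be an issue with the way Windows (which we don't usually use) handles subprocesses. Windows has no fork, so Python's multiprocessing starts the DataLoader workers with the spawn method, and each worker re-imports a_train_export.py from the top. Without an if __name__ == '__main__' guard, that import calls model.train() again, which then tries to start its own workers; your log shows this as the second run (exp18). See: https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing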

In a_train_export.py, could you try doing what the error message suggests: wrapping everything in a def main(): function and then calling main() from the if __name__ == '__main__' guard?

Something like this:

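import torch
from ultralytics import YOLO


def main():
    # Unused below: Ultralytics picks the device itself (device=None selected CUDA:0 in the log above)
    device = torch.device("cuda")
    # Replace with your own model config (the log above uses tinyissimo-v5.yaml)
    model_name = "./ultralytics/cfg/models/tinyissimo/tinyissimo-v8.yaml"
    model = YOLO(model_name)

    img_size = 256
    input_size = (1, 1, img_size, img_size)  # not used in this snippet

    # Train (replace "VOC.yaml" with your own dataset, e.g. "../Data/data.yaml")
    model.train(data="VOC.yaml", project="results", name="exp", optimizer='SGD', imgsz=img_size, epochs=1, batch=64)

    # Export
    model.export(format="onnx", project="results", name="exp", imgsz=[img_size, img_size])


# Only the process started with "python a_train_export.py" runs main();
# spawned DataLoader workers re-import this file but skip this block.
if __name__ == '__main__':
    main()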
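Why the guard helps: the traceback above shows the spawned worker rebuilding the main module through runpy.run_path, which multiprocessing calls with the module name __mp_main__ instead of __main__. The sketch below simulates that with the standard library only. SCRIPT and simulate() are hypothetical stand-ins for a_train_export.py, not part of TinyissimoYOLO or Ultralytics. It runs a tiny script once as __main__ (as "python a_train_export.py" does) and once as __mp_main__ (as a spawned worker does).

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for a_train_export.py: one unguarded top-level
# statement and one statement behind the __main__ guard.
SCRIPT = """\
calls.append("top-level code")
if __name__ == "__main__":
    calls.append("guarded main()")
"""


def simulate(run_name: str) -> list:
    """Run SCRIPT under the given module name and return what executed."""
    # Write the script to a real file, since spawn re-runs the main module by path.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(SCRIPT)
        path = f.name
    try:
        calls = []  # shared with the script through init_globals
        runpy.run_path(path, init_globals={"calls": calls}, run_name=run_name)
        return calls
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(simulate("__main__"))     # ['top-level code', 'guarded main()']
    print(simulate("__mp_main__"))  # ['top-level code']
```

Only the unguarded line runs in the simulated worker. That is why moving model.train() into main() stops each DataLoader worker from re-running the training setup (the second exp18 run in the log above).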

GotRobbd commented 4 weeks ago

Hey @mandulaj,

I have just tested the code above and it works! Thank you so much!

It would be very helpful for others if this could either be documented or pushed to the repository to prevent this scenario.

mandulaj commented 4 weeks ago

Certainly, I think @mojulian can do that.

ETH-PBL / TinyissimoYOLO

Cannot train model on Windows on CUDA #2