Training stage problem - Githubissues

Jeffwavegis commented 1 year ago

I want to train ClimateGAN, but I have run into some problems. Environment: CUDA 10.1, Python 3.8.5, PyTorch 1.7.0+cu101.

Problems:

torch.load() does not work in ~/climategan/data.py at line 358. Can I use imread(path).astype(np.float32) instead?

if task == "s":
    if domain == "kitti":
        return process_kitti_seg(
            path, classes_dict["kitti"], kitti_mapping, default=14
        )
    # return torch.load(path) 
    arr = imread(path).astype(np.float32)

How much GPU memory do I need?
I can't use the pre-trained model. Loading a checkpoint from a chosen path is still marked as a TODO in ~/share/trainer/defaults.yaml:
resume: false # Load latest_ckpt.pth checkpoint from `output_path` #TODO Make this path of checkpoint to load

vict0rsch commented 1 year ago

It's OK; it all depends on how you inferred and stored the segmentations. We stored them as torch .pt files, but if yours are images, reading them with imread is fine.
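As an illustration of the point above, here is a minimal sketch of a loader that accepts either storage format. The names `label_format` and `load_segmentation` are hypothetical and not part of the ClimateGAN code, and it assumes that `imread` comes from `imageio` and that image labels are stored losslessly as PNG.

```python
from pathlib import Path

# Assumption: tensors saved with torch.save use these suffixes,
# and image labels are lossless PNG files.
TENSOR_SUFFIXES = {".pt", ".pth"}
IMAGE_SUFFIXES = {".png"}


def label_format(path):
    """Classify a segmentation label file by its suffix (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix in TENSOR_SUFFIXES:
        return "tensor"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported segmentation label format: {path}")


def load_segmentation(path):
    """Hypothetical helper: load a label with the loader matching its format."""
    if label_format(path) == "tensor":
        import torch  # imported lazily so the format check works without torch

        return torch.load(path)
    import numpy as np
    from imageio import imread  # assumption: the same imread as in the question

    return imread(path).astype(np.float32)
```

Whichever branch is used, the downstream transforms still have to receive the shape and dtype they expect, so it is worth checking one sample from each format before starting a long run.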
I don't know. We used 48 GB NVIDIA RTX 8000 GPUs with a batch size of 4. I expect the requirement can be reduced by:
1. Using automatic mixed precision
2. Using a smaller resolution (we used 640x640px)
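To get a feel for how much those two options might help, here is a rough back-of-the-envelope sketch. It assumes that activation memory grows linearly with batch size, pixel count, and bytes per element, which is only an approximation: weights, optimizer state, and the operations that mixed precision keeps in float32 do not shrink this way. The function name is hypothetical, and the baseline is the 640x640, batch size 4, float32 setup mentioned above.

```python
def relative_activation_memory(height, width, batch_size=4, bytes_per_element=4):
    """Estimate activation memory relative to 640x640, batch size 4, float32.

    Rough heuristic only: assumes activations scale linearly with
    batch_size * height * width * bytes_per_element.
    """
    baseline = 640 * 640 * 4 * 4
    return (height * width * batch_size * bytes_per_element) / baseline


# Halving each side of the image quarters the pixel count.
print(relative_activation_memory(320, 320))  # 0.25

# float16 activations use 2 bytes per element instead of 4.
print(relative_activation_memory(640, 640, bytes_per_element=2))  # 0.5
```

These ratios are best-case figures for the activation part only; the actual saving has to be measured on the GPU.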
You can look at the pre-trained example in the Zip to download from Google Drive to update this. Basically, you can resume; it's just that setting resume: true will resume from the output_path, and you cannot specify anything else in the current state of the code. There was an idea of resuming from a folder other than output_path, but we never needed it, so we did not implement it.
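Since resuming only looks inside output_path, one workaround is to copy the pre-trained checkpoint there before launching with resume: true. The sketch below shows the idea; `stage_checkpoint` is a hypothetical helper, and where exactly the trainer expects latest_ckpt.pth inside output_path is an assumption to verify against the layout of the pre-trained Zip, which is why it is left as a parameter.

```python
import shutil
from pathlib import Path


def stage_checkpoint(pretrained_ckpt, output_path, relative_target="latest_ckpt.pth"):
    """Copy a checkpoint into output_path so that resume: true can find it.

    relative_target is an assumption: adjust it to wherever the trainer
    actually looks for latest_ckpt.pth inside output_path.
    """
    target = Path(output_path) / relative_target
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        # Refuse to overwrite a checkpoint from an existing run.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(pretrained_ckpt, target)
    return target
```

Refusing to overwrite is a deliberate choice here, so that an output_path from a previous run is never silently replaced.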

Jeffwavegis commented 1 year ago

For NVIDIA GPUs, the nvidia-smi tool shows memory usage. On Linux, nvidia-smi -l 1 prints the GPU usage info continuously, refreshing every second.
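To log the same numbers from a script, nvidia-smi can also print them as CSV. The sketch below is one way to do it; the function names are hypothetical, and it assumes nvidia-smi is on the PATH. The reported values are in MiB.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (MiB, one line per GPU) into integer tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory of every visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` once during a training step at the target batch size and resolution gives a direct answer to how much memory a given configuration needs.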

cc-ai / climategan

Training stage problem #218