cc-ai / climategan

Code and pre-trained model for the algorithm generating visualisations of 3 climate change related events: floods, wildfires and smog.
https://thisclimatedoesnotexist.com
GNU General Public License v3.0

Training stage problem #218

Closed Jeffwavegis closed 1 year ago

Jeffwavegis commented 1 year ago

I want to train ClimateGAN, but I have some problems. Environment: CUDA 10.1, Python 3.8.5, PyTorch 1.7.0+cu101.

Problems:

  1. torch.load() does not work in ~/climategan/data.py, line 358. Can I use imread(path).astype(np.float32) instead?
    if task == "s":
        if domain == "kitti":
            return process_kitti_seg(
                path, classes_dict["kitti"], kitti_mapping, default=14
            )
        # return torch.load(path) 
        arr = imread(path).astype(np.float32)
  2. How much GPU memory do I need?
  3. I can't use the pre-trained model. Loading a checkpoint from a chosen path is still marked as a TODO in ~/share/trainer/defaults.yaml:
    resume: false # Load latest_ckpt.pth checkpoint from `output_path` #TODO Make this path of checkpoint to load
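
For problem 1, a loader that accepts either storage format could look like the sketch below. It is only an illustration: `segmentation_format` and `load_segmentation` are hypothetical names that do not exist in the repository, the extension lists are guesses, the `.pt` files are assumed to hold a single tensor, and `imageio` is assumed as the source of `imread`.

```python
from pathlib import Path

# Assumption: these extension sets are illustrative, not taken from the repo.
TORCH_SUFFIXES = {".pt", ".pth"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def segmentation_format(path):
    """Return "torch" or "image" depending on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in TORCH_SUFFIXES:
        return "torch"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported segmentation file: {path}")


def load_segmentation(path):
    """Load a segmentation map as a float32 torch tensor (hypothetical helper)."""
    # Imported lazily so the format check above works without these packages.
    import numpy as np
    import torch

    if segmentation_format(path) == "torch":
        # Assumption: the file stores a single tensor.
        return torch.load(path).float()

    # Assumption: imageio is installed; any imread returning an array would do.
    from imageio import imread

    arr = np.asarray(imread(path), dtype=np.float32)
    return torch.from_numpy(arr)
```

Whether the rest of the data pipeline expects a tensor or a NumPy array at this point depends on the surrounding code in data.py, so the return type may need adjusting.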
vict0rsch commented 1 year ago
  1. It's ok, it all depends on how you inferred and stored the segmentations. We had them as torch .pt but if they are images for you, it's fine.
  2. I don't know. We used 48GB NVIDIA RTX 8000 GPUs with a batch size of 4. I expect this can be lowered by:
    1. Using automatic mixed precision
    2. Using a smaller resolution (we used 640x640px)
  3. You can look at the pre-trained example in the zip file downloadable from Google Drive to update this. Basically, you can resume; it's just that setting resume: true resumes from output_path, and the current code does not let you specify another location. We had the idea of resuming from a folder other than output_path, but we never needed it, so we did not implement it.
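
Since resume: true only looks inside output_path, one workaround is to copy the pre-trained checkpoint there before starting training. The helper below is a minimal sketch of that idea, not part of the repository: `stage_checkpoint_for_resume` is a hypothetical name, and `ckpt_dir` must be set to wherever the trainer actually looks for latest_ckpt.pth under output_path (check the trainer code, as that location is not stated in this thread).

```python
import shutil
from pathlib import Path


def stage_checkpoint_for_resume(pretrained_ckpt, ckpt_dir, name="latest_ckpt.pth"):
    """Copy a checkpoint to the folder that `resume: true` reads from.

    `ckpt_dir` is an assumption: pass the folder under `output_path`
    where the trainer expects to find `latest_ckpt.pth`.
    """
    src = Path(pretrained_ckpt)
    if not src.is_file():
        raise FileNotFoundError(src)

    dst_dir = Path(ckpt_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    dst = dst_dir / name
    if dst.exists():
        # Avoid silently overwriting a checkpoint from an earlier run.
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")

    shutil.copy2(src, dst)
    return dst
```

After copying, setting resume: true and pointing output_path at the matching folder should let training pick up the checkpoint, provided the rest of the configuration matches the one used for pre-training.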
Jeffwavegis commented 1 year ago

For NVIDIA GPUs, the nvidia-smi tool shows memory usage. On Linux, nvidia-smi -l 1 prints the GPU usage info continuously, with a refresh interval of 1 second.
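
To log the same information from a script, nvidia-smi can also be queried in CSV form. The following is a small sketch: `parse_memory_csv` and `gpu_memory` are hypothetical helper names, and `gpu_memory` assumes nvidia-smi is on the PATH. With the `nounits` option, memory values are reported in MiB.

```python
import subprocess

# Ask nvidia-smi for one CSV line per GPU: index, used memory, total memory.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse nvidia-smi CSV output into a list of per-GPU dicts (values in MiB)."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def gpu_memory():
    """Run nvidia-smi once and return the parsed memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `gpu_memory()` periodically during the first training iterations gives a rough idea of the peak memory a given batch size and resolution require.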