ecker-lab / sensorium_2023

repo for the sensorium 2023 competition

RAM error #5

Open GowthamS07 opened 1 year ago

GowthamS07 commented 1 year ago

I am trying to run the models_demo.py code, but whenever I run the first model (gru_2d_model) the whole run crashes, saying it used up all the RAM. I use Colab Pro, which has 25 GB of RAM. I tried reducing the channels from 32 to 16 and the batch size from 16 to 12, but I still hit the same issue.

Are there any minimum RAM requirements to run this code? I don't know exactly what image or code to add for extra details, so please let me know if anything else is needed; I'll be happy to provide it.

pollytur commented 1 year ago

The batch size could be even smaller (e.g. 8; this usually does not degrade the performance too much). You could also use fewer than 150 frames in the dataloaders, which would noticeably save memory as well. As a rough reference point: with batch size 8 and 32 channels we used a 24 GB GPU and it worked.

Hope this helps
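
Not part of the original reply, but one way to see how close a given batch size / frame count is to that 24 GB reference is to query PyTorch's peak-memory counters around a single training step. A minimal sketch; `model` and `inputs` are placeholders for whatever the demo notebook builds, and how the model is actually called depends on the demo code.

```python
import torch

# `model` and `inputs` are placeholders for the objects built in models_demo;
# the point is only the peak-memory bookkeeping around one training step.
torch.cuda.reset_peak_memory_stats()

outputs = model(inputs)        # one forward pass
outputs.sum().backward()       # one backward pass, so gradient memory counts too

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak GPU memory for this batch size / frame count: {peak_gb:.2f} GB")
```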

GowthamS07 commented 1 year ago

[Screenshot (56): attached image of the Colab resource monitor, showing CPU RAM filling up while the GPU stays idle]

I have reduced the batch size to 2 and the frames to just 60, and even though I explicitly move the model and data to the GPU, as you can see the GPU is never used; only the CPU RAM is being consumed.
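
As a side note (not from the original comment), a quick way to confirm where the model and the data actually live is to print the device of the model parameters and of one batch. A rough sketch; the nested structure of `data_loaders` shown below is an assumption based on the demo, so adapt the indexing to whatever the loader actually returns.

```python
import torch

print("CUDA available:", torch.cuda.is_available())

# Where do the model weights actually live?
print("model device:", next(model.parameters()).device)

# Where does one batch live? The nested layout assumed here is
# {'train': {session_key: loader, ...}, ...}; adjust if yours differs.
loader = next(iter(data_loaders["train"].values()))
batch = next(iter(loader))
for field in batch:
    if torch.is_tensor(field):
        print(tuple(field.shape), field.device)
```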

pollytur commented 1 year ago

Could you please confirm that you set `cuda` or `cuda:0` in both the dataset config and the trainer config?

Like

```python
data_loaders = mouse_video_loader(
    paths=paths,
    batch_size=8,
    scale=1,
    max_frame=None,
    frames=60,  # frames has to be > 50. If it fits on your GPU, we recommend 150
    offset=-1,
    include_behavior=True,
    include_pupil_centers=True,
    cuda='cuda:0',
)
```

and

```python
trainer_config = {
    'dataloaders': data_loaders,
    'seed': 111,
    'use_wandb': False,
    'verbose': True,
    'lr_decay_steps': 4,
    'lr_init': 0.005,
    'device': "cuda:0",
    'detach_core': False,
    'deeplake_ds': False,
}
```
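
One way to avoid a mismatch between the two configs is to pick the device string once and reuse it everywhere, then move the model explicitly before training. This is only a sketch, not code from the repo:

```python
import torch

# Pick the device once and reuse it, so the dataloader config, the trainer
# config, and the model itself can never disagree about where tensors live.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

trainer_config["device"] = device   # trainer side
model = model.to(device)            # model side (do this before training)

# Dataloader side: pass the same string via the `cuda` argument of
# mouse_video_loader, as in the snippet above.
```
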
Dauriel commented 1 year ago

I am encountering the same issue. The device is set to 'cuda' in both the trainer config and the dataset config, and the batch size is minimal, but RAM leaks somewhere and keeps increasing as training progresses until it inevitably crashes.
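
Not from this thread, but a common source of steadily growing CPU RAM in PyTorch training loops is accumulating tensors that still carry autograd history (e.g. summing raw losses for logging) or keeping references to whole batches between iterations. A hedged diagnostic sketch using psutil to watch the process RSS per epoch; `n_epochs`, `train_loader`, and `training_step` are placeholders, not the repo's trainer.

```python
import gc
import os

import psutil
import torch

process = psutil.Process(os.getpid())

def rss_gb():
    """Resident set size of this process in GB."""
    return process.memory_info().rss / 1024**3

running_loss = 0.0
for epoch in range(n_epochs):
    for batch in train_loader:
        loss = training_step(batch)   # placeholder for the real training step
        # .item() detaches the scalar; accumulating `loss` itself would keep
        # the whole computation graph alive and grow RAM every iteration.
        running_loss += loss.item()

    gc.collect()
    torch.cuda.empty_cache()
    print(f"epoch {epoch}: CPU RSS = {rss_gb():.2f} GB")
```

If the RSS keeps growing even with `.item()`-style logging, it is also worth looking at the dataloader itself (e.g. worker processes each holding data in memory, or very long frame windows per sample), though whether those knobs are exposed here depends on `mouse_video_loader`.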