facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0
13.44k stars 2.43k forks

Reproducibility #313

Open Dicko87 opened 3 years ago

Dicko87 commented 3 years ago

Hi folks, I am struggling to get repeatable results for a DETR model I am running with PyTorch. I have defined the following function:

```python
import os
import random  # Python's built-in RNG; `from numpy import random` would leave it unseeded

import numpy as np
import torch


def seed_torch(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = True
```

Please note that I have also tried this function with `cudnn.benchmark` set to `False`. This is the folder structure for DETR and the .py files contained within it:

[Image: DETR folder structure and .py files]

I have tried calling the seed function in each file separately to see if any one of them would make my results reproducible, and this did not work. I then tried calling it in all files of each folder, one folder at a time, and this did not work either. When I used the seed function in the first three folders, the first epoch was repeatable, but after the first epoch the numbers were different. The numbers I am talking about look like this:

[Image]

For now, though, I am only comparing losses. In the end I copied the seed function into every .py file and called it in every class and function definition, and the results are still not reproducible. Does anybody have any idea how to resolve this? I am running my code in the Anaconda terminal.
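Because the generators seeded by `seed_torch` are process-wide, calling it inside every class does not add determinism; it only rewinds the shared generator each time. The pure-Python sketch below illustrates the side effect. The `Layer` class and `seed_everything` helper are hypothetical, and the built-in `random` module stands in for `torch` so it runs anywhere. Two objects built right after a re-seed draw identical "random" weights. Seeding once, at the very start of the entry script, keeps the draws distinct while still making the whole run repeatable.

```python
import random


def seed_everything(seed: int = 42) -> None:
    # Hypothetical helper: seeds Python's process-wide global RNG
    random.seed(seed)


class Layer:
    """Hypothetical layer whose weights come from the global RNG."""

    def __init__(self, n: int, reseed: bool) -> None:
        if reseed:
            seed_everything()  # rewinds the shared generator on every construction
        self.weights = [random.random() for _ in range(n)]


# Re-seeding inside the class: every instance gets the same "random" weights
a, b = Layer(3, reseed=True), Layer(3, reseed=True)
print(a.weights == b.weights)  # True

# Seeding once up front: instances differ, yet the whole sequence is repeatable
seed_everything()
c, d = Layer(3, reseed=False), Layer(3, reseed=False)
seed_everything()
e, f = Layer(3, reseed=False), Layer(3, reseed=False)
print(c.weights != d.weights, c.weights == e.weights)  # True True
```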

alcinos commented 3 years ago

You should probably disable cuDNN benchmarking: `torch.backends.cudnn.benchmark = False`. Did you investigate at which point the difference starts to appear?
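One way to answer the second question is sketched below. It assumes each run logs its per-iteration loss to a list; the `first_divergence` function and its tolerance parameters are hypothetical, not part of DETR. Train twice with the same seed, then report the first step at which the two loss sequences disagree.

```python
import math
from typing import Optional, Sequence


def first_divergence(
    run_a: Sequence[float],
    run_b: Sequence[float],
    rel_tol: float = 0.0,
    abs_tol: float = 0.0,
) -> Optional[int]:
    """Return the index of the first step where two loss logs differ, or None.

    With the default tolerances of 0.0, only exactly equal losses count as equal.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return step
    if len(run_a) != len(run_b):
        # One run logged more steps; the logs diverge where the shorter one ends
        return min(len(run_a), len(run_b))
    return None


# Made-up loss values for illustration only
run_1 = [2.31, 1.87, 1.52, 1.40]
run_2 = [2.31, 1.87, 1.53, 1.41]
print(first_divergence(run_1, run_2))  # 2
```

Comparing per-iteration rather than per-epoch losses narrows the search to a single training step.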

helq2612 commented 1 year ago

Hi @Dicko87 , have you solved this problem?

Dicko87 commented 1 year ago

No, sorry. How about you?

Best regards,
Carl Dickinson

On 14 Feb 2023, at 19:40, helq2612 wrote:


> Hi @Dicko87, have you solved this problem?


helq2612 commented 1 year ago

I checked with https://pytorch.org/docs/stable/notes/randomness.html and https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html

My guess is that some operations, such as `bmm`, are non-deterministic.
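The two pages linked above suggest combining several settings. The sketch below follows the patterns in PyTorch's randomness notes and assumes a recent PyTorch (1.8+ for `torch.use_deterministic_algorithms`). It sets `CUBLAS_WORKSPACE_CONFIG` and enables `torch.use_deterministic_algorithms(True)`, so any operation without a deterministic implementation raises a `RuntimeError` instead of silently varying between runs. It also seeds DataLoader workers and shuffling explicitly. `enable_determinism`, `make_loader`, and the toy `TensorDataset` are hypothetical stand-ins, not DETR code. Passing `warn_only=True` (PyTorch 1.11+) reports the offending operations as warnings rather than errors, which may help pinpoint which op is responsible.

```python
import os
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def enable_determinism(seed: int = 42) -> None:
    # Must take effect before the first cuBLAS call; the randomness notes list
    # ":4096:8" (or ":16:8") for CUDA 10.2 and later. Harmless on CPU-only machines.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    torch.manual_seed(seed)
    # Ops that only have non-deterministic implementations now raise RuntimeError
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False


def seed_worker(worker_id: int) -> None:
    # Runs inside each DataLoader worker (only when num_workers > 0):
    # derive the worker's NumPy / Python seeds from its torch seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(seed: int = 0, num_workers: int = 0) -> DataLoader:
    # Toy dataset standing in for the real detection dataset (assumption)
    data = TensorDataset(torch.arange(20, dtype=torch.float32))
    g = torch.Generator()
    g.manual_seed(seed)  # fixes the shuffle order independently of the global RNG
    return DataLoader(data, batch_size=4, shuffle=True, num_workers=num_workers,
                      worker_init_fn=seed_worker, generator=g)


if __name__ == "__main__":  # guard needed on Windows/macOS when num_workers > 0
    enable_determinism()
    order_1 = [batch[0].tolist() for batch in make_loader()]
    order_2 = [batch[0].tolist() for batch in make_loader()]
    print(order_1 == order_2)  # True: same shuffle order in both passes
```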