How to obtain the training time of each architecture?

D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.

MIT License


How to obtain the training time of each architecture? #55

Closed: zhengyu998 closed this issue 4 years ago

zhengyu998 commented 4 years ago

@D-X-Y Hi! I tried both NAS-Bench-201-v1_0-e61699.pth and the 226G full dataset. In both cases, info.get_metrics('cifar10', 'train') and archRes.get_metrics('cifar10-valid', 'x-valid', None, False) returned results with no time information (cur_time and all_time are None):

{'iepoch': 199, 'loss': 2.3029543669128416, 'accuracy': 9.711999997863769, 'cur_time': None, 'all_time': None}

D-X-Y commented 4 years ago

The wall-clock time for all settings can be estimated from the time information recorded in the “use12epoch=True with cifar10-valid” setting, together with the latency recorded in all experiments. Please see here (https://github.com/D-X-Y/AutoDL-Projects/blob/master/exps/algos/R_EA.py#L62) for an example.

I will re-organize the benchmark api to make it easier to access the time data.
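For readers following along, here is a minimal sketch of one plausible reading of the estimate described in this reply: take a per-epoch training time measured in a reference setting, convert it to a per-sample cost, then scale by the target dataset size and by the ratio of the two latencies. The function name, its parameters, and the numbers are all hypothetical and not taken from the benchmark; the linked script may compute the estimate differently.

```python
def estimate_train_time(ref_epoch_time, ref_num_samples, ref_latency,
                        target_num_samples, target_latency, num_epochs):
    """Scale a reference per-epoch training time to another setting.

    Assumption (not confirmed by the thread): training time grows linearly
    with the number of training samples and with the measured latency.
    """
    if ref_num_samples <= 0 or ref_latency <= 0:
        raise ValueError("reference sample count and latency must be positive")
    # Cost of one training sample in the reference setting.
    time_per_sample = ref_epoch_time / ref_num_samples
    # How much slower (or faster) the target setting is per forward pass.
    latency_ratio = target_latency / ref_latency
    return time_per_sample * target_num_samples * latency_ratio * num_epochs


# Made-up numbers for illustration only, not values from the benchmark.
estimated_seconds = estimate_train_time(
    ref_epoch_time=10.0, ref_num_samples=25000, ref_latency=0.02,
    target_num_samples=50000, target_latency=0.04, num_epochs=200)
print(f"estimated training time: {estimated_seconds:.1f} s")
```

The linear scaling is a simplification; it ignores fixed per-epoch overheads such as data loading setup.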

crwhite14 commented 4 years ago

Hi, I am still trying to get the training times. I tried to run the code you linked above, but this line gives an error in the api: info = api.get_more_info(index, dataname, 25, False, True)

~/Library/Python/3.7/lib/python/site-packages/nas_201_api/api.py in get_more_info(self, index, dataset, iepoch, use_12epochs_result, is_random)
    291     xinfo = {'train-loss'    : train_info['loss'],
    292              'train-accuracy': train_info['accuracy'],
--> 293              'train-per-time': train_info['all_time'] / total,
    294              'train-all-time': train_info['all_time']}
    295     # collect the evaluation information

TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'

I just want to know the average time to train an architecture for 200 epochs on cifar10, cifar100, and ImageNet16-120. Thanks.
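The TypeError above comes from dividing an 'all_time' value of None by an integer. As a workaround on the caller's side, a small guard like the sketch below avoids the crash when time data is missing. The helper name and the total_epochs parameter are hypothetical; the traceback does not show how the library defines total, so this only illustrates the None check and is not a patch to nas_201_api.

```python
def per_epoch_time(train_info, total_epochs):
    """Return the average time per epoch, or None if no time was recorded."""
    all_time = train_info.get('all_time')
    # Metrics dicts from the v1_0 file can carry all_time=None (see the
    # output earlier in this thread), so check before dividing.
    if all_time is None or total_epochs <= 0:
        return None
    return all_time / total_epochs


# A dict shaped like the one reported at the top of this thread.
missing = {'iepoch': 199, 'loss': 2.30, 'accuracy': 9.71,
           'cur_time': None, 'all_time': None}
if per_epoch_time(missing, 200) is None:
    print("no time information recorded for this entry")
```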

D-X-Y commented 4 years ago

May I ask whether you are using the benchmark file NAS-Bench-201-v1_0-e61699.pth or NAS-Bench-201-v1_1-096897.pth? I will look into this problem.

D-X-Y commented 4 years ago

@crwhite14 Hi, I have tried "info = api.get_more_info(100, 'cifar100', 199, False, True)" with NAS-Bench-201-v1_1-096897.pth, and it works. Would you mind sharing more details?

crwhite14 commented 4 years ago

Hi, thanks for your comments. I was using NAS-Bench-201-v1_0-e61699.pth. Even with this version, I was able to access the runtimes by using an older version of train_and_eval() in https://github.com/D-X-Y/AutoDL-Projects/blob/master/exps/algos/R_EA.py.

D-X-Y commented 4 years ago

Cool. I would recommend using NAS-Bench-201-v1_1-096897.pth, which contains more data than NAS-Bench-201-v1_0-e61699.pth.