decile-team / cords

Reduce end to end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude using coresets and data selection.
https://cords.readthedocs.io/en/latest/
MIT License

No such file or directory: '/mnt/data/cifar100_clip-ViT-L-14_fl_0.1_global_order.pkl' #84

Closed wgcban closed 1 year ago

wgcban commented 1 year ago

Hello, where can I find the 'cifar100_clip-ViT-L-14_fl_0.1_global_order.pkl' file?

krishnatejakk commented 1 year ago

The pickle files are not shipped with the repository; one needs to generate them using the following function:

https://github.com/decile-team/cords/blob/de271cb9616cf77413dfc80eb12e606d274dc525/cords/utils/data/data_utils/generate_global_order.py#L838

Similarly, one can generate the stochastic subsets pickle file using the following function:

https://github.com/decile-team/cords/blob/de271cb9616cf77413dfc80eb12e606d274dc525/cords/utils/data/data_utils/generate_global_order.py#L917

A code snippet calling the generate_image_global_order function is as follows:


from cords.utils.data.data_utils import generate_image_global_order

# Entries at the same index of the three lists are used together as one configuration.
kw_list = [0.01]  # , 0.05, 0.1, 0.5, 1]
r2_list = [2]  # , 1.75, 1.5, 1.25, 1]
knn_list = [5]  # , 15, 25, 50, 75]

for model in ["dino_cls"]:
    for submod_function in ["disp_min_pc"]:
        for metric in ["cossim"]:
            for i in range(len(kw_list)):
                for dset in ["cifar10"]:
                    global_order, global_knn, global_r2, _ = generate_image_global_order(
                        dset, model, submod_function, metric=metric,
                        kw=kw_list[i], r2_coefficient=r2_list[i], knn=knn_list[i],
                        seed=42, data_dir="../data", device="cuda",
                    )

The arguments of each function should be self-explanatory. I will add detailed documentation on how to generate the pickle files.
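
To fail early with a clearer message than the raw "No such file or directory" error, one could check for the pickle before training starts. The following is a minimal sketch, not part of CORDS: the helper names `global_order_path` and `load_global_order` are hypothetical, and the file-name pattern `<dataset>_<model>_<function>_<kw>_global_order.pkl` is only inferred from the file name in the issue title, so it may not match every configuration.

```python
import pickle
import tempfile
from pathlib import Path


def global_order_path(data_dir, dset, model, submod_function, kw):
    """Build the expected pickle path.

    Assumption: the naming pattern is inferred from the file name in the
    error message and is not confirmed against the CORDS source.
    """
    return Path(data_dir) / f"{dset}_{model}_{submod_function}_{kw}_global_order.pkl"


def load_global_order(path):
    """Load a cached global order, or raise a descriptive error if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} does not exist. Generate it first, for example with "
            "generate_image_global_order, before starting training."
        )
    with path.open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = global_order_path(tmp, "cifar10", "dino_cls", "disp_min_pc", 0.01)
        try:
            load_global_order(target)
        except FileNotFoundError as err:
            print("Missing:", err)

        # Placeholder payload for the demo only; the real file contents
        # are whatever the CORDS generation function produces.
        with target.open("wb") as f:
            pickle.dump({"placeholder_order": [3, 1, 2]}, f)
        print("Loaded:", load_global_order(target))
```

With the values from the issue title, `global_order_path("/mnt/data", "cifar100", "clip-ViT-L-14", "fl", 0.1)` yields the path named in the error, which makes it easy to see which configuration still needs to be generated.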