aaron-xichen / pytorch-playground

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
MIT License
2.6k stars, 607 forks

Encounter "Memory Error" when converting imagenet dataset #18

Closed: jeff830107 closed this issue 4 years ago

jeff830107 commented 6 years ago

Hi, when I was trying to use the AlexNet model, I first followed your instructions to download val224_compressed.pkl and ran the command "python convert.py". But the conversion always fails with the error message "Memory Error". I am curious how to deal with this issue, since I think the machine I used has enough memory (64 GB). Thanks!

gdjmck commented 6 years ago

I ran into the same problem as well. The 224x224 file was dumped okay at 7.5 GB, but the 299x299 pkl file was empty (0 B).
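For context, a rough size estimate shows why the uncompressed arrays are this large. The sketch below assumes 50,000 images stored as 3-channel uint8 arrays (one byte per value), which the thread does not state explicitly; `array_bytes` is a hypothetical helper, not something from the repository.

```python
def array_bytes(n_images, side, channels=3, bytes_per_value=1):
    """Size in bytes of an (n_images, channels, side, side) array."""
    return n_images * channels * side * side * bytes_per_value

# Assumption: 50,000 validation images, uint8 pixels (1 byte per value).
for side in (224, 299):
    size = array_bytes(50_000, side)
    print(f"{side}x{side}: {size / 1e9:.2f} GB")
```

Under these assumptions the 224x224 array comes to about 7.5 GB, which is consistent with the file size reported above, and the 299x299 array to about 13.4 GB. Each additional in-memory copy of the data adds roughly the same amount again.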

jnorwood commented 6 years ago

I saw the same issue. I separated the 224 and 299 dump processing loops and cleared variables that were no longer used. It still died in dump_pickle, which must be making another copy. So I looked around and found that scikit-learn has a joblib.dump that can replace pkl.dump in dump_pickle, and it doesn't use as much memory while writing out the files. I think you'll still need to separate the 224 and 299 processing, as mine was running out of 32 GB of memory while doing a transpose: too many copies of the same data. With joblib, memory use goes up to 27 GB, with no error. This could probably use a database instead of keeping all this image info in a dict.
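One possible way to reduce the number of copies mentioned above is to preallocate the output array in its final layout and fill it one image at a time, rather than building a Python list and then stacking it. This is a minimal sketch, not code from the repository: `stack_nchw` and its `decode` argument are hypothetical names, and the images are assumed to decode to uint8 arrays of shape (side, side, 3).

```python
import numpy as np

def stack_nchw(encoded_images, decode, side):
    """Decode images one at a time into a preallocated (N, 3, side, side) array."""
    out = np.empty((len(encoded_images), 3, side, side), dtype=np.uint8)
    for i, enc in enumerate(encoded_images):
        img = decode(enc)                # assumed shape: (side, side, 3), uint8
        out[i] = img.transpose(2, 0, 1)  # HWC -> CHW, written directly into `out`
    return out

# Tiny demo with fake "encoded" images (already arrays, so decode is the identity).
fake = [np.full((4, 4, 3), k, dtype=np.uint8) for k in range(5)]
batch = stack_nchw(fake, decode=lambda x: x, side=4)
print(batch.shape, batch.dtype)
```

With this approach the full list of decoded images and the stacked array never have to exist in memory at the same time, since each decoded image can be discarded as soon as it has been written into `out`.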

VShawn commented 4 years ago

Same problem on a Windows PC with 24 GB of RAM, Python 3.6.6, and torch 1.1.0.

I got it working with the following convert.py, thanks to @jnorwood.

This new convert.py takes about 16 GB of memory.



```python
import os
import numpy as np
import tqdm
from utee import misc
import argparse
import cv2
import joblib

imagenet_urls = [
    'http://ml.cs.tsinghua.edu.cn/~chenxi/dataset/val224_compressed.pkl'
]
parser = argparse.ArgumentParser(description='Extract the ILSVRC2012 val dataset')
parser.add_argument('--in_file', default='val224_compressed.pkl', help='input file path')
parser.add_argument('--out_root', default='/tmp/public_dataset/pytorch/imagenet-data/', help='output file path')
args = parser.parse_args()

d = misc.load_pickle(args.in_file)
assert len(d['data']) == 50000, len(d['data'])
assert len(d['target']) == 50000, len(d['target'])

# Convert val224.pkl
data = []
for img, target in tqdm.tqdm(zip(d['data'], d['target']), total=50000):
    img224 = misc.str2img(img)
    data.append(img224)
data_dict = dict(
    data = np.array(data).transpose(0, 3, 1, 2),  # NHWC -> NCHW
    target = d['target']
)
if not os.path.exists(args.out_root):
    os.makedirs(args.out_root)
# misc.dump_pickle(data_dict, os.path.join(args.out_root, 'val224.pkl'))
joblib.dump(data_dict, os.path.join(args.out_root, 'val224.pkl'))
# Drop references to the 224 data before building the 299 data
data_dict.clear()
data.clear()
print('val224.pkl done.')

# Convert val299.pkl
data = []
for img, target in tqdm.tqdm(zip(d['data'], d['target']), total=50000):
    img224 = misc.str2img(img)
    img299 = cv2.resize(img224, (299, 299))
    data.append(img299)
data_dict = dict(
    data = np.array(data).transpose(0, 3, 1, 2),  # NHWC -> NCHW
    target = d['target']
)

if not os.path.exists(args.out_root):
    os.makedirs(args.out_root)
# misc.dump_pickle(data_dict, os.path.join(args.out_root, 'val299.pkl'))
joblib.dump(data_dict, os.path.join(args.out_root, 'val299.pkl'))
data_dict.clear()
data.clear()
print('val299.pkl done.')
```



Loading pickle object from val224_compressed.pkl => Done (1.0991 s)

100%|████████████████████| 50000/50000 [01:02<00:00, 798.99it/s]

val299.pkl done.
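A file written with `joblib.dump` is meant to be read back with `joblib.load` rather than `pickle.load`. The round trip below is a small sketch with made-up data and a temporary path, assuming `joblib` and `numpy` are installed; it is not part of the repository's code.

```python
import os
import tempfile

import joblib
import numpy as np

# Small stand-in for the real data_dict (the real one holds 50000 images).
data_dict = dict(
    data=np.zeros((8, 3, 224, 224), dtype=np.uint8),
    target=list(range(8)),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'val224.pkl')
    joblib.dump(data_dict, path)  # write the dict to disk
    loaded = joblib.load(path)    # read it back with the matching loader

print(loaded['data'].shape, loaded['target'][:3])
```

Any code that later reads val224.pkl or val299.pkl would presumably need the same change from a pickle-based loader to `joblib.load`.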

aaron-xichen commented 4 years ago

thanks @jnorwood, fixed, please check