train & test environment

janghak commented 5 years ago

Hi, I tried to run train_model.py in rt_gene_model_training, but it crashed with a "MemoryError" when moving on to the second fold.

Error message shows:

Traceback (most recent call last):
  File "train_model.py", line 88, in <module>
    train_images_L, train_images_R, train_gazes, train_headposes, train_num = get_train_test_data_twoeyes(train_files, 'train')
  File "/home/janghak/HDD/workspace/github/rt_gene/rt_gene_model_training/train_tools.py", line 171, in get_train_test_data_twoeyes
    images_r = np.vstack([files[idx][label]['imagesR'] for idx in range(len(files))])
  File "/home/janghak/anaconda3/envs/ge/lib/python3.7/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
MemoryError

I have tried running with a batch size of 4, but it still crashes. I am using CUDA 9.1 and would like to know whether matching the CUDA and cuDNN versions would solve the problem.

Thank you.

Tobias-Fischer commented 5 years ago

Hi, this error indicates that you are running out of "normal" RAM (system memory); it has nothing to do with your GPU or CUDA. Do you have another machine available with more memory? The training code should be rewritten so that it does not load the whole dataset at once, but this is not a priority.

Best, Tobias
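For readers wondering what "not loading the whole dataset at once" could look like, here is a minimal sketch. It is not the rt_gene training code: the function name `iter_batches` and the assumption that each subject's images are stored in a separate `.npy` file are hypothetical and chosen only for illustration. The idea is to memory-map each file and copy only one batch at a time into RAM, instead of stacking everything with `np.vstack`.

```python
import numpy as np


def iter_batches(npy_paths, batch_size):
    """Yield batches from several .npy files without concatenating them.

    Hypothetical helper: assumes one array per file, first axis = samples.
    """
    for path in npy_paths:
        # mmap_mode="r" maps the file read-only instead of loading it fully.
        data = np.load(path, mmap_mode="r")
        for start in range(0, data.shape[0], batch_size):
            # np.array copies only this slice into memory as a plain ndarray.
            yield np.array(data[start:start + batch_size])
```

With this pattern, memory use is bounded by roughly one batch rather than the full dataset. Batches do not cross file boundaries, so the last batch of each file may be smaller than `batch_size`, and shuffling across files would need extra logic.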

janghak commented 5 years ago

Thank you, that solved the problem. It crashed on a machine with 24 GB of RAM and worked on one with 64 GB.
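As a rough way to judge in advance whether a machine has enough RAM, the sketch below estimates the memory needed by a `np.vstack` call like the one in the traceback. `np.vstack` allocates a new array for the result while the input arrays are still referenced, so the peak is about twice the size of the inputs if they are all held in memory. The helper name `vstack_memory_estimate` and the array shapes in the usage example are assumptions for illustration, not values taken from the rt_gene dataset.

```python
import numpy as np


def vstack_memory_estimate(arrays):
    """Return (input_bytes, peak_bytes) for stacking in-memory arrays.

    Hypothetical helper: assumes every input is fully loaded in RAM.
    """
    input_bytes = sum(a.nbytes for a in arrays)
    # The stacked result is a fresh copy with the same total size as the
    # inputs, and the inputs stay alive until the call returns.
    peak_bytes = 2 * input_bytes
    return input_bytes, peak_bytes


# Usage with made-up shapes: 3 chunks of 10 float32 images, 36x60x3 each.
chunks = [np.zeros((10, 36, 60, 3), dtype=np.float32) for _ in range(3)]
input_bytes, peak_bytes = vstack_memory_estimate(chunks)
print(input_bytes / 1e6, "MB of inputs,", peak_bytes / 1e6, "MB at peak")
```

This only counts the arrays passed to one call; the training script stacks several arrays (left and right eye images, gazes, head poses), so the real footprint is higher.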

Tobias-Fischer / rt_gene, issue #29