facebookresearch / SymbolicMathematics

Deep Learning for Symbolic Mathematics

Process gets killed #9

Open zqx1609 opened 4 years ago

zqx1609 commented 4 years ago

I ran this model on the IBP dataset, and the process gets killed while loading the training data. My GPU is a 2070 Super and my environment is the PyTorch 1.3 devel image. Does anybody know how to deal with this problem?

My command:

```
python main.py --exp_name first_train --fp16 true --amp 2 --tasks "prim_ibp" --reload_data "prim_ibp,prim_ibp.train,prim_ibp.valid,prim_ibp.test" --reload_size 40000000 --emb_dim 1024 --n_enc_layers 6 --n_dec_layers 6 --n_heads 8 --optimizer "adam,lr=0.0001" --batch_size 32 --epoch_size 300000 --validation_metrics valid_prim_fwd_acc
```

The log output:

```
INFO - 07/02/20 06:56:43 - 0:00:00 - The experiment will be stored in ./dumped/first_train/dgv5zq039m
INFO - 07/02/20 06:56:43 - 0:00:00 - Running command: python main.py --exp_name first_train --fp16 true --amp 2 --tasks prim_ibp --reload_data 'prim_ibp,prim_ibp.train,prim_ibp.valid,prim_ibp.test' --reload_size '-1' --emb_dim 128 --n_enc_layers 1 --n_dec_layers 1 --n_heads 1 --optimizer 'adam,lr=0.0001' --batch_size 8 --epoch_size 300000 --validation_metrics valid_prim_fwd_acc
WARNING - 07/02/20 06:56:43 - 0:00:00 - Signal handler installed.
INFO - 07/02/20 06:56:43 - 0:00:00 - Unary operators: []
INFO - 07/02/20 06:56:43 - 0:00:00 - Binary operators: ['add', 'sub']
INFO - 07/02/20 06:56:43 - 0:00:00 - words: {'': 0, '': 1, '': 2, '(': 3, ')': 4, '': 5, '': 6, '': 7, '': 8, '': 9, 'pi': 10, 'E': 11, 'x': 12, 'y': 13, 'z': 14, 't': 15, 'a0': 16, 'a1': 17, 'a2': 18, 'a3': 19, 'a4': 20, 'a5': 21, 'a6': 22, 'a7': 23, 'a8': 24, 'a9': 25, 'abs': 26, 'acos': 27, 'acosh': 28, 'acot': 29, 'acoth': 30, 'acsc': 31, 'acsch': 32, 'add': 33, 'asec': 34, 'asech': 35, 'asin': 36, 'asinh': 37, 'atan': 38, 'atanh': 39, 'cos': 40, 'cosh': 41, 'cot': 42, 'coth': 43, 'csc': 44, 'csch': 45, 'derivative': 46, 'div': 47, 'exp': 48, 'f': 49, 'g': 50, 'h': 51, 'inv': 52, 'ln': 53, 'mul': 54, 'pow': 55, 'pow2': 56, 'pow3': 57, 'pow4': 58, 'pow5': 59, 'rac': 60, 'sec': 61, 'sech': 62, 'sign': 63, 'sin': 64, 'sinh': 65, 'sqrt': 66, 'sub': 67, 'tan': 68, 'tanh': 69, 'I': 70, 'INT+': 71, 'INT-': 72, 'INT': 73, 'FLOAT': 74, '-': 75, '.': 76, '10^': 77, 'Y': 78, "Y'": 79, "Y''": 80, '0': 81, '1': 82, '2': 83, '3': 84, '4': 85, '5': 86, '6': 87, '7': 88, '8': 89, '9': 90}
INFO - 07/02/20 06:56:43 - 0:00:00 - 20001 possible leaves.
INFO - 07/02/20 06:56:43 - 0:00:00 - Checking expressions in [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 2.1, 3.1, -0.01, -0.1, -0.3, -0.5, -0.7, -0.9, -1.1, -2.1, -3.1]
INFO - 07/02/20 06:56:43 - 0:00:00 - Training tasks: prim_ibp
INFO - 07/02/20 06:56:43 - 0:00:00 - Number of parameters (encoder): 734464
INFO - 07/02/20 06:56:43 - 0:00:00 - Number of parameters (decoder): 800859
INFO - 07/02/20 06:56:47 - 0:00:03 - Found 51 parameters in model.
INFO - 07/02/20 06:56:47 - 0:00:03 - Optimizers: model
Selected optimization level O2: FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled              : True
opt_level            : O2
cast_model_type      : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32  : True
master_weights       : True
loss_scale           : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled              : True
opt_level            : O2
cast_model_type      : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32  : True
master_weights       : True
loss_scale           : dynamic
INFO - 07/02/20 06:56:47 - 0:00:03 - Creating train iterator for prim_ibp ...
INFO - 07/02/20 06:56:47 - 0:00:03 - Loading data from prim_ibp.train ...
Killed
```
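A bare `Killed` with no Python traceback usually means the kernel's OOM killer terminated the process, not a CUDA error. On a Linux host with readable kernel logs this can be confirmed with a quick check (a sketch; the exact message wording varies by kernel version):

```shell
# Look for an OOM kill record in the kernel log (may require root for dmesg)
dmesg | grep -i -E "out of memory|killed process" | tail -n 5
```

If a `Killed process ... (python)` line appears there, the machine ran out of system RAM while the dataset was being read in.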

AegisAurora commented 3 years ago

Hi, you might be running out of system RAM (not GPU memory) while the training file is read in. Try a smaller training set size, e.g. by lowering `--reload_size` so fewer examples are loaded.
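Since the `Killed` happens at `Loading data from prim_ibp.train ...`, before training even starts, another workaround is to truncate the split on disk. A minimal sketch, assuming each line of `prim_ibp.train` is one training example (the `.small` filename and the value of `N` are just illustrative):

```shell
# Keep only the first N examples of the training split (names are illustrative)
N=5000000
head -n "$N" prim_ibp.train > prim_ibp.train.small
wc -l prim_ibp.train.small  # sanity-check the truncated size
```

Then point `--reload_data` at the truncated file, e.g. `"prim_ibp,prim_ibp.train.small,prim_ibp.valid,prim_ibp.test"`.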