I run this model on the ibp dataset and it gets killed when loading the training data.My GPU is 2070s and environment is pytorch1.3 devel. Does anybody know how to deal with the problem?
My command:
python main.py --exp_name first_train --fp16 true --amp 2 --tasks "prim_ibp" --reload_data "prim_ibp,prim_ibp.train,prim_ibp.valid,prim_ibp.test" --reload_size 40000000 --emb_dim 1024 --n_enc_layers 6 --n_dec_layers 6 --n_heads 8 --optimizer "adam,lr=0.0001" --batch_size 32 --epoch_size 300000 --validation_metrics valid_prim_fwd_acc
Reaction:INFO - 07/02/20 06:56:43 - 0:00:00 - The experiment will be stored in ./dumped/first_train/dgv5zq039m
I run this model on the ibp dataset and it gets killed when loading the training data.My GPU is 2070s and environment is pytorch1.3 devel. Does anybody know how to deal with the problem? My command: python main.py --exp_name first_train --fp16 true --amp 2 --tasks "prim_ibp" --reload_data "prim_ibp,prim_ibp.train,prim_ibp.valid,prim_ibp.test" --reload_size 40000000 --emb_dim 1024 --n_enc_layers 6 --n_dec_layers 6 --n_heads 8 --optimizer "adam,lr=0.0001" --batch_size 32 --epoch_size 300000 --validation_metrics valid_prim_fwd_acc Reaction:INFO - 07/02/20 06:56:43 - 0:00:00 - The experiment will be stored in ./dumped/first_train/dgv5zq039m
INFO - 07/02/20 06:56:43 - 0:00:00 - Running command: python main.py --exp_name first_train --fp16 true --amp 2 --tasks prim_ibp --reload_data 'prim_ibp,prim_ibp.train,prim_ibp.valid,prim_ibp.test' --reload_size '-1' --emb_dim 128 --n_enc_layers 1 --n_dec_layers 1 --n_heads 1 --optimizer 'adam,lr=0.0001' --batch_size 8 --epoch_size 300000 --validation_metrics valid_prim_fwd_acc
WARNING - 07/02/20 06:56:43 - 0:00:00 - Signal handler installed. INFO - 07/02/20 06:56:43 - 0:00:00 - Unary operators: [] INFO - 07/02/20 06:56:43 - 0:00:00 - Binary operators: ['add', 'sub'] INFO - 07/02/20 06:56:43 - 0:00:00 - words: {'': 2, '(': 3, ')': 4, '': 5, '': 6, '': 7, '': 8, '': 9, 'pi': 10, 'E': 11, 'x': 12, 'y': 13, 'z': 14, 't': 15, 'a0': 16, 'a1': 17, 'a2': 18, 'a3': 19, 'a4': 20, 'a5': 21, 'a6': 22, 'a7': 23, 'a8': 24, 'a9': 25, 'abs': 26, 'acos': 27, 'acosh': 28, 'acot': 29, 'acoth': 30, 'acsc': 31, 'acsch': 32, 'add': 33, 'asec': 34, 'asech': 35, 'asin': 36, 'asinh': 37, 'atan': 38, 'atanh': 39, 'cos': 40, 'cosh': 41, 'cot': 42, 'coth': 43, 'csc': 44, 'csch': 45, 'derivative': 46, 'div': 47, 'exp': 48, 'f': 49, 'g': 50, 'h': 51, 'inv': 52, 'ln': 53, 'mul': 54, 'pow': 55, 'pow2': 56, 'pow3': 57, 'pow4': 58, 'pow5': 59, 'rac': 60, 'sec': 61, 'sech': 62, 'sign': 63, 'sin': 64, 'sinh': 65, 'sqrt': 66, 'sub': 67, 'tan': 68, 'tanh': 69, 'I': 70, 'INT+': 71, 'INT-': 72, 'INT': 73, 'FLOAT': 74, '-': 75, '.': 76, '10^': 77, 'Y': 78, "Y'": 79, "Y''": 80, '0': 81, '1': 82, '2': 83, '3': 84, '4': 85, '5': 86, '6': 87, '7': 88, '8': 89, '9': 90}
INFO - 07/02/20 06:56:43 - 0:00:00 - 20001 possible leaves.
INFO - 07/02/20 06:56:43 - 0:00:00 - Checking expressions in [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 2.1, 3.1, -0.01, -0.1, -0.3, -0.5, -0.7, -0.9, -1.1, -2.1, -3.1]
INFO - 07/02/20 06:56:43 - 0:00:00 - Training tasks: prim_ibp
INFO - 07/02/20 06:56:43 - 0:00:00 - Number of parameters (encoder): 734464
INFO - 07/02/20 06:56:43 - 0:00:00 - Number of parameters (decoder): 800859
INFO - 07/02/20 06:56:47 - 0:00:03 - Found 51 parameters in model.
INFO - 07/02/20 06:56:47 - 0:00:03 - Optimizers: model
Selected optimization level O2: FP16 training with FP32 batchnorm and FP32 master weights.
': 0, '': 1, 'Defaults for this optimization level are: enabled : True opt_level : O2 cast_model_type : torch.float16 patch_torch_functions : False keep_batchnorm_fp32 : True master_weights : True loss_scale : dynamic Processing user overrides (additional kwargs that are not None)... After processing overrides, optimization options are: enabled : True opt_level : O2 cast_model_type : torch.float16 patch_torch_functions : False keep_batchnorm_fp32 : True master_weights : True loss_scale : dynamic INFO - 07/02/20 06:56:47 - 0:00:03 - Creating train iterator for prim_ibp ... INFO - 07/02/20 06:56:47 - 0:00:03 - Loading data from prim_ibp.train ... Killed