allanj / Deductive-MWP


Do you have checkpoints for MAWPS? #8

Open ssh1419 opened 2 years ago

ssh1419 commented 2 years ago

I would like to ask whether you can provide the checkpoints of the model trained on MAWPS as well.

allanj commented 2 years ago

That would be five checkpoints, as the experiments were conducted with five-fold cross-validation. Do you want those?

ssh1419 commented 2 years ago

Yes, it would be great if I could use that.

allanj commented 2 years ago

Sorry, I don't think I kept those (only the log files are available) due to limited space. But I can try to run the experiments again for you.

ssh1419 commented 2 years ago

Oh, if that is possible, can you try it, please?

allanj commented 2 years ago

I'm currently on leave for 10 days, so I will probably update you later.

ssh1419 commented 2 years ago

Thank you!

I am working on MathQA as well now and this is what I got. Can you have a look at it? I cannot figure out what the problem is.

11/05/2022 13:37:06 - INFO - main - device = cuda:0
11/05/2022 13:37:06 - INFO - main - batch_size = 30
11/05/2022 13:37:06 - INFO - main - train_num = -1
11/05/2022 13:37:06 - INFO - main - dev_num = -1
11/05/2022 13:37:06 - INFO - main - test_num = -1
11/05/2022 13:37:06 - INFO - main - train_file = data/math23k/train23k_processed_nodup.json
11/05/2022 13:37:06 - INFO - main - dev_file = data/math23k/valid23k_processed_nodup.json
11/05/2022 13:37:06 - INFO - main - test_file = data/MathQA/mathqa_test_nodup_our_filtered.json
11/05/2022 13:37:06 - INFO - main - train_filtered_steps = None
11/05/2022 13:37:06 - INFO - main - test_filtered_steps = None
11/05/2022 13:37:06 - INFO - main - seed = 42
11/05/2022 13:37:06 - INFO - main - model_folder = mathqa_roberta-base_gru
11/05/2022 13:37:06 - INFO - main - bert_folder = none
11/05/2022 13:37:06 - INFO - main - bert_model_name = roberta-base
11/05/2022 13:37:06 - INFO - main - height = 10
11/05/2022 13:37:06 - INFO - main - train_max_height = 15
11/05/2022 13:37:06 - INFO - main - var_update_mode = gru
11/05/2022 13:37:06 - INFO - main - mode = test
11/05/2022 13:37:06 - INFO - main - learning_rate = 2e-05
11/05/2022 13:37:06 - INFO - main - max_grad_norm = 1.0
11/05/2022 13:37:06 - INFO - main - num_epochs = 1000
11/05/2022 13:37:06 - INFO - main - fp16 = 1
11/05/2022 13:37:06 - INFO - main - parallel = 0
11/05/2022 13:37:06 - INFO - main - cut_off = -100
11/05/2022 13:37:06 - INFO - main - print_error = 0
11/05/2022 13:37:06 - INFO - main - error_file = results/error.json
11/05/2022 13:37:06 - INFO - main - result_file = results/res.json
11/05/2022 13:37:07 - INFO - main - [Data Info] constant info: {'1': 0, 'PI': 1}
11/05/2022 13:37:07 - INFO - main - Testing the model now.
Some weights of UniversalModel_Roberta were not initialized from the model checkpoint at model_files/mathqa_roberta-base_gru and are newly initialized because the shapes did not match:

const_rep: found shape torch.Size([20, 768]) in the checkpoint and torch.Size([2, 768]) in the model instantiated

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
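The shape-mismatch warning above is the key clue: the checkpoint's constant table (`const_rep`, shape [20, 768]) cannot be loaded into a model instantiated with only the Math23k constant list ({'1': 0, 'PI': 1}), so that parameter is left randomly initialized. A minimal sketch of this loading behaviour (not the repo's actual code; `ToyModel` and `load_matching` are hypothetical names for illustration):

```python
# Hypothetical sketch (not the repo's loading code) of why the warning
# appears: HuggingFace-style loading skips checkpoint tensors whose
# shape differs from the freshly built model, leaving them randomly
# initialized. Here the checkpoint holds 20 constant embeddings while
# the model was built with only 2 constants.
import torch


class ToyModel(torch.nn.Module):
    """Stand-in for just the constant table of UniversalModel_Roberta."""

    def __init__(self, num_consts: int):
        super().__init__()
        self.const_rep = torch.nn.Parameter(torch.zeros(num_consts, 768))


def load_matching(model: torch.nn.Module, ckpt_state: dict) -> list:
    """Copy parameters whose shapes match; return the mismatched names."""
    mismatched = []
    own_state = model.state_dict()
    with torch.no_grad():
        for name, tensor in ckpt_state.items():
            if name in own_state and own_state[name].shape == tensor.shape:
                own_state[name].copy_(tensor)
            else:
                mismatched.append(name)
    return mismatched


# Checkpoint saved with 20 constants, model instantiated with 2:
ckpt = ToyModel(20).state_dict()
print(load_matching(ToyModel(2), ckpt))   # shape mismatch -> ['const_rep']
print(load_matching(ToyModel(20), ckpt))  # shapes agree -> []
```

If this reading is right, the fix would be to instantiate the model with the same constant list the checkpoint was trained with, rather than retraining from scratch.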

11/05/2022 13:37:14 - INFO - main - [Data Info] Reading test data

Tokenization: 0%| | 0/1605 [00:00<?, ?it/s]

Tokenization: 12%|█▏ | 196/1605 [00:00<00:00, 1957.17it/s]
[WARNING] find right var (100.0) invalid, returning FALSE
[WARNING] find right var (1.0) invalid, returning FALSE
[WARNING] find right var (2.0) invalid, returning FALSE
[WARNING] find right var (100.0) invalid, returning FALSE
[WARNING] find right var (10.0) invalid, returning FALSE
[WARNING] find right var (100.0) invalid, returning FALSE
[WARNING] find right var (1000.0) invalid, returning FALSE
[WARNING] find right var (3600.0) invalid, returning FALSE
[WARNING] find right var (100.0) invalid, returning FALSE
[WARNING] find right var (1.0) invalid, returning FALSE
[WARNING] find left_var (2.0) invalid, returning FALSE
[WARNING] find right var (2.0) invalid, returning FALSE
[WARNING] find right var (100.0) invalid, returning FALSE
[WARNING] find right var (4.0) invalid, returning FALSE
[WARNING] find right var (60.0) invalid, returning FALSE
[WARNING] find right var (1000.0) invalid, returning FALSE
[WARNING] find left_var (5.0) invalid, returning FALSE
[WARNING] find right var (2.0) invalid, returning FALSE
[WARNING] find right var (3.0) invalid, returning FALSE
[WARNING] find right var (0.5) invalid, returning FALSE
[WARNING] find right var (2.0) invalid, returning FALSE

...

Tokenization: 100%|██████████| 1605/1605 [00:00<00:00, 2300.72it/s]
11/05/2022 13:37:15 - INFO - src.data.universal_dataset - , total number instances: 468 (before filter: 1605), max num steps: 9
11/05/2022 13:37:15 - INFO - src.data.universal_dataset - filtered type counter: Counter({'cannot obtain the label sequence': 1131, 'larger than the max height 10': 6})
11/05/2022 13:37:15 - INFO - src.data.universal_dataset - number of instances removed: 1137
11/05/2022 13:37:15 - WARNING - src.data.universal_dataset - [WARNING] find duplication num: 2 (not removed)
11/05/2022 13:37:15 - INFO - src.data.universal_dataset - Counter({3: 139, 2: 93, 1: 73, 4: 72, 5: 54, 6: 20, 7: 10, 8: 5, 9: 2})
[WARNING] find right var (100.0) invalid, returning FALSE
[WARNING] find right var (3.6) invalid, returning FALSE
[WARNING] find right var (2.0) invalid, returning FALSE
[WARNING] find left_var (1.0) invalid, returning FALSE
[WARNING] find right var (2.0) invalid, returning FALSE
[WARNING] find right var (100.0) invalid, returning FALSE
[WARNING] find right var (1.0) invalid, returning FALSE
[WARNING] find right var (100.0) invalid, returning FALSE
...

--validation: 100%|██████████| 16/16 [00:02<00:00, 5.36it/s]
11/05/2022 13:37:18 - INFO - main - [Info] Equation accuracy: 23.49%, total: 468, corr: 377, adjusted_total: 1605
11/05/2022 13:37:18 - INFO - main - [Info] Value accuracy: 24.42%, total: 468, corr: 392, adjusted_total: 1605
11/05/2022 13:37:18 - INFO - main - [Info] step num: 3 Acc.: 84.89 (118/139) val acc: 86.33 (120/139)
11/05/2022 13:37:18 - INFO - main - [Info] step num: 7 Acc.: 70.00 (7/10) val acc: 70.00 (7/10)
11/05/2022 13:37:18 - INFO - main - [Info] step num: 4 Acc.: 84.72 (61/72) val acc: 86.11 (62/72)
11/05/2022 13:37:18 - INFO - main - [Info] step num: 2 Acc.: 82.80 (77/93) val acc: 86.02 (80/93)
11/05/2022 13:37:18 - INFO - main - [Info] step num: 1 Acc.: 82.19 (60/73) val acc: 84.93 (62/73)
11/05/2022 13:37:18 - INFO - main - [Info] step num: 5 Acc.: 62.96 (34/54) val acc: 70.37 (38/54)
11/05/2022 13:37:18 - INFO - main - [Info] step num: 6 Acc.: 65.00 (13/20) val acc: 80.00 (16/20)
11/05/2022 13:37:18 - INFO - main - [Info] step num: 9 Acc.: 100.00 (2/2) val acc: 100.00 (2/2)
11/05/2022 13:37:18 - INFO - main - [Info] step num: 8 Acc.: 100.00 (5/5) val acc: 100.00 (5/5)
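One possible reading of these numbers, inferred from the log's arithmetic alone (not from the repo's code): the headline accuracies appear to divide the correct count by adjusted_total = 1605, so the 1137 instances filtered out during tokenization count as wrong, while the per-step breakdown uses only the 468 kept instances:

```python
# Sanity check of the logged accuracies (inferred from the log alone):
# the headline numbers divide by adjusted_total = 1605, counting the
# 1137 filtered-out instances as incorrect; the per-step accuracies
# divide by the 468 instances that survived filtering.
corr_eq, corr_val = 377, 392
kept, adjusted_total = 468, 1605

print(f"equation acc over adjusted_total: {100 * corr_eq / adjusted_total:.2f}%")  # 23.49%
print(f"value acc over adjusted_total:    {100 * corr_val / adjusted_total:.2f}%")  # 24.42%
print(f"value acc over kept instances:    {100 * corr_val / kept:.2f}%")            # 83.76%
```

So the model is fairly accurate on the instances it can represent; the low headline number mostly reflects the 1131 instances whose label sequence could not be obtained during preprocessing.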

allanj commented 2 years ago

What's the error that you are having?

ssh1419 commented 2 years ago

I tested the model with the MathQA checkpoints and, as you can see, tokenization does not work well and the accuracies are very low. It actually worked completely fine with Math23k (train/dev/test setting). Do you have any idea why it did not work with the MathQA checkpoints?