facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

Issue with fine-tuning the model on my own training data #106

Closed pradeepm2017 closed 6 years ago

pradeepm2017 commented 6 years ago

Hi, I have installed the DrQA model and tried the default model on a set of documents. It runs without errors, although the answers are not very accurate. I wanted to fine-tune the model on my own data to improve the accuracy of the responses, but that led to the following error. Please help me solve this issue:

03/28/2018 06:22:01 AM: [ COMMAND: /apps1/a622107/DrQA/scripts/reader/train.py --tune-partial 1000 --train-file pam_doc_train-processed-corenlp.txt --dev-file pam_doc_dev-processed-corenlp.txt --checkpoint True --pretrained /apps1/a622107/DrQA/data/reader/multitask.mdl --embedding-file glove.840B.300d.txt ]
03/28/2018 06:22:01 AM: [ ---------------------------------------------------------------------------------------------------- ]
03/28/2018 06:22:01 AM: [ Load data files ]
03/28/2018 06:22:01 AM: [ Num train examples = 3 ]
03/28/2018 06:22:01 AM: [ Num dev examples = 20 ]
03/28/2018 06:22:01 AM: [ ---------------------------------------------------------------------------------------------------- ]
03/28/2018 06:22:01 AM: [ Using pretrained model... ]
03/28/2018 06:22:01 AM: [ Loading model /apps1/a622107/DrQA/data/reader/multitask.mdl ]
03/28/2018 06:22:01 AM: [ Keeping saved use_ner: False ]
03/28/2018 06:22:01 AM: [ Overriding saved dropout_rnn: 0.35 --> 0.4 ]
03/28/2018 06:22:01 AM: [ Keeping saved use_lemma: False ]
03/28/2018 06:22:01 AM: [ Overriding saved dropout_emb: 0.35 --> 0.4 ]
03/28/2018 06:22:01 AM: [ Keeping saved use_pos: False ]
03/28/2018 06:22:07 AM: [ ---------------------------------------------------------------------------------------------------- ]
03/28/2018 06:22:07 AM: [ Counting 1000 most frequent question words ]
03/28/2018 06:22:07 AM: [ ('?', 3) ]
03/28/2018 06:22:07 AM: [ ('What', 2) ]
03/28/2018 06:22:07 AM: [ ('compensation', 1) ]
03/28/2018 06:22:07 AM: [ ('is', 1) ]
03/28/2018 06:22:07 AM: [ ('classes', 1) ]
03/28/2018 06:22:07 AM: [ ... ]
03/28/2018 06:22:07 AM: [ ('selected', 1) ]
03/28/2018 06:22:07 AM: [ ('Is', 1) ]
03/28/2018 06:22:07 AM: [ ('exclusions', 1) ]
03/28/2018 06:22:07 AM: [ ('are', 1) ]
03/28/2018 06:22:07 AM: [ ('excluded', 1) ]
03/28/2018 06:22:07 AM: [ ---------------------------------------------------------------------------------------------------- ]
03/28/2018 06:22:07 AM: [ Make data loaders ]
03/28/2018 06:22:07 AM: [ ---------------------------------------------------------------------------------------------------- ]
03/28/2018 06:22:07 AM: [ CONFIG: {
    "batch_size": 32,
    "checkpoint": true,
    "concat_rnn_layers": true,
    "cuda": false,
    "data_dir": "/apps1/a622107/DrQA/data/datasets",
    "data_workers": 5,
    "dev_file": "/apps1/a622107/DrQA/data/datasets/pam_doc_dev-processed-corenlp.txt",
    "dev_json": "/apps1/a622107/DrQA/data/datasets/SQuAD-v1.1-dev.json",
    "display_iter": 25,
    "doc_layers": 3,
    "dropout_emb": 0.4,
    "dropout_rnn": 0.4,
    "dropout_rnn_output": true,
    "embed_dir": "/apps1/a622107/DrQA/data/embeddings",
    "embedding_dim": 300,
    "embedding_file": "/apps1/a622107/DrQA/data/embeddings/glove.840B.300d.txt",
    "expand_dictionary": false,
    "fix_embeddings": false,
    "gpu": -1,
    "grad_clipping": 10,
    "hidden_size": 128,
    "learning_rate": 0.1,
    "log_file": "/tmp/drqa-models/20180328-689c83e5.txt",
    "max_len": 15,
    "model_dir": "/tmp/drqa-models/",
    "model_file": "/tmp/drqa-models/20180328-689c83e5.mdl",
    "model_name": "20180328-689c83e5",
    "model_type": "rnn",
    "momentum": 0,
    "no_cuda": false,
    "num_epochs": 40,
    "official_eval": true,
    "optimizer": "adamax",
    "parallel": false,
    "pretrained": "/apps1/a622107/DrQA/data/reader/multitask.mdl",
    "question_layers": 3,
    "question_merge": "self_attn",
    "random_seed": 1013,
    "restrict_vocab": true,
    "rnn_padding": false,
    "rnn_type": "lstm",
    "sort_by_len": true,
    "test_batch_size": 128,
    "train_file": "/apps1/a622107/DrQA/data/datasets/pam_doc_train-processed-corenlp.txt",
    "tune_partial": 1000,
    "uncased_doc": false,
    "uncased_question": false,
    "use_in_question": true,
    "use_lemma": true,
    "use_ner": true,
    "use_pos": true,
    "use_qemb": true,
    "use_tf": true,
    "valid_metric": "f1",
    "weight_decay": 0
} ]
03/28/2018 06:22:07 AM: [ ---------------------------------------------------------------------------------------------------- ]
03/28/2018 06:22:07 AM: [ Starting training... ]

Traceback (most recent call last):
  File "/apps1/a622107/DrQA/scripts/reader/train.py", line 546, in <module>
    main(args)
  File "/apps1/a622107/DrQA/scripts/reader/train.py", line 485, in main
    train(args, train_loader, model, stats)
  File "/apps1/a622107/DrQA/scripts/reader/train.py", line 215, in train
    train_loss.update(*model.update(ex))
  File "/apps1/a622107/DrQA/drqa/reader/model.py", line 233, in update
    self.reset_parameters()
  File "/apps1/a622107/DrQA/drqa/reader/model.py", line 251, in reset_parameters
    embedding[offset:] = fixed_embedding
RuntimeError: inconsistent tensor size, expected tensor [213389 x 300] and src [214375 x 300] to have the same number of elements, but got 64016700 and 64312500 elements respectively at /opt/conda/conda-bld/pytorch_1503963423183/work/torch/lib/TH/generic/THTensorCopy.c:86

Please let me know how to solve it.

Thanks,
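
[Editor's note] The RuntimeError at the end of the log is a slice-size mismatch in reset_parameters(): DrQA keeps the top --tune-partial question-word embeddings trainable and restores all other rows from a saved fixed_embedding buffer after each update. The buffer was saved with an offset based on the number of distinct question words actually found, but restored with an offset based on --tune-partial. A minimal sketch reproducing the arithmetic, with sizes taken straight from the traceback (the "+ 2" for reserved special tokens is an assumption about DrQA's word dictionary; on those numbers it implies 14 distinct question words):

import torch

# Shapes taken from the traceback above.
vocab_size, dim = 214391, 300
embedding = torch.zeros(vocab_size, dim)

# tune_embeddings() saved the fixed rows using the number of question
# words actually found (14 here, plus 2 assumed special tokens) ...
fixed_embedding = embedding[14 + 2:].clone()   # 214375 x 300

# ... but reset_parameters() restores them using args.tune_partial, so
# the destination slice is 986 rows shorter than the source and the
# copy below raises the RuntimeError seen in the log.
offset = 1000 + 2
embedding[offset:] = fixed_embedding           # RuntimeError: size mismatch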

ajfisch commented 6 years ago

Sorry for the delay. Indeed there is a bug here -- tune-partial is set to 1000, but there are fewer than 1000 distinct tokens in your questions. I'll fix this.
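
[Editor's note] Until that fix lands, a workaround is to keep --tune-partial at or below the number of distinct question words in the training set. An excerpt-style sketch of the idea, clamping the value before the existing tune_embeddings() call (the surrounding names follow scripts/reader/train.py; this is an illustration, not the committed patch):

# In scripts/reader/train.py, where the top question words are counted:
words = utils.top_question_words(args, train_exs, model.word_dict)
if args.tune_partial > len(words):
    # Clamp so the offset used when saving the fixed embeddings matches
    # the offset reset_parameters() later uses to restore them.
    logger.warning('tune_partial (%d) exceeds distinct question words (%d); clamping',
                   args.tune_partial, len(words))
    args.tune_partial = len(words)
model.tune_embeddings([w[0] for w in words])

Alternatively, rerunning with a smaller value on the command line (e.g. --tune-partial 0, the default, which disables partial tuning entirely) sidesteps the mismatch without touching the code.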