allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Cannot train or tune any model #3228

Closed lyriccoder closed 5 years ago

lyriccoder commented 5 years ago

I cannot train or tune a single model via Python code. I have tried different configs and different pretrained models.

To Reproduce

  1. Load fine-grained-ner-model-elmo-2018.12.21.tar.gz via Python code
  2. Choose the config which is located in the archive
  3. Run the train() function
  4. The following error appears:
[lyriccoder@xdata5 project_location]$ python TempResearchTraint.py
exists path /home/lyriccoder/project_location/eng.train/eng.train.txt True?
14041it [00:01, 8113.32it/s]
exists path /home/lyriccoder/project_location/eng.train/eng.train.txt True?
3250it [00:00, 6342.00it/s]
exists path /home/lyriccoder/project_location/eng.train/eng.train.txt True?
3453it [00:00, 6370.38it/s]
Traceback (most recent call last):
  File "/home/lyriccoder/anaconda3/lib/python3.6/site-packages/allennlp/common/params.py", line 258, in pop
    value = self.params.pop(key)
KeyError: 'optimizer'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "TempResearchTraint.py", line 382, in <module>
    validation_iterator=None)
  File "/home/lyriccoder/anaconda3/lib/python3.6/site-packages/allennlp/training/trainer.py", line 688, in from_params
    optimizer = Optimizer.from_params(parameters, params.pop("optimizer"))
  File "/home/lyriccoder/anaconda3/lib/python3.6/site-packages/allennlp/common/params.py", line 260, in pop
    raise ConfigurationError("key \"{}\" is required at location \"{}\"".format(key, self.history))
allennlp.common.checks.ConfigurationError: 'key "optimizer" is required at location ""'

Expected behavior: Training should begin.

System:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Stepping: 4
CPU MHz: 1207.273
BogoMIPS: 5205.48
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31

I tried to use both configs:

I also tried a different model, model-2018.12.18.tar.gz, and saw the same error with it. In addition, I trained a model manually with the command-line interface; that worked successfully. But when I then tried to tune it or train it again, I got the above-mentioned error.

Am I doing something wrong?

Here is the code I used:

from allennlp.modules.seq2seq_encoders import PytorchSeq2SeqWrapper
from allennlp.predictors.predictor import Predictor
import torch
from typing import *
import itertools
import numpy as np
from allennlp.common import Params

import os
import logging
from allennlp.modules.token_embedders import Embedding
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.training.trainer import Trainer
from allennlp.modules.seq2vec_encoders import PytorchSeq2VecWrapper
from allennlp.data.token_indexers.elmo_indexer import ELMoTokenCharactersIndexer
from allennlp.nn import util as nn_util
from allennlp.data.iterators import DataIterator
from tqdm import tqdm
from scipy.special import expit  # the sigmoid function
from overrides import overrides
from allennlp.data.iterators import BasicIterator
from allennlp.common.checks import ConfigurationError
from allennlp.common.file_utils import cached_path
from allennlp.data.dataset_readers.dataset_reader import DatasetReader
from allennlp.data.dataset_readers.dataset_utils import to_bioul
from allennlp.data.fields import TextField, SequenceLabelField, Field, MetadataField, LabelField
from allennlp.data.instance import Instance
from allennlp.data.token_indexers import TokenIndexer, SingleIdTokenIndexer
from allennlp.data.tokenizers import Token
from allennlp.data.fields import TextField, MetadataField, ArrayField
from allennlp.data.vocabulary import Vocabulary
from allennlp.data.iterators import BucketIterator
from allennlp.common.checks import check_for_gpu
import torch
import torch.nn as nn
import torch.optim as optim
from allennlp.modules.seq2vec_encoders import Seq2VecEncoder, PytorchSeq2VecWrapper
from allennlp.nn.util import get_text_field_mask
from allennlp.models import Model, CrfTagger
from allennlp.modules.text_field_embedders import TextFieldEmbedder, BasicTextFieldEmbedder

logger = logging.getLogger(__name__)  # pylint: disable=invalid-name

if __name__ == "__main__":
    config_file = os.path.join(os.getcwd(), 'se/config.json')
    params = Params.from_file(config_file)
    r = Predictor.from_path("se/model.tar.gz")
    model = r._model
    extend_vocab = None

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Additional Info when using cuda
    if device.type == 'cuda':
        print(torch.cuda.get_device_name(0))
        print('Memory Usage:')
        print('Allocated:', round(torch.cuda.memory_allocated(0) / 1024 ** 3, 1), 'GB')
        print('Cached:   ', round(torch.cuda.memory_cached(0) / 1024 ** 3, 1), 'GB')
        USE_GPU = True
    else:
        USE_GPU = False

    ser_dir = os.path.join(os.getcwd(), 'new_ser')

    #train_dir = os.path.join(os.getcwd(), 'eng.train/eng.train.txt')
    train_dir = '/home/lyriccoder/NER_project/eng.train/eng.train.txt'
    print("exists path {path} {result}?".format(path=train_dir, result=os.path.exists(train_dir)))
    train_data = r._dataset_reader.read(train_dir)

    all_datasets: Dict[str, Iterable[Instance]] = {"train": train_data}
    validation_and_test_dataset_reader: DatasetReader = r._dataset_reader
    val_dir = '/home/lyriccoder/NER_project/eng.testa/eng.testa.txt'
    print("exists path {path} {result}?".format(path=train_dir, result=os.path.exists(val_dir)))

    validation_data = validation_and_test_dataset_reader.read(val_dir)
    all_datasets["validation"] = validation_data

    test_dir = '/home/lyriccoder/NER_project/eng.testb/eng.testb.txt'
    print("exists path {path} {result}?".format(path=train_dir, result=os.path.exists(test_dir)))

    test_data = validation_and_test_dataset_reader.read(test_dir)
    all_datasets["test"] = test_data
    vocab = model.vocab
    if extend_vocab:
        print("Cannot extend vocabulary")

    iterator = BucketIterator(
        batch_size=64,
        biggest_batch_first=True,
        sorting_keys=[("tokens", "num_tokens")],
    )
    iterator.index_with(vocab)
    train_data = all_datasets['train']
    validation_data = all_datasets.get('validation')
    test_data = all_datasets.get('test')

    trainer = Trainer.from_params(
        model=model,
        serialization_dir=ser_dir,
        iterator=iterator,
        train_data=train_data,
        validation_data=validation_data,
        params=params,
        validation_iterator=None)

@schmmd Could you please help? Seems you had the same issue.

mahnerak commented 5 years ago

@lyriccoder It looks like you are missing an optimizer near trainer = Trainer.from_params(.... Define an optimizer and pass it to Trainer.from_params as demonstrated in the tutorial. (EDIT: consider only the next comments; I thought you were calling __init__ directly.)

lyriccoder commented 5 years ago

But the config describes the optimizer, doesn't it? Otherwise why do we need the config in this case? The config is passed here:

params = Params.from_file(config_file)
trainer = Trainer.from_params(...)

Moreover, if I have a pretrained model, how can I get its optimizer? I see only the _model object in the returned object; I cannot see an optimizer. How can I get it from the pretrained model? Could you please tell me?

mahnerak commented 5 years ago

@lyriccoder Okay, so the way you load it, params = Params.from_file(config_file), loads the configuration of the whole run; however, Trainer.from_params expects only the config of the Trainer module. Consider passing it like Trainer.from_params(..., params=params.get('trainer'), ...).
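A minimal sketch of what that could look like with your variables, assuming the archive's config.json has the standard top-level "trainer" section (both .get('trainer') and .pop('trainer') return that section as a Params object):

trainer_params = params.pop('trainer')  # just the trainer section, which contains the "optimizer" key
trainer = Trainer.from_params(
    model=model,
    serialization_dir=ser_dir,
    iterator=iterator,
    train_data=train_data,
    validation_data=validation_data,
    params=trainer_params,
    validation_iterator=None)
trainer.train()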

lyriccoder commented 5 years ago

The problem is that there is no such parameter (optimizer or trainer) for the from_params method: https://github.com/allenai/allennlp/blob/master/allennlp/training/trainer.py (line 659)

mahnerak commented 5 years ago

@lyriccoder You don't need an optimizer or trainer parameter for Trainer.from_params(). You need to pass params.get('trainer') instead of the plain params to Trainer.from_params().
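As for the earlier question about getting the optimizer from the pretrained model: the archive does not store one. Trainer.from_params builds it from the "optimizer" entry inside that trainer section (the params.pop("optimizer") call visible in your traceback). If you ever need to construct it yourself, a rough sketch, assuming the same config layout and the allennlp.training.optimizers import path:

from allennlp.training.optimizers import Optimizer

trainer_params = Params.from_file(config_file).pop('trainer')
# mirror what Trainer.from_params does internally: only parameters that require gradients
parameters = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
optimizer = Optimizer.from_params(parameters, trainer_params.pop('optimizer'))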

brendan-ai2 commented 5 years ago

@lyriccoder, is there a reason you need to define a custom script for training? We normally just use allennlp train. See https://github.com/allenai/allennlp/blob/master/tutorials/getting_started/walk_through_allennlp/training_and_evaluating.md.

lyriccoder commented 5 years ago

Yes, @brendan-ai2, there is a serious reason. I am writing an application which will allow training different models in different languages. There will be a server (Tornado or Flask). That's why I need Python code only. Thanks, @mahnerak, I will try it. Also, I need Python code for fine-tuning, since users will "fine-tune" their models.

brendan-ai2 commented 5 years ago

Thanks for the help @mahnerak! @lyriccoder, I think you have the info you need, so I'm closing, but feel free to comment again if you run into trouble.

Also, feel free to copy code from our commands if that's helpful, e.g. for fine tuning. See https://github.com/allenai/allennlp/tree/master/allennlp/commands.
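If you do copy from the commands, here is a rough sketch of driving a full run from Python through the same entry point that allennlp train uses (the allennlp.commands.train.train_model import path and signature are taken from the 0.8.x code; double-check them against your installed version):

from allennlp.common import Params
from allennlp.commands.train import train_model

# full run config (dataset_reader, model, iterator, trainer sections), as consumed by `allennlp train`
params = Params.from_file('se/config.json')
model = train_model(params, serialization_dir='new_ser')

For fine-tuning, the same directory has a fine_tune_model helper that takes an already-loaded model plus a config.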