flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

After every batch weights get initialized from beginning? Can't we load the weights of last trained batch? #1899

Closed amanjaiswal777 closed 3 years ago

amanjaiswal777 commented 3 years ago

I have been working on custom NER using Flair. My training dataset is too large to fit into memory at once, so I borrowed the idea of a PyTorch DataLoader: I split the large training file into batches and train on them one after the other. After every batch I find that the weights are initialized from scratch again, i.e. every new batch does not take the previously updated weights into account. The loss calculated after each batch also begins from the initial value. I am attaching a sample output with only 2 epochs per batch.

BATCH_1

    2020-10-07 14:45:02,004 Model training base path: "resources/taggers/example-ner"
    2020-10-07 14:45:02,005 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:02,006 Device: cpu
    2020-10-07 14:45:02,007 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:02,008 Embeddings storage mode: cpu
    2020-10-07 14:45:02,009 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:15,529 epoch 1 - iter 1/1 - loss 2.45444870 - samples/sec: 2.37 - lr: 0.100000
    2020-10-07 14:45:15,530 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:15,531 EPOCH 1 done: loss 2.4544 - lr 0.1000000
    2020-10-07 14:45:15,532 BAD EPOCHS (no improvement): 0
    saving best model
    2020-10-07 14:45:19,793 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:26,165 epoch 2 - iter 1/1 - loss 1.74781251 - samples/sec: 5.02 - lr: 0.100000
    2020-10-07 14:45:26,166 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:26,167 EPOCH 2 done: loss 1.7478 - lr 0.1000000
    2020-10-07 14:45:26,168 BAD EPOCHS (no improvement): 0
    saving best model
    2020-10-07 14:45:34,583 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:34,584 Testing using best model ...
    2020-10-07 14:45:34,585 loading file resources/taggers/example-ner/best-model.pt

BATCH_2

    2020-10-07 14:45:58,747 Model training base path: "resources/taggers/example-ner"
    2020-10-07 14:45:58,749 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:58,749 Device: cpu
    2020-10-07 14:45:58,750 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:45:58,750 Embeddings storage mode: cpu
    2020-10-07 14:45:58,751 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:46:13,919 epoch 1 - iter 1/1 - loss 2.47289896 - samples/sec: 2.11 - lr: 0.100000
    2020-10-07 14:46:13,920 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:46:13,921 EPOCH 1 done: loss 2.4729 - lr 0.1000000
    2020-10-07 14:46:13,924 BAD EPOCHS (no improvement): 0
    saving best model
    2020-10-07 14:46:20,379 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:46:27,231 epoch 2 - iter 1/1 - loss 1.50794005 - samples/sec: 4.67 - lr: 0.100000
    2020-10-07 14:46:27,232 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:46:27,233 EPOCH 2 done: loss 1.5079 - lr 0.1000000
    2020-10-07 14:46:27,234 BAD EPOCHS (no improvement): 0
    saving best model
    2020-10-07 14:46:36,570 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:46:36,571 Testing using best model ...
    2020-10-07 14:46:36,574 loading file resources/taggers/example-ner/best-model.pt

BATCH_3

    2020-10-07 14:47:01,812 Model training base path: "resources/taggers/example-ner"
    2020-10-07 14:47:01,813 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:47:01,814 Device: cpu
    2020-10-07 14:47:01,815 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:47:01,815 Embeddings storage mode: cpu
    2020-10-07 14:47:01,817 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:47:15,023 epoch 1 - iter 1/1 - loss 2.37841558 - samples/sec: 2.42 - lr: 0.100000
    2020-10-07 14:47:15,024 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:47:15,025 EPOCH 1 done: loss 2.3784 - lr 0.1000000
    2020-10-07 14:47:15,026 BAD EPOCHS (no improvement): 0
    saving best model
    2020-10-07 14:47:20,955 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:47:26,828 epoch 2 - iter 1/1 - loss 1.76086211 - samples/sec: 5.45 - lr: 0.100000
    2020-10-07 14:47:26,829 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:47:26,830 EPOCH 2 done: loss 1.7609 - lr 0.1000000
    2020-10-07 14:47:26,831 BAD EPOCHS (no improvement): 0
    saving best model

BATCH_4

    2020-10-07 14:47:59,783 Model training base path: "resources/taggers/example-ner"
    2020-10-07 14:47:59,784 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:47:59,785 Device: cpu
    2020-10-07 14:47:59,786 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:47:59,787 Embeddings storage mode: cpu
    2020-10-07 14:47:59,788 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:48:13,740 epoch 1 - iter 1/1 - loss 2.50737619 - samples/sec: 2.29 - lr: 0.100000
    2020-10-07 14:48:13,743 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:48:13,744 EPOCH 1 done: loss 2.5074 - lr 0.1000000
    2020-10-07 14:48:13,745 BAD EPOCHS (no improvement): 0
    saving best model
    2020-10-07 14:48:19,646 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:48:26,001 epoch 2 - iter 1/1 - loss 1.54825926 - samples/sec: 5.05 - lr: 0.100000
    2020-10-07 14:48:26,002 ----------------------------------------------------------------------------------------------------
    2020-10-07 14:48:26,002 EPOCH 2 done: loss 1.5483 - lr 0.1000000
    2020-10-07 14:48:26,003 BAD EPOCHS (no improvement): 0
    saving best model

and so on for the remaining batches.

Is there any way to load the last trained weights with every new batch? @alanakbik

alanakbik commented 3 years ago

Interesting, it looks like you are starting a fresh training run every time. Could you post the code to reproduce this?

amanjaiswal777 commented 3 years ago

Here is the training function code:

    from typing import List

    from flair.data import Corpus
    from flair.datasets import ColumnCorpus
    from flair.embeddings import (ELMoEmbeddings, StackedEmbeddings,
                                  TokenEmbeddings, WordEmbeddings)
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer


    def training_flair(total_batches):

        for batch in range(total_batches):

            # every new batch is just a fresh set of data files: train.txt, test.txt, val.txt;
            # modify_data_folder takes the batch and creates new train.txt, test.txt and
            # val.txt files in the data folder
            modify_data_folder(batch)

            # location of the data folder
            data_folder = 'data/'

            # define columns
            columns = {0: 'text', 1: 'ner'}

            # initialize the corpus
            corpus: Corpus = ColumnCorpus(data_folder, columns,
                                          train_file='train.txt',
                                          test_file='test.txt',
                                          dev_file='val.txt')

            # tag to predict
            tag_type = 'ner'

            # make tag dictionary from the corpus
            tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

            embedding_types: List[TokenEmbeddings] = [
                WordEmbeddings('glove'),
                ELMoEmbeddings('medium'),
            ]
            embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

            tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                                    embeddings=embeddings,
                                                    tag_dictionary=tag_dictionary,
                                                    tag_type=tag_type,
                                                    use_crf=False)

            trainer: ModelTrainer = ModelTrainer(tagger, corpus)
            trainer.train('resources/taggers/example-ner',
                          learning_rate=0.1,
                          train_with_dev=True,
                          anneal_with_restarts=True,
                          shuffle=True,
                          num_workers=5,
                          mini_batch_size=32,
                          max_epochs=2,
                          checkpoint=True)
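
One way to keep the weights across batches (a sketch, not something taken from this thread; it assumes the modify_data_folder helper and file layout above, and that the tag set of the first batch covers all later batches) is to build the embeddings and the tagger once, outside the loop, and only rebuild the corpus and trainer per batch. Because the same SequenceTagger object is passed to every new ModelTrainer, each trainer.train call continues from the weights left by the previous batch:

    def training_flair_incremental(total_batches):

        columns = {0: 'text', 1: 'ner'}
        tag_type = 'ner'

        # derive the tag dictionary from the first batch
        # (assumption: its tag set is representative of all batches)
        modify_data_folder(0)
        first_corpus: Corpus = ColumnCorpus('data/', columns,
                                            train_file='train.txt',
                                            test_file='test.txt',
                                            dev_file='val.txt')
        tag_dictionary = first_corpus.make_tag_dictionary(tag_type=tag_type)

        embeddings = StackedEmbeddings(embeddings=[WordEmbeddings('glove'),
                                                   ELMoEmbeddings('medium')])

        # create the tagger ONCE; its weights persist across loop iterations
        tagger = SequenceTagger(hidden_size=256,
                                embeddings=embeddings,
                                tag_dictionary=tag_dictionary,
                                tag_type=tag_type,
                                use_crf=False)

        for batch in range(total_batches):
            # swap in the data files for this batch
            modify_data_folder(batch)
            corpus: Corpus = ColumnCorpus('data/', columns,
                                          train_file='train.txt',
                                          test_file='test.txt',
                                          dev_file='val.txt')

            # a new trainer per batch, but the SAME tagger object, so training
            # continues from the weights updated on the previous batch
            trainer = ModelTrainer(tagger, corpus)
            trainer.train('resources/taggers/example-ner',
                          learning_rate=0.1,
                          train_with_dev=True,
                          mini_batch_size=32,
                          max_epochs=2)

Note that only the model parameters carry over here; the optimizer and learning-rate schedule are still re-created for every batch.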

Yes, I totally agree with you @alanakbik that every time I am starting a fresh training run. The line starting with trainer: ModelTrainer calls the ModelTrainer class from the Flair trainer file (https://github.com/flairNLP/flair/blob/master/flair/trainers/trainer.py). I have checked all the parameters that can be passed to the train function of the ModelTrainer class, but couldn't find one that allows resuming from the previously trained weights for every new batch. The parameters, as per the Flair GitHub (https://github.com/flairNLP/flair/blob/master/flair/trainers/trainer.py), are listed below:

    def train(
        self,
        base_path: Union[Path, str],
        learning_rate: float = 0.1,
        mini_batch_size: int = 32,
        mini_batch_chunk_size: int = None,
        max_epochs: int = 100,
        scheduler=AnnealOnPlateau,
        cycle_momentum: bool = False,
        anneal_factor: float = 0.5,
        patience: int = 3,
        initial_extra_patience=0,
        min_learning_rate: float = 0.0001,
        train_with_dev: bool = False,
        monitor_train: bool = False,
        monitor_test: bool = False,
        embeddings_storage_mode: str = "cpu",
        checkpoint: bool = False,
        save_final_model: bool = True,
        anneal_with_restarts: bool = False,
        anneal_with_prestarts: bool = False,
        batch_growth_annealing: bool = False,
        shuffle: bool = True,
        param_selection_mode: bool = False,
        write_weights: bool = False,
        num_workers: int = 6,
        sampler=None,
        use_amp: bool = False,
        amp_opt_level: str = "O1",
        eval_on_train_fraction=0.0,
        eval_on_train_shuffle=False,
        **kwargs,
    ) -> dict:
        """
        Trains any class that implements the flair.nn.Model interface.
        :param base_path: Main path to which all output during training is logged and models are saved
        :param learning_rate: Initial learning rate (or max, if scheduler is OneCycleLR)
        :param mini_batch_size: Size of mini-batches during training
        :param mini_batch_chunk_size: If mini-batches are larger than this number, they get broken down into chunks of this size for processing purposes
        :param max_epochs: Maximum number of epochs to train. Terminates training if this number is surpassed.
        :param scheduler: The learning rate scheduler to use
        :param cycle_momentum: If scheduler is OneCycleLR, whether the scheduler should cycle also the momentum
        :param anneal_factor: The factor by which the learning rate is annealed
        :param patience: Patience is the number of epochs with no improvement the Trainer waits until annealing the learning rate
        :param min_learning_rate: If the learning rate falls below this threshold, training terminates
        :param train_with_dev: If True, training is performed using both train+dev data
        :param monitor_train: If True, training data is evaluated at end of each epoch
        :param monitor_test: If True, test data is evaluated at end of each epoch
        :param embeddings_storage_mode: One of 'none' (all embeddings are deleted and freshly recomputed), 'cpu' (embeddings are stored on CPU) or 'gpu' (embeddings are stored on GPU)
        :param checkpoint: If True, a full checkpoint is saved at end of each epoch
        :param save_final_model: If True, final model is saved
        :param anneal_with_restarts: If True, the last best model is restored when annealing the learning rate
        :param shuffle: If True, data is shuffled during training
        :param param_selection_mode: If True, testing is performed against dev data. Use this mode when doing parameter selection.
        :param num_workers: Number of workers in your data loader.
        :param sampler: You can pass a data sampler here for special sampling of data.
        :param eval_on_train_fraction: the fraction of train data to do the evaluation on, if 0. the evaluation is not performed on fraction of training data, if 'dev' the size is determined from dev set size
        :param eval_on_train_shuffle: if True the train data fraction is determined on the start of training and kept fixed during training, otherwise it's sampled at beginning of each epoch
        :param kwargs: Other arguments for the Optimizer
        :return:
        """

Moreover, the trained weights are saved as final-model.pt, which internally contains a pickle file (final-model.pt/archive/data.pkl). This file is never loaded again by the program, because the only thing passed to trainer.train is the base path, as shown in the code above. Is there any way to load this previously saved file, train it further on a new batch, overwrite it, and repeat the same process for all batches? @alanakbik And the second question: would it be possible to add a parameter to the train function of the ModelTrainer class that loads the last trained weights and continues training on the new batch data? @alanakbik
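
On the loading question: nothing in the train() signature above reloads a previous model for you, but SequenceTagger.load can read the final-model.pt (or best-model.pt) written by an earlier run, and the loaded model can be handed to a fresh ModelTrainer. A minimal sketch of that idea for the body of the batch loop (assuming the corpus, embeddings, tag_dictionary and paths from the code above; this is not a built-in resume feature):

    import os

    model_path = 'resources/taggers/example-ner/final-model.pt'

    # continue from the weights saved by the previous batch if they exist,
    # otherwise start from a freshly initialized tagger
    if os.path.exists(model_path):
        tagger = SequenceTagger.load(model_path)
    else:
        tagger = SequenceTagger(hidden_size=256,
                                embeddings=embeddings,
                                tag_dictionary=tag_dictionary,
                                tag_type=tag_type,
                                use_crf=False)

    trainer = ModelTrainer(tagger, corpus)

    # training overwrites final-model.pt, so the next batch picks it up again
    trainer.train('resources/taggers/example-ner',
                  learning_rate=0.1,
                  train_with_dev=True,
                  mini_batch_size=32,
                  max_epochs=2)

This only restores the model weights; the optimizer and scheduler state from the previous run are not restored (that is what the checkpoint=True option and the checkpoint.pt file are for).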

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.