allenai / sequential_sentence_classification

https://arxiv.org/pdf/1909.04054
Apache License 2.0

Allennlp2 #13

Open UrszulaCzerwinska opened 3 years ago

UrszulaCzerwinska commented 3 years ago

What?

- Updated the code to allennlp 2.0.
- Changed the hard-coded "[SEP]" mention to a param (#8) and updated the dataset reader to deal with special tokens: solves a TODO in your code.
- Changed the way overly long sentence sequences are handled: before, the last sentence(s) was (were) cut out, so the number of labels and the number of sentences no longer matched; in this version, all "sub-sentences" of the sequence are shortened until the total is < 512 tokens (see the sketch below): solves a TODO in your code.
- Modified the code to deal with sequences containing only one sentence (a case found in my data).
- Added handling of RoBERTa special tokens (selecting the first one); as a side effect, the batch size must be set to 1.
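Roughly, the shortening works like this (a minimal sketch of the idea, not the exact implementation; the function name and the "trim the longest sentence first" policy are my assumptions):

```python
# Hypothetical sketch of the "shorten all sub-sentences" strategy described
# above; names and the trimming policy are assumptions, not the PR's code.
from typing import List

MAX_TOTAL_TOKENS = 512  # transformer input length limit


def shorten_sentences(sentences: List[List[str]],
                      max_total: int = MAX_TOTAL_TOKENS) -> List[List[str]]:
    """Trim tokens from the longest sentences until the total fits,
    instead of dropping trailing sentences (which desynchronized
    the number of labels and the number of sentences)."""
    sentences = [list(s) for s in sentences]
    while sum(len(s) for s in sentences) > max_total:
        longest = max(sentences, key=len)
        longest.pop()  # drop one token from the end of the longest sentence
    return sentences
```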

Why?

To be able to run your code with a RoBERTa model or other models from Hugging Face. To apply models with different special tokens, some slight adaptation may be necessary. The code was tested with roberta-base and camembert-base, and for BERT with scibert-uncased and bert-uncased.
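To see which special-token conventions a given Hugging Face model uses (and hence what adaptation it needs), the tokenizer can be inspected directly; a quick check, using the model names mentioned above:

```python
# Print the separator/CLS tokens of the models tested in this PR, to see
# which conventions each one expects.
from transformers import AutoTokenizer

for name in ["roberta-base", "camembert-base", "allenai/scibert_scivocab_uncased"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, "sep:", tok.sep_token, "cls:", tok.cls_token)

# roberta-base / camembert-base use </s> and <s>; SciBERT uses [SEP] and
# [CLS], which is why the configs below set `intersentence_token` differently.
```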

How?

Modified the config, dataset reader, and model files to work with allennlp 2.0 and the RoBERTa model.

Anything Else?

I didn't manage to reproduce your results with this version (using the same model). In general, the model seems to need longer training (20 epochs), and the results are still not as good as with the older version. It is quite difficult to pinpoint why, since many elements changed: the attention layer, the tokenization, the optimizer. Maybe you can have a look at the params and see where the problems come from. In the last commit the params are set for the best model I managed to obtain.

I will test it on my data soon, I can share the performances with you.

armancohan commented 3 years ago

Thanks for the PR!

You mentioned:

> Still the results are not as good as with the older version

I wonder how much worse the new results are? I would be happy to merge this if the results are close.

UrszulaCzerwinska commented 3 years ago

These are the best results I got with RoBERTa (SEP variant):

I will report SciBERT results if they are of interest.

Results
{
  "best_epoch": 7,
  "peak_worker_0_memory_MB": 4371.4375,
  "peak_gpu_0_memory_MB": 2771.052734375,
  "training_duration": "0:20:09.454623",
  "training_start_epoch": 0,
  "training_epochs": 11,
  "epoch": 11,
  "training_acc": 0.7737580517074032,
  "training_background_labelF": 0.8525213599205017,
  "training_method_labelF": 0.7604814767837524,
  "training_result_labelF": 0.7757422924041748,
  "training_objective_labelF": 0.5412684082984924,
  "training_other_labelF": 0.8407460451126099,
  "training_avgF": 0.7541519165039062,
  "training_loss": 0.3214186680863418,
  "training_worker_0_memory_MB": 4371.4375,
  "training_gpu_0_memory_MB": 2771.052734375,
  "validation_acc": 0.6885488647581441,
  "validation_background_labelF": 0.7977288961410522,
  "validation_method_labelF": 0.6619718670845032,
  "validation_result_labelF": 0.6815365552902222,
  "validation_objective_labelF": 0.41647595167160034,
  "validation_other_labelF": 0.7272727489471436,
  "validation_avgF": 0.6569972038269043,
  "validation_loss": 0.46063055468060204,
  "best_validation_acc": 0.6920039486673247,
  "best_validation_background_labelF": 0.789513111114502,
  "best_validation_method_labelF": 0.6646884083747864,
  "best_validation_result_labelF": 0.6976743936538696,
  "best_validation_objective_labelF": 0.41253262758255005,
  "best_validation_other_labelF": 0.7619047164916992,
  "best_validation_avgF": 0.6652626514434814,
  "best_validation_loss": 0.4458129028395071,
  "test_acc": 0.815418828762046,
  "test_background_labelF": 0.885624349117279,
  "test_method_labelF": 0.7943925261497498,
  "test_result_labelF": 0.7943549156188965,
  "test_objective_labelF": 0.64000004529953,
  "test_other_labelF": 0.8983051180839539,
  "test_avgF": 0.8025353908538818,
  "test_loss": 0.4203042404281091
}

Params

{
    "dataset_reader": {
        "type": "SeqClassificationReader",
        "max_sent_per_example": 10,
        "sci_sum": false,
        "sci_sum_fake_scores": false,
        "sent_max_len": 80,
        "token_indexers": {
            "bert": {
                "type": "pretrained_transformer",
                "model_name": "roberta-base",
                "tokenizer_kwargs": {
                    "truncation_strategy": "do_not_truncate"
                }
            }
        },
        "tokenizer": {
            "type": "pretrained_transformer",
            "model_name": "roberta-base",
            "tokenizer_kwargs": {
                "truncation_strategy": "do_not_truncate"
            }
        },
        "use_abstract_scores": false,
        "use_sep": true
    },
    "model": {
        "type": "SeqClassificationModel",
        "additional_feature_size": 0,
        "bert_dropout": 0.1,
        "intersentence_token": "</s>",
        "model_type": "roberta",
        "sci_sum": false,
        "self_attn": {
            "type": "pytorch_transformer",
            "feedforward_hidden_dim": 100,
            "input_dim": 768,
            "num_attention_heads": 2,
            "num_layers": 3
        },
        "text_field_embedder": {
            "token_embedders": {
                "bert": {
                    "type": "pretrained_transformer",
                    "last_layer_only": false,
                    "model_name": "roberta-base",
                    "train_parameters": true
                }
            }
        },
        "use_sep": true,
        "with_crf": false
    },
    "train_data_path": "data/CSAbstruct/train.jsonl",
    "validation_data_path": "data/CSAbstruct/dev.jsonl",
    "test_data_path": "data/CSAbstruct/test.jsonl",
    "trainer": {
        "cuda_device": 0,
        "grad_clipping": 1,
        "learning_rate_scheduler": {
            "type": "slanted_triangular",
            "cut_frac": 0.1,
            "num_epochs": 30,
            "num_steps_per_epoch": 56
        },
        "num_epochs": 30,
        "num_gradient_accumulation_steps": 32,
        "optimizer": {
            "type": "huggingface_adamw",
            "lr": 1e-05,
            "weight_decay": 0.01
        },
        "patience": 5,
        "validation_metric": "+acc"
    },
    "data_loader": {
        "batch_size": 1,
        "shuffle": true
    },
    "evaluate_on_test": true,
    "numpy_seed": 1527,
    "pytorch_seed": 1527,
    "random_seed": 15270
}
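Note the `intersentence_token` of `</s>` here versus `[SEP]` in the SciBERT config below. As a rough illustration of the "selecting the first one" handling for RoBERTa, assuming it refers to keeping the first token of each adjacent `</s></s>` pair (a sketch under that assumption, not the PR's actual code):

```python
# Illustrative only: gather one embedding per sentence at the positions of
# the configured intersentence token. For RoBERTa, adjacent </s></s> pairs
# are collapsed by keeping only the first token of each pair.
import torch


def sentence_embeddings(hidden: torch.Tensor, token_ids: torch.Tensor,
                        sep_id: int) -> torch.Tensor:
    """hidden: (seq_len, dim) encoder output; token_ids: (seq_len,) input ids."""
    sep_positions = (token_ids == sep_id).nonzero(as_tuple=True)[0]
    # keep the first of any run of adjacent separators (RoBERTa's </s></s>)
    keep = [p for i, p in enumerate(sep_positions)
            if i == 0 or p != sep_positions[i - 1] + 1]
    return hidden[torch.stack(keep)]
```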
UrszulaCzerwinska commented 3 years ago

@armancohan For scibert-uncased SEP:

RESULTS

{
  "best_epoch": 15,
  "peak_worker_0_memory_MB": 4070.04296875,
  "peak_gpu_0_memory_MB": 2549.546875,
  "training_duration": "0:37:45.390380",
  "training_start_epoch": 0,
  "training_epochs": 24,
  "epoch": 24,
  "training_acc": 0.9114973969822642,
  "training_background_labelF": 0.9437795281410217,
  "training_method_labelF": 0.91558438539505,
  "training_result_labelF": 0.908378005027771,
  "training_objective_labelF": 0.8221437335014343,
  "training_other_labelF": 0.8834532499313354,
  "training_avgF": 0.8946677803993225,
  "training_loss": 0.14108116390579037,
  "training_worker_0_memory_MB": 4070.04296875,
  "training_gpu_0_memory_MB": 2549.546875,
  "validation_acc": 0.6599210266535045,
  "validation_background_labelF": 0.7566338181495667,
  "validation_method_labelF": 0.6464647054672241,
  "validation_result_labelF": 0.6568986177444458,
  "validation_objective_labelF": 0.43478262424468994,
  "validation_other_labelF": 0.71074378490448,
  "validation_avgF": 0.6411047101020813,
  "validation_loss": 0.6300637987994794,
  "best_validation_acc": 0.6796643632773939,
  "best_validation_background_labelF": 0.7804175615310669,
  "best_validation_method_labelF": 0.6477271914482117,
  "best_validation_result_labelF": 0.6746410727500916,
  "best_validation_objective_labelF": 0.4788135588169098,
  "best_validation_other_labelF": 0.6666666865348816,
  "best_validation_avgF": 0.6496532142162323,
  "best_validation_loss": 0.5098793767645198,
  "test_acc": 0.8265381764269829,
  "test_background_labelF": 0.9030928015708923,
  "test_method_labelF": 0.7946860194206238,
  "test_result_labelF": 0.8207343220710754,
  "test_objective_labelF": 0.6518988013267517,
  "test_other_labelF": 0.9090909361839294,
  "test_avgF": 0.8159005761146545,
  "test_loss": 0.41943900255008343
}

PARAMS

{
    "dataset_reader": {
        "type": "SeqClassificationReader",
        "max_sent_per_example": 10,
        "sci_sum": false,
        "sci_sum_fake_scores": false,
        "sent_max_len": 100,
        "token_indexers": {
            "bert": {
                "type": "pretrained_transformer",
                "model_name": "allenai/scibert_scivocab_uncased",
                "tokenizer_kwargs": {
                    "truncation_strategy": "do_not_truncate"
                }
            }
        },
        "tokenizer": {
            "type": "pretrained_transformer",
            "model_name": "allenai/scibert_scivocab_uncased",
            "tokenizer_kwargs": {
                "truncation_strategy": "do_not_truncate"
            }
        },
        "use_abstract_scores": false,
        "use_sep": true
    },
    "model": {
        "type": "SeqClassificationModel",
        "additional_feature_size": 0,
        "bert_dropout": 0.1,
        "intersentence_token": "[SEP]",
        "model_type": "bert",
        "sci_sum": false,
        "self_attn": {
            "type": "pytorch_transformer",
            "feedforward_hidden_dim": 100,
            "input_dim": 630,
            "num_attention_heads": 3,
            "num_layers": 3
        },
        "text_field_embedder": {
            "token_embedders": {
                "bert": {
                    "type": "pretrained_transformer",
                    "last_layer_only": false,
                    "model_name": "allenai/scibert_scivocab_uncased",
                    "train_parameters": true
                }
            }
        },
        "use_sep": true,
        "with_crf": false
    },
    "train_data_path": "data/CSAbstruct/train.jsonl",
    "validation_data_path": "data/CSAbstruct/dev.jsonl",
    "test_data_path": "data/CSAbstruct/test.jsonl",
    "trainer": {
        "cuda_device": 0,
        "grad_clipping": 1,
        "learning_rate_scheduler": {
            "type": "slanted_triangular",
            "cut_frac": 0.1,
            "num_epochs": 100,
            "num_steps_per_epoch": 36
        },
        "num_epochs": 100,
        "num_gradient_accumulation_steps": 50,
        "optimizer": {
            "type": "huggingface_adamw",
            "lr": 1e-05,
            "weight_decay": 0.001
        },
        "patience": 10,
        "validation_metric": "+avgF"
    },
    "data_loader": {
        "batch_size": 1,
        "shuffle": true
    },
    "evaluate_on_test": true,
    "numpy_seed": 1530,
    "pytorch_seed": 1530,
    "random_seed": 15300
}

I guess with more fine-tuning these results could improve slightly.