[Problem/ Squad V2] The result is far lower than the F1 score reported in the paper. Is something wrong?

Gs-Zhang commented 3 years ago

I0922 11:46:38.663871 140308634334976 run_squad_v2.py:505] Final Eval results
INFO:tensorflow: exact = 50.09685841825992
I0922 11:46:38.663987 140308634334976 run_squad_v2.py:507] exact = 50.09685841825992
INFO:tensorflow: f1 = 50.11359538016659
I0922 11:46:38.664040 140308634334976 run_squad_v2.py:507] f1 = 50.11359538016659
INFO:tensorflow: null_score_diff_threshold = -1.230899453163147
I0922 11:46:38.664077 140308634334976 run_squad_v2.py:507] null_score_diff_threshold = -1.230899453163147
INFO:tensorflow: total = 11873
I0922 11:46:38.664113 140308634334976 run_squad_v2.py:507] total = 11873
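One way to read these numbers: the SQuAD 2.0 dev set has 11,873 questions, of which 5,945 are unanswerable and 5,928 are answerable (the NoAns_total and HasAns_total counts printed by the official evaluate-v2.0.py script). A model that predicts "no answer" for every question therefore scores about 50.07 on both EM and F1, and an EM of 50.097 corresponds to 5,948 exact matches, only 3 more than the number of unanswerable questions. Together with exact and f1 being almost equal, this is consistent with the model abstaining on nearly every question, although the arithmetic alone does not prove it. The short check below reproduces the calculation; the function names are illustrative only.

```python
# SQuAD 2.0 dev-set composition, as reported by the official
# evaluate-v2.0.py script (HasAns_total / NoAns_total).
HAS_ANS_TOTAL = 5928
NO_ANS_TOTAL = 5945
TOTAL = HAS_ANS_TOTAL + NO_ANS_TOTAL  # 11873, matches "total" in the log


def all_no_answer_baseline(no_ans_total: int, total: int) -> float:
    """EM (and F1) in percent for a model that always predicts 'no answer'.

    Each unanswerable question scores 1 and each answerable one scores 0,
    so the score equals the fraction of unanswerable questions.
    """
    return 100.0 * no_ans_total / total


def implied_exact_matches(exact_pct: float, total: int) -> int:
    """Number of exactly matched questions implied by an EM percentage."""
    return round(exact_pct * total / 100.0)


baseline = all_no_answer_baseline(NO_ANS_TOTAL, TOTAL)
matches = implied_exact_matches(50.09685841825992, TOTAL)
print(f"all-no-answer baseline EM/F1: {baseline:.2f}")  # 50.07
print(f"exact matches in the log: {matches} of {TOTAL}")  # 5948 of 11873
```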

Gs-Zhang commented 3 years ago

flags.DEFINE_string(
    "albert_config_file", 'albert_base/albert_config.json',
    "The config json file corresponding to the pre-trained ALBERT model. "
    "This specifies the model architecture.")

flags.DEFINE_string("vocab_file", 'albert_xlarge/30k-clean.vocab',
                    "The vocabulary file that the ALBERT model was trained on.")

flags.DEFINE_string("spm_model_file", 'albert_xlarge/30k-clean.model',
                    "The model file for sentence piece tokenization.")

flags.DEFINE_string(
    "output_dir", 'result',
    "The output directory where the model checkpoints will be written.")

## Other parameters

flags.DEFINE_string("train_file", '/home/gszhang/code/NLP/albert/train-v2.0.json',
                    "SQuAD json for training. E.g., train-v1.1.json")

flags.DEFINE_string(
    "predict_file", 'dev-v2.0.json',
    "SQuAD json for predictions. E.g., dev-v1.1.json or test-v1.1.json")

flags.DEFINE_string("train_feature_file",
                    '/home/gszhang/code/NLP/albert/result/train_feature',
                    "training feature file.")

flags.DEFINE_string(
    "predict_feature_file",
    '/home/gszhang/code/NLP/albert/result/predict_feature',
    "Location of predict features. If it doesn't exist, it will be written. "
    "If it does exist, it will be read.")

flags.DEFINE_string(
    "predict_feature_left_file",
    '/home/gszhang/code/NLP/albert/result/predict_left_feature',
    "Location of predict features not passed to TPU. If it doesn't exist, it "
    "will be written. If it does exist, it will be read.")

flags.DEFINE_string(
    "init_checkpoint", 'albert_xlarge/model.ckpt-best.index',
    "Initial checkpoint (usually from a pre-trained BERT model).")

flags.DEFINE_string(
    "albert_hub_module_handle", None,
    "If set, the ALBERT hub module to use.")

flags.DEFINE_bool(
    "do_lower_case", True,
    "Whether to lower case the input text. Should be True for uncased "
    "models and False for cased models.")

flags.DEFINE_integer(
    "max_seq_length", 384,
    "The maximum total input sequence length after WordPiece tokenization. "
    "Sequences longer than this will be truncated, and sequences shorter "
    "than this will be padded.")

flags.DEFINE_integer(
    "doc_stride", 128,
    "When splitting up a long document into chunks, how much stride to "
    "take between chunks.")

flags.DEFINE_integer(
    "max_query_length", 64,
    "The maximum number of tokens for the question. Questions longer than "
    "this will be truncated to this length.")

flags.DEFINE_bool("do_train", False, "Whether to run training.")

flags.DEFINE_bool("do_predict", True, "Whether to run eval on the dev set.")

flags.DEFINE_integer("train_batch_size", 8, "Total batch size for training.")

flags.DEFINE_integer("predict_batch_size", 8,
                     "Total batch size for predictions.")

flags.DEFINE_float("learning_rate", 5e-5, "The initial learning rate for Adam.")

flags.DEFINE_float("num_train_epochs", 3.0,
                   "Total number of training epochs to perform.")

flags.DEFINE_float(
    "warmup_proportion", 0.1,
    "Proportion of training to perform linear learning rate warmup for. "
    "E.g., 0.1 = 10% of training.")

flags.DEFINE_integer("save_checkpoints_steps", 1000,
                     "How often to save the model checkpoint.")

flags.DEFINE_integer("iterations_per_loop", 1000,
                     "How many steps to make in each estimator call.")

flags.DEFINE_integer(
    "n_best_size", 20,
    "The total number of n-best predictions to generate in the "
    "nbest_predictions.json output file.")

flags.DEFINE_integer(
    "max_answer_length", 30,
    "The maximum length of an answer that can be generated. This is needed "
    "because the start and end predictions are not conditioned on one another.")

flags.DEFINE_bool("use_tpu", False, "Whether to use TPU or GPU/CPU.")

tf.flags.DEFINE_string(
    "tpu_name", None,
    "The Cloud TPU to use for training. This should be either the name "
    "used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 "
    "url.")

tf.flags.DEFINE_string(
    "tpu_zone", None,
    "[Optional] GCE zone where the Cloud TPU is located in. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

tf.flags.DEFINE_string(
    "gcp_project", None,
    "[Optional] Project name for the Cloud TPU-enabled project. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

tf.flags.DEFINE_string("master", None, "[Optional] TensorFlow master URL.")

flags.DEFINE_integer(
    "num_tpu_cores", 8,
    "Only used if use_tpu is True. Total number of TPU cores to use.")

flags.DEFINE_integer("start_n_top", 5, "beam size for the start positions.")

flags.DEFINE_integer("end_n_top", 5, "beam size for the end positions.")

flags.DEFINE_float("dropout_prob", 0.1, "dropout probability.")

This is what I set in run_squad_v2.py, but I can't find the problem. Thanks for your help!
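Two details in the flags above may be worth double-checking, offered as possibilities rather than a confirmed diagnosis. First, albert_config_file points into albert_base/ while vocab_file, spm_model_file and init_checkpoint point into albert_xlarge/, so the architecture described by the config does not match the checkpoint it is paired with. Second, init_checkpoint includes the .index suffix, whereas TensorFlow checkpoint paths are conventionally given as the shared prefix (here albert_xlarge/model.ckpt-best). Also note that with do_train=False the evaluated weights must come from an already fine-tuned checkpoint, because a pre-trained ALBERT checkpoint contains no trained SQuAD output layer. Whether any of this explains the score is not established in this thread. The sketch below uses a hypothetical helper, check_model_paths (not part of the ALBERT repository), to flag the first two issues using only the standard library.

```python
import os

# Values copied from the flags above.
USER_FLAGS = {
    "albert_config_file": "albert_base/albert_config.json",
    "vocab_file": "albert_xlarge/30k-clean.vocab",
    "spm_model_file": "albert_xlarge/30k-clean.model",
    "init_checkpoint": "albert_xlarge/model.ckpt-best.index",
}

# Per-file suffixes of a TF1-style checkpoint; the flag usually names the
# prefix without them (assumption: standard checkpoint layout).
CKPT_SUFFIXES = (".index", ".meta")


def check_model_paths(flag_values):
    """Return human-readable warnings about inconsistent model paths."""
    warnings = []
    dirs = {name: os.path.dirname(path) for name, path in flag_values.items()}
    if len(set(dirs.values())) > 1:
        detail = ", ".join(f"{n} -> {d}/" for n, d in sorted(dirs.items()))
        warnings.append(f"model files come from different directories: {detail}")
    ckpt = flag_values.get("init_checkpoint", "")
    for suffix in CKPT_SUFFIXES:
        if ckpt.endswith(suffix):
            warnings.append(
                f"init_checkpoint ends with {suffix!r}; the checkpoint prefix "
                f"would be {ckpt[:-len(suffix)]!r}")
    return warnings


for w in check_model_paths(USER_FLAGS):
    print("WARNING:", w)
```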

Gs-Zhang commented 3 years ago

Also, the feature files did not exist beforehand; they are generated when I run the .py script.

Huibin-Ge commented 3 years ago

Hi, I ran into the same problem. Since I run the code on a GPU, I changed TPUEstimator to Estimator and TPUEstimatorSpec to EstimatorSpec. That solved the problem, and I now get the F1 score reported in the paper.

PremalMatalia commented 3 years ago

@Huibin-Ge, could you share the notebook or code you are using? I'm having trouble fine-tuning ALBERT base on SQuAD 2.0: training doesn't start, and the run stops abruptly without any error. Some parameter must be wrong.

marvel2120 commented 3 years ago

Same problem here.

kavin525zhang commented 2 years ago

Hi, I have the same problem: the result is too low. Can you tell me how to change TPUEstimator to Estimator and TPUEstimatorSpec to EstimatorSpec?

huibinGe commented 2 years ago

Hi, I have published my fixed code at https://github.com/huibinGe/albert_gpu_squad. The TPUEstimator-to-Estimator change is mainly in run_squad_v2.py, and the TPUEstimatorSpec-to-EstimatorSpec change is mainly in squad_utils.py.
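The thread does not show the actual diff, so the following is only a minimal sketch of two mechanical differences such a port typically has to handle, assuming the TF1-style APIs (TPUEstimator / TPUEstimatorSpec versus tf.estimator.Estimator / tf.estimator.EstimatorSpec). First, TPUEstimator fills params["batch_size"] for the input_fn from train_batch_size or predict_batch_size, while Estimator only forwards the params dict passed to its constructor, so an input_fn that reads params["batch_size"] needs that key supplied. Second, TPUEstimatorSpec takes scaffold_fn and eval_metrics (a (metric_fn, tensors) pair), whereas EstimatorSpec takes a ready-made scaffold and an eval_metric_ops dict. Both helpers below, with_batch_size and to_estimator_spec_kwargs, are hypothetical names written in plain Python so they can be read without TensorFlow installed; huibinGe's repository remains the change that was actually tested.

```python
def with_batch_size(input_fn, batch_size):
    """Adapt a TPUEstimator-style input_fn(params) for tf.estimator.Estimator.

    TPUEstimator injects params["batch_size"]; Estimator does not, so the key
    is filled in here unless the caller already provided one.
    """
    def wrapped(params=None):
        merged = dict(params or {})
        merged.setdefault("batch_size", batch_size)
        return input_fn(merged)
    return wrapped


def to_estimator_spec_kwargs(tpu_spec_kwargs):
    """Map TPUEstimatorSpec keyword arguments to EstimatorSpec ones.

    - scaffold_fn (zero-argument callable) -> scaffold (its return value)
    - eval_metrics (metric_fn, tensors)    -> eval_metric_ops
    - host_call has no EstimatorSpec counterpart and is dropped
    """
    kwargs = dict(tpu_spec_kwargs)
    scaffold_fn = kwargs.pop("scaffold_fn", None)
    if scaffold_fn is not None:
        kwargs["scaffold"] = scaffold_fn()
    eval_metrics = kwargs.pop("eval_metrics", None)
    if eval_metrics is not None:
        metric_fn, tensors = eval_metrics
        if isinstance(tensors, dict):
            kwargs["eval_metric_ops"] = metric_fn(**tensors)
        else:
            kwargs["eval_metric_ops"] = metric_fn(*tensors)
    kwargs.pop("host_call", None)
    return kwargs


# Hypothetical usage (requires TensorFlow with the Estimator API):
#   return tf.estimator.EstimatorSpec(**to_estimator_spec_kwargs(
#       dict(mode=mode, predictions=predictions, scaffold_fn=scaffold_fn)))
#   estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
#   estimator.predict(input_fn=with_batch_size(predict_input_fn,
#                                              FLAGS.predict_batch_size))

if __name__ == "__main__":
    # Stand-in input_fn that just reports the batch size it would use.
    def fake_input_fn(params):
        return params["batch_size"]

    print(with_batch_size(fake_input_fn, 8)(params={}))  # 8
```

Keeping the batch size in a wrapper rather than in the Estimator's constructor params lets training and prediction use different batch sizes with a single Estimator, mirroring how TPUEstimator distinguishes train_batch_size from predict_batch_size.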

google-research / albert

[Problem/ Squad V2] The result is far lower than the F1 score reported in the paper. Is something wrong? #230
