SapienzaNLP / xl-amr

XL-AMR is a sequence-to-graph cross-lingual AMR parser that exploits transfer learning (EMNLP 2020).

During prediction batch seems to miss 'src_token_ids' key #1

LeHarter commented 2 years ago

I've tried to parse our Spanish AMR test set with your xl-amr cross-lingual parser, using this command:

    python -u -m xlamr_stog.commands.predict --archive-file C:/Users/user/Documents/AMR/SpanishAMR/xl-amr/models/xl-amr_bilingual_en_es_trans_amr --weights-file C:/Users/user/Documents/AMR/SpanishAMR/xl-amr/models/xl-amr_bilingual_en_es_trans_amr/best.th --input-file C:/Users/user/Documents/AMR/SpanishAMR/Training_t5wtense/test_es.txt.features.input_clean.recategorize --batch-size 32 --use-dataset-reader --output-file C:/Users/user/Documents/AMR/SpanishAMR/Training_t5wtense/test_output.txt --silent --beam-size 5 --predictor STOG

Unfortunately, I got this error:

    Original exception was:
    Traceback (most recent call last):
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\commands\predict.py", line 275, in <module>
        _predict(args)
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\commands\predict.py", line 227, in _predict
        manager.run()
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\commands\predict.py", line 200, in run
        for model_input_instance, result in zip(batch, self._predict_instances(batch)):
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\commands\predict.py", line 158, in _predict_instances
        results, encoder_last_state_seq = self._predictor.predict_batch_instance(batch_data)
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\predictors\stog.py", line 39, in predict_batch_instance
        _outputs, encoder_last_state_seq = super(STOGPredictor, self).predict_batch_instance(instances)
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\predictors\predictor.py", line 62, in predict_batch_instance
        outputs, encoder_last_state_seq = self._model.forward_on_instances(instances)
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\models\model.py", line 149, in forward_on_instances
        encoder_outputs = self(model_input)
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\models\stog.py", line 350, in forward
        encoder_outputs = self.encode(
      File "C:\Users\user\Documents\AMR\SpanishAMR\xl-amr\xlamr_stog\models\stog.py", line 437, in encode
        bert_mask = bert_tokens.ne(0)
    AttributeError: 'NoneType' object has no attribute 'ne'

It seems that bert_tokens is None because the batch tensor dictionary is missing the 'src_token_ids' key in prepare_batch_input:

    def prepare_batch_input(self, batch):
        # [batch, num_tokens]
        bert_token_inputs = batch.get('src_token_ids', None)  # <-- returns None here
        if bert_token_inputs is not None:
            bert_token_inputs = bert_token_inputs.long()
        encoder_token_subword_index = batch.get('src_token_subword_index', None)
        if encoder_token_subword_index is not None:
            encoder_token_subword_index = encoder_token_subword_index.long()
        encoder_token_inputs = batch['src_tokens']['encoder_tokens']
        encoder_pos_tags = batch['src_pos_tags']
        encoder_must_copy_tags = batch['src_must_copy_tags']
        # [batch, num_tokens, num_chars]
        encoder_char_inputs = batch['src_tokens']['encoder_characters']
        # [batch, num_tokens]
        encoder_mask = get_text_field_mask(batch['src_tokens'])

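For reference, the failure is easy to reproduce in isolation. Below is a minimal standalone sketch (not the project's code; the batch contents are invented) showing how a missing 'src_token_ids' key turns into exactly this AttributeError:

    import torch

    # A batch dict that, like mine, lacks the 'src_token_ids' key.
    batch = {"src_tokens": torch.zeros(2, 5, dtype=torch.long)}

    bert_tokens = batch.get("src_token_ids", None)  # -> None
    bert_mask = bert_tokens.ne(0)  # AttributeError: 'NoneType' object has no attribute 'ne'
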
How could I fix that?

GerlinGreen commented 2 years ago

Hello, we ran into a similar issue while using the cktp-amr-2.0 archive model. In our case it was because the BERT vocabulary file was missing, and we solved the problem by correcting the parameters in the config file under our archive-file path. Maybe you can check whether the BERT model paths in config.json under your archive-file path are correct, especially the word_splitter object, which points to the BERT vocab file. If the tokenizer cannot load its vocabulary, the dataset reader presumably never produces the src_token_ids field, which would explain the missing key you observed.
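
To make that check quicker, here is a small sketch that walks config.json and flags path-like string values that do not exist on disk. It is a hypothetical helper, not part of xl-amr: the find_paths function and its slash-based path heuristic are ours, and the archive directory is taken from the command above.

    import json
    import os

    # Archive directory from the predict command above.
    archive_dir = "C:/Users/user/Documents/AMR/SpanishAMR/xl-amr/models/xl-amr_bilingual_en_es_trans_amr"

    with open(os.path.join(archive_dir, "config.json"), encoding="utf-8") as f:
        config = json.load(f)

    def find_paths(obj, prefix=""):
        # Recursively print every string value that looks like a file path,
        # together with whether it exists, so a broken BERT vocab path stands out.
        if isinstance(obj, dict):
            for key, value in obj.items():
                find_paths(value, prefix + key + ".")
        elif isinstance(obj, list):
            for i, value in enumerate(obj):
                find_paths(value, prefix + str(i) + ".")
        elif isinstance(obj, str) and ("/" in obj or "\\" in obj):
            print("%s: %s (exists=%s)" % (prefix.rstrip("."), obj, os.path.exists(obj)))

    find_paths(config)

Any path printed with exists=False (in particular one under word_splitter) is a likely candidate for the missing BERT vocabulary file.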