clovaai / spade

ValueError at `f_parse_head_id.index` #18

Closed ndgnuh closed 2 years ago

ndgnuh commented 2 years ago

Hello, I'm trying to train with custom data and got this error:

  File "/home/phung/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/home/phung/.local/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in validation_step
    return self.training_type_plugin.validation_step(*args)
  File "/home/phung/.local/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model.py", line 382, in validation_step
    results = self._run("test", batch)
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model.py", line 281, in _run
    parses, f_parses, text_unit_field_labels, f_parse_box_ids = gen_parses(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 305, in gen_parses
    parses, grouped_col_ids = gen_fg_parses(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 544, in gen_fg_parses
    parses, remained_f_parse_head_ids = gen_grouped_parses(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 616, in gen_grouped_parses
    parse = gen_grouped_parse1(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 644, in gen_grouped_parse1
    f_parse1 = imp_get_f_parse_from_id_member(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 670, in imp_get_f_parse_from_id_member
    idx = f_parse_head_id.index(id_member)
ValueError: 16 is not in list

The config file:

verbose: true
raw_data_input_type: type1
data_paths:
  dev: vietnamese_invoice2/dev.jsonl
  op_dev: vietnamese_invoice2/devop.jsonl
  op_test: vietnamese_invoice2/devtest.jsonl
  test: vietnamese_invoice2/test.jsonl
  train: vietnamese_invoice2/train.jsonl
dist_norm: img_diagonal
infer_param:
  allow_small_edit_distance: true
  refine_parse: false
  unwanted_fields:
    -
method_for_token_xy_generation: equal_division
model_param:
  input_embedding_components:
    - base
  #    - seqPos
  # - absPos
  #    - charSize/n_
  #    - vertical
  bert_info_comb_type: base
  decoder_hidden_size: 100
  decoder_type: spade
  encoder_backbone_is_pretrained: true
  encoder_backbone_name: bert-base-multilingual-cased
  encoder_backbone_tweak_tag: org
  encoder_config_name: bert-base-multilingual-cased-5layers
  # encoder_backbone_name: vinai/phobert-base
  # encoder_backbone_tweak_tag:   
  # encoder_config_name: vinai/phobert-base
  encoder_type_name: spade
  encoder_layer_ids_used_in_decoder:
    - -1
  #  examples_of_inffering_method: force_single_tail_node, force_single_tail_node_but_allow_multiple_seeds,  no_constraint
  #  examples_of_parse_gen_method: multiple_beam, single_beam
  field_representers:
    - store.name
    - menu.name
    - subtotal.price
    - total.price
    - info.time
  fields:
    - store.name
    - store.address
    - menu.name
    - menu.id
    - menu.count
    - menu.unit
    - menu.unitprice
    - menu.price
    - menu.discount
    - subtotal.tax
    - subtotal.count
    - subtotal.discount
    - subtotal.price
    - total.price
    - total.cash
    - total.credit
    - total.change
    - info.transaction
    - info.time
    - info.staff

  gt_parse_gen_method: single_beam
  include_second_order_relations: false
  inferring_method:
    - force_single_tail_node_but_allow_multiple_seeds
    - no_constraint
  input_split_overlap_len: 0
  l_max_gen_of_each_parse: 10
  max_input_len: 32
  max_info_depth: 1
  model_name: RelationTagging
  n_angle_unit: 60
  n_char_unit: 5
  n_dist_unit: 120
  n_relation_type: 2
  no_rel_attention: false
  omit_angle_cal: false
  parse_gen_method: single_beam
  precision: 16
  pre_layer_norm: true
  task: receipt_v1
  task_lan: ind
  token_lv_boxing: false
  trainable_rel_emb: false
  use_cos_emb: false
  vi_params:
    do_gp:
      - true
      - true
    do_sb:
      - true
      - true
    n_vi_iter: 3
  weights:
    trained: false
    path: model/saved/spade.vi.train.yaml/best/model.pt
toy_data: false
toy_size: 10
train_param:
  accelerator: 
  accumulate_grad_batches: 1
  augment_coord: false
  augment_data: false
  batch_size: 1
  batch_size_for_test: 1
  coord_aug_params_keys: '[n_min, n_max, amp_min, amp_max, angle_min, angle_max],  [0, 2, -15, 15, -10, 10]'
  data_augmentation_refresh_interval: 10
  gradient_clip_val: 0
  gradient_clip_algorithm: value
  initial_coord_aug_params:
    - - 0
      - 4
      - 0
      - 35
    - - 0
      - 1.5
      - 0
      - 25
    - - -10
      - 10
  initial_token_aug_params:
    - 0.033
    - 0.033
    - 0
    - 0.033
    - 2
  cross_entropy_loss_weight:
    - 0.1
    - 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  lr_scheduler_type: warmup_constant
  lr_scheduler_param:
    warmup_constant:
      lr_default: 0.00007
      lr_enc: 0.00007
      lr_dec: 0.0007
      lr_max: 0.0007
      num_warmup_steps: 30
  max_epochs: 10000
  multi_gpu: false
  n_cpus: 12
  optimizer_type: adam
  save_epoch_interval: 25
  skip_long: true
  token_aug_params_keys: '[p_del, p_subs, p_insert, p_tail_insert, n_max_insert],  [0.033,
    0.033, 0, 0.033, 2]'
  unique_token_pool: false
  val_check_interval: 1.0
  validation_metric: f1

The data file we used is this one: data.zip. The dev, devop, train, and test splits all point to the same data, since this file is only for testing purposes.

It would be awesome if you could give some pointers about what is wrong with the data. Thank you!

ndgnuh commented 2 years ago

Problem: in some records, there are rel_s edges between text nodes, but no field-type node is linked to them.
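
For anyone hitting the same error, here is a rough sketch of how such records could be detected before training. It is not code from the spade repository. It assumes the rel_s matrix has `n_fields + n_text` rows and `n_text` columns, with the field-type nodes in the first `n_fields` rows and `rel_s[i][j] == 1` meaning an edge from node `i` to text node `j`. The function name `find_unlabeled_chain_heads` is hypothetical.

```python
from typing import List


def find_unlabeled_chain_heads(rel_s: List[List[int]], n_fields: int) -> List[int]:
    """Return text-node indices that start a rel_s chain without a field-type parent.

    Assumed layout (not confirmed in this thread): rows [0, n_fields) are
    field-type nodes, the remaining rows are text nodes, and columns are text nodes.
    """
    n_text = len(rel_s[0]) if rel_s else 0
    field_rows = rel_s[:n_fields]
    text_rows = rel_s[n_fields:]

    # Does any field-type node point at text node j?
    has_field_parent = [any(row[j] for row in field_rows) for j in range(n_text)]
    # Does any other text node point at text node j?
    has_text_parent = [any(row[j] for row in text_rows) for j in range(n_text)]
    # Does text node j point at any text node?
    has_text_child = [any(row) for row in text_rows]

    # A chain head has outgoing text edges and no incoming text edge;
    # it is a problem if no field-type node links to it either.
    return [
        j
        for j in range(n_text)
        if has_text_child[j] and not has_text_parent[j] and not has_field_parent[j]
    ]


# Toy record: 2 field nodes, 4 text nodes.
toy_rel_s = [
    [1, 0, 0, 0],  # field 0 -> text 0
    [0, 0, 0, 0],  # field 1 (unused)
    [0, 1, 0, 0],  # text 0 -> text 1
    [0, 0, 0, 0],  # text 1
    [0, 0, 0, 1],  # text 2 -> text 3, but no field points at text 2
    [0, 0, 0, 0],  # text 3
]
bad_heads = find_unlabeled_chain_heads(toy_rel_s, n_fields=2)
```

In the toy record, text node 2 starts a chain but has no field-type node attached, which is the situation described above. Running a check like this over every record in the jsonl file should point to the records that need fixing, provided the layout assumption holds for your data.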