google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

Pruning method in WTQ #88

Closed · sophgit · closed 4 years ago

sophgit commented 4 years ago

Hello,

I am new to this topic and I'm currently trying to use the pruning/filtering method for long tables in the WTQ notebook. I tried using the flag --prune_columns in the prediction function, but it still gives me "Can't convert interaction: error: Sequence too long". What are the necessary steps to filter/prune long tables during prediction?

Thank you in advance.

ghost commented 4 years ago

Thanks for your interest in TAPAS!

Can you provide some more details? In particular, the exact example you're trying to process (question + table)?

sophgit commented 4 years ago

Thank you for your quick response. The questions asked were:

result2=predict(holiday_list_of_list, ["Which people are there?","What is the start date of Brittas Südfrankreich Urlaub?","End date of Brittas Südfrankreich Urlaub?","What is the total Duration of Britta Glatts Holidaystyle Urlaub?"])

This is what the table looks like; it contains 36 rows:

[table screenshot omitted]

The predictions worked perfectly when I dropped the last column, "TESTCATEGORY". But when I leave it in the dataframe, I get the error mentioned above.

eisenjulian commented 4 years ago

Thanks for the quick response @sophgit. To facilitate debugging, do you mind sharing the table in a computer-friendly format, for example a list of lists? Even better: if you can share a colab that reproduces the error, that would be great; you can do so via Google Drive or by saving a GitHub gist from the Save menu.

sophgit commented 4 years ago

Can you open this? @eisenjulian https://colab.research.google.com/drive/1oH8-CuLju5fSwlk24NfvqI1FAWfAIg49?usp=sharing

ghost commented 4 years ago

Yes, we can open it.

I think the problem is that the current CLI call:

  ! python -m tapas.run_task_main \
    --task="WTQ" \
    --output_dir="results" \
    --noloop_predict \
    --test_batch_size={len(queries)} \
    --tapas_verbosity="ERROR" \
    --compression_type= \
    --reset_position_index_per_cell \
    --init_checkpoint="tapas_model/model.ckpt" \
    --bert_config_file="tapas_model/bert_config.json" \
    --mode="predict" 2> error \
    --prune_columns

only runs the predictions and assumes that all TF examples have already been created. The prune_columns flag doesn't affect prediction; it only applies in the CREATE_DATA mode.

The actual conversion that should be affected happens in the convert_interactions_to_examples function.
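
In other words, pruning has to happen when the data is created, not at prediction time. As a rough, unverified sketch (the mode name follows the CREATE_DATA reference above; the other flags are copied from your call, and your setup will likely also need its input and vocab flags):

  ! python -m tapas.run_task_main \
    --task="WTQ" \
    --output_dir="results" \
    --mode="create_data" \
    --prune_columns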

ghost commented 4 years ago

To add pruning to the colab, you will have to create a token selector:

    from tapas.utils import pruning_utils

    # vocab_file and max_seq_length must already be defined in the notebook
    # (see the note below). The selector keeps whole columns, using previous
    # questions and answers as additional match signals.
    token_selector = pruning_utils.HeuristicExactMatchTokenSelector(
        vocab_file,
        max_seq_length,
        pruning_utils.SelectionType.COLUMN,
        use_previous_answer=True,
        use_previous_questions=True,
    )
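
If those two names aren't defined yet in the notebook, something along these lines should do; the vocab path is my assumption based on the checkpoint directory used above, and 512 is the standard TAPAS sequence length:

    # Assumed values, not taken from the notebook itself:
    vocab_file = "tapas_model/vocab.txt"  # vocab shipped alongside model.ckpt
    max_seq_length = 512                  # default TAPAS sequence length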

and then you can call it just before calling the converter:

    # Inside the colab's TF-example generation loop: prune the interaction
    # first, then annotate and convert as before.
    interaction = token_selector.annotated_interaction(interaction)
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        print(f"Can't convert interaction: {interaction.id} error: {e}")
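
If I understand the selector correctly, annotated_interaction rewrites the interaction so that only the columns whose tokens best match the current question (and, with the flags above, the previous questions and answers) survive, which is what lets the pruned table fit within max_seq_length.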

When I tried this, I realized there was a problem with beam not being properly installed. I had to work around it like this:

    import apache_beam as beam

    # Replace beam's metrics module with a stub whose counters silently
    # ignore all increments.
    def fake_counter(namespace, message):
      class FakeCounter:
        def inc(self, increment=None, other=None):
          pass
      return FakeCounter()

    class FakeMetrics:
      def __init__(self):
        self.counter = fake_counter

    class FakeMetricsModule:
      def __init__(self):
        self.Metrics = FakeMetrics()

    beam.metrics = FakeMetricsModule()

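With that stub in place, counter updates of the form beam.metrics.Metrics.counter(...).inc() inside the conversion code become no-ops, so the broken apache_beam installation no longer gets in the way.
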
ghost commented 4 years ago

Looks like the apache_beam thing can also be fixed by restarting the runtime. See #89 for details.

sophgit commented 4 years ago

Thank you so much!!! It seems to work. At least I don't get an error anymore, and it does predict. Unfortunately, the answers to the questions above are mostly incorrect now, but I'll see if I can work with that. :)

ghost commented 4 years ago

Great that it's working for you now.

I am closing this issue; feel free to open a new issue for any model-quality problems, and we can see if there is something we can do about it.