Thanks for your interest in TAPAS!
Can you provide some more details? In particular, the exact example you're trying to process (question + table)?
Thank you for your quick response. The questions asked were:
result2 = predict(holiday_list_of_list, [
    "Which people are there?",
    "What is the start date of Brittas Südfrankreich Urlaub?",
    "End date of Brittas Südfrankreich Urlaub?",
    "What is the total Duration of Britta Glatts Holidaystyle Urlaub?",
])
This is what the table looks like; it contains 36 rows:
[table screenshot]
The predictions worked perfectly when I dropped the last column, "TESTCATEGORY". But when I leave it in the dataframe, I get the error mentioned above.
Thanks for the quick response @sophgit. In order to facilitate debugging, do you mind sharing the table in a computer-friendly format, for example a list of lists? Even better, if you can share a colab that reproduces the error, that would be great; you can do so via Google Drive or by saving to a GitHub gist from the Save menu.
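For example, something along these lines (the column names and values here are invented just to show the shape; the first inner list holds the headers, and every cell is a string):

holiday_list_of_list = [
    ["NAME", "HOLIDAYSTYLE", "STARTDATE", "ENDDATE", "DURATION", "TESTCATEGORY"],
    ["Britta Glatt", "Südfrankreich Urlaub", "01.07.2019", "14.07.2019", "14", "test"],
]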
Can you open this? @eisenjulian https://colab.research.google.com/drive/1oH8-CuLju5fSwlk24NfvqI1FAWfAIg49?usp=sharing
Yes, we can open it.
I think the problem is that the current CLI call:
! python -m tapas.run_task_main \
  --task="WTQ" \
  --output_dir="results" \
  --noloop_predict \
  --test_batch_size={len(queries)} \
  --tapas_verbosity="ERROR" \
  --compression_type= \
  --reset_position_index_per_cell \
  --init_checkpoint="tapas_model/model.ckpt" \
  --bert_config_file="tapas_model/bert_config.json" \
  --mode="predict" \
  --prune_columns 2> error
only runs the prediction step and assumes that all TF examples have already been created.
The prune_columns flag doesn't affect prediction; it only applies in the CREATE_DATA mode.
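For reference, when running the full pipeline through run_task_main, pruning would be requested in the data-creation step, along these lines (the paths are placeholders; see the repository README for the exact create_data invocation):

! python -m tapas.run_task_main \
  --task="WTQ" \
  --input_dir="wtq_data" \
  --output_dir="results" \
  --bert_vocab_file="tapas_model/vocab.txt" \
  --mode="create_data" \
  --prune_columns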
The actual conversion that needs to change happens in the convert_interactions_to_examples function.
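For context, in the WTQ colab that function looks roughly like this (quoted from memory, so treat it as a sketch rather than the exact notebook code): it builds an Interaction proto from the table and queries, then yields one TF example per question:

def convert_interactions_to_examples(tables_and_queries):
  """Calls the TAPAS converter to turn each (table, queries) pair into examples."""
  for idx, (table, queries) in enumerate(tables_and_queries):
    interaction = interaction_pb2.Interaction()
    # One question entry per query, all sharing the same table.
    for position, query in enumerate(queries):
      question = interaction.questions.add()
      question.original_text = query
      question.id = f"{idx}-0_{position}"
    # The first row of the list of lists holds the column headers.
    for header in table[0]:
      interaction.table.columns.add().text = header
    for line in table[1:]:
      row = interaction.table.rows.add()
      for cell in line:
        row.cells.add().text = cell
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        print(f"Can't convert interaction: {interaction.id} error: {e}")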
To add pruning to the colab you will have to create a token selector:
from tapas.utils import pruning_utils

token_selector = pruning_utils.HeuristicExactMatchTokenSelector(
    vocab_file,
    max_seq_length,
    pruning_utils.SelectionType.COLUMN,
    use_previous_answer=True,
    use_previous_questions=True,
)
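Here vocab_file and max_seq_length should be the same values used to configure the example converter. If I read pruning_utils correctly, SelectionType.COLUMN makes the selector drop whole low-relevance columns rather than individual cells.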
Then you can call it just before calling the converter:
interaction = token_selector.annotated_interaction(interaction)
number_annotation_utils.add_numeric_values(interaction)
for i in range(len(interaction.questions)):
  try:
    yield converter.convert(interaction, i)
  except ValueError as e:
    print(f"Can't convert interaction: {interaction.id} error: {e}")
When I tried this I realized there was some problem with beam not being properly installed. I had to work around it like this:
import apache_beam as beam

# Replace beam's metrics module with no-op fakes so the TAPAS code can
# report counters without a working beam installation.
def fake_counter(namespace, message):
  class FakeCounter():
    def inc(self, increment=None, other=None):
      pass
  return FakeCounter()

class FakeMetrics:
  def __init__(self):
    self.counter = fake_counter

class FakeMetricsModule:
  def __init__(self):
    self.Metrics = FakeMetrics()

beam.metrics = FakeMetricsModule()
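For what it's worth, the reason this is enough is that the TAPAS utilities appear to use beam only for metric counters, so the patch just turns calls like the following into no-ops (the namespace and counter name here are invented for illustration):

# After the patch this resolves to fake_counter(...) returning a FakeCounter,
# whose inc() silently does nothing instead of requiring a working beam install.
beam.metrics.Metrics.counter("tapas", "examples_converted").inc()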
Looks like the apache_beam thing can also be fixed by restarting the runtime. See #89 for details.
Thank you so much!!! It seems to work. At least I don't get an error anymore and it does predict. Unfortunately the answers to the questions above are mainly incorrect now, but I'll see if I can work with that. :)
Great that it's working for you now.
I am closing this issue, feel free to open a new issue for any model quality problems and we can see if there is something we can do about it.
Hello,
I am new to this topic and I'm currently trying to use the pruning/filtering method for long tables in the WTQ notebook. I tried using the flag --prune_columns in the prediction function, but it still gives me "Can't convert interaction: error: Sequence too long". What are the necessary steps to filter/prune long tables during prediction?
Thank you in advance.