google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

TAPAS with Pandas df? #64

Closed aminfardi closed 4 years ago

aminfardi commented 4 years ago

I'm trying to follow the SQA colab workflow, but I noticed that the table format expected by the predict function uses a "|" delimiter. Has anyone successfully used predict with a pandas DataFrame?

kamalkraj commented 4 years ago

Hi @aminfardi, try https://github.com/kamalkraj/TAPAS-TF2. I have provided one converted model in the README. You can use the converter to convert the weights and then use any model provided in this repo's README.

Akshaysharma29 commented 4 years ago

In the colab notebook, the table is converted into a list of lists, so you can use a pandas DataFrame by converting it into a list of lists (with the headers as the first list). I hope that answers your question.
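
A minimal sketch of that conversion, assuming the colab's predict helper splits each line of its table text on "|" (as the question describes). The helper names `df_to_rows` and `rows_to_pipe_text` and the example data are hypothetical and not part of TAPAS:

```python
import pandas as pd


def df_to_rows(frame: pd.DataFrame) -> list:
    """Return the table as a list of lists, with the header as the first row."""
    header = [str(column) for column in frame.columns]
    # Cast every cell to a string so the rows can be joined as text later.
    body = frame.astype(str).values.tolist()
    return [header] + body


def rows_to_pipe_text(rows: list) -> str:
    """Join cells with '|' and rows with newlines."""
    return "\n".join("|".join(row) for row in rows)


# Hypothetical example data.
df = pd.DataFrame({"Player": ["Ann", "Bob"], "Goals": [3, 5]})
rows = df_to_rows(df)
table_text = rows_to_pipe_text(rows)
print(table_text)
```

Cells that themselves contain "|" or a newline would break this text format, so they would need to be cleaned or replaced first.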

ghost commented 4 years ago

Thanks for the question and the answers!

Just for completeness, a simple conversion like this should work:


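from tapas.protos import interaction_pb2


def df_to_table_proto(frame):
  """Converts a pandas DataFrame into an interaction_pb2.Table."""
  table = interaction_pb2.Table()

  # Column names become the header cells.
  for column in frame:
    table.columns.add().text = str(column)

  # Each DataFrame row becomes one table row; proto text fields need strings.
  for _, row in frame.iterrows():
    new_row = table.rows.add()
    for cell in row:
      new_row.cells.add().text = str(cell)

  return table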
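
The proto text fields hold strings, so a DataFrame with numeric or missing cells may need cleaning before it is converted. The following is a rough sketch of one way to do that with pandas alone. The helper name `stringify_frame` and the choice to map missing values to empty strings are assumptions for illustration, not something TAPAS prescribes:

```python
import pandas as pd


def stringify_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column names and cells are all plain strings."""
    data = {
        # Missing values (None/NaN) become empty strings instead of "nan".
        str(name): ["" if pd.isna(value) else str(value) for value in column]
        for name, column in frame.items()
    }
    return pd.DataFrame(data, index=frame.index)


# Hypothetical example data with one missing cell.
df = pd.DataFrame({"Player": ["Ann", None], "Goals": [3, 5]})
cleaned = stringify_frame(df)
print(cleaned)
```

The cleaned frame can then be passed to a conversion function such as the one above.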

You then need to add your table to an interaction and set the question (as in convert_interactions_to_examples):


from tapas.utils import number_annotation_utils

# Build the interaction.
interaction = interaction_pb2.Interaction()
interaction.table.CopyFrom(table)

# Add the question and maybe also answers ...
new_question = interaction.questions.add()
new_question.original_text = "..."

# Set some ids for bookkeeping ...
interaction.id = ...
interaction.table.table_id = ...
new_question.id = ...

# Call the number parser to add numeric values for the numeric embeddings.
number_annotation_utils.add_numeric_values(interaction)

aminfardi commented 4 years ago

Thank you everybody for the detailed responses!