Closed apurvak closed 3 years ago
Hi apurvak, the tables.tsv file contains, for each table, the table_id and the embedding extracted by the dual encoder.
Example of dual encoder to use:
! gsutil cp "gs://tapas_models/2021_04_27/tapas_nq_hn_retriever_medium.zip" "tapas_retriever.zip" && unzip tapas_retriever.zip
You can run:
! python -m tapas.experiments.table_retriever_experiment \
--do_predict \
--eval_name="dual_encoder_tables" \
--minutes_to_sleep_before_predictions=0 \
--num_eval_steps=0 \
--model_dir="
The output will be written to "output_dir/predict_results_0.tsv".
In case you didn't generate the table.tfrecord yet, you can use functions similar to the following ones:
1- This function builds the table proto used in the interactions:
from tapas.protos import interaction_pb2

def get_table(document_title, table_data):
  """Extracts the interaction for an str table.

  Args:
    table_data: str table where the columns are separated by '|' and rows by '\n'.
    document_title: str title of the page containing the table, or a table title. It can also be an empty str.
  """
  # Split into rows on newlines and into cells on '|', skipping blank rows.
  table = [list(map(lambda s: s.strip(), row.split("|"))) for row in table_data.split("\n") if row.strip()]
  table_interaction = interaction_pb2.Table()
  table_interaction.document_title = document_title
  table_interaction.table_id = document_title
  if not table:
    return table_interaction
  # The first row is the header; the remaining rows are data.
  for header in table[0]:
    table_interaction.columns.add().text = header
  for line in table[1:]:
    row = table_interaction.rows.add()
    for cell in line:
      row.cells.add().text = cell
  return table_interaction
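The string parsing inside get_table can be tried on its own, without the TAPAS protos. The following is only a minimal sketch of that step, assuming the same '|' column separator and newline row separator. parse_table is a hypothetical helper name that is not part of the repository, and it returns plain lists instead of an interaction_pb2.Table.

```python
def parse_table(table_data):
    """Splits a '|'-separated, newline-delimited table string into header and rows.

    Hypothetical helper: it mirrors only the string parsing done in get_table
    and returns plain Python lists rather than a proto.
    """
    table = [
        [cell.strip() for cell in row.split("|")]
        for row in table_data.split("\n")
        if row.strip()  # skip blank lines
    ]
    if not table:
        return [], []
    # First row is treated as the header, the rest as data rows.
    return table[0], table[1:]


header, rows = parse_table("Name | Country\nAlice | France\n\nBob | Chile\n")
print(header)
print(rows)
```

Note that a cell containing a literal '|' would be split into two cells under this scheme, so such tables would need a different separator or escaping.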
2- To write the interactions as tf_records, you can use:
import tensorflow as tf

def write_tfrecord(filename, examples):
  """From interactions examples to tfrecord."""
  with tf.io.TFRecordWriter(filename) as writer:
    for example in examples:
      # Each example is a proto; store its serialized bytes as one record.
      writer.write(example.SerializeToString())
3- You need to extract the tf_examples. You can use:
def extract_tables_tf_examples(path_to_input_tables):
  """Extracts tf_examples from interactions.tfrecord."""
of the models use 512.
config=tf_example_utils.RetrievalConversionConfig(
vocab_file="
On Sun, Nov 21, 2021 at 7:12 PM apurvak @.***> wrote:
I am trying to evaluate the NQ table retrieval model on my own data. To that end, I ran the tapas/retrieval/create_retrieval_data_main.py script to generate tables.tfrecord. However, the main notebook uses the following code to read a TSV file (which looks like it holds table embeddings) so that the retrieval model can find the nearest neighbours:
def get_nearest_neighbors(num_neighbors):
….
tables = eval_table_retriever_utils.read_tables("results/nq_retrieval/model/tables.tsv", make_tables_unique=False)
….
How do I generate the tables.tsv file for my data? The file for the prepared data looks like this:
![Screenshot from 2021-11-21 10-09-55](https://user-images.githubusercontent.com/1175315/142773919-c1149ac2-1586-45bc-a03e-353ca2ccbdda.png)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/google-research/tapas/issues/148, or unsubscribe https://github.com/notifications/unsubscribe-auth/APARZOPT2O5Q24S3LATN22DUNEY75ANCNFSM5IPNIS3Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Thank you for your help! I have followed all the instructions. However, I get the following exception. I am running on a GPU device. The output file is generated, but it is empty:
INFO:tensorflow:prediction_loop marked as finished
I1122 06:15:45.357364 140518916581184 error_handling.py:115] prediction_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1122 06:15:45.357551 140518916581184 error_handling.py:149] Reraising captured error
ERROR:tensorflow:Error getting predictions for checkpoint /home/exx/git/tapas/apurva/tapas_dual_encoder_proj_256_large/model.ckpt-0: Traceback (most recent call last):
File "/home/exx/git/tapas/notebooks/tapas/tapas/experiments/table_retriever_experiment.py", line 304, in main
_predict_and_export_metrics(
File "/home/exx/git/tapas/notebooks/tapas/tapas/experiments/table_retriever_experiment.py", line 163, in _predict_and_export_metrics
write_predictions(result, output_predict_file)
File "/home/exx/git/tapas/notebooks/tapas/tapas/experiments/table_retriever_experiment.py", line 194, in write_predictions
for prediction in predictions:
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3132, in predict
rendezvous.raise_errors()
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
six.reraise(typ, value, traceback)
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/six.py", line 719, in reraise
raise value
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3121, in predict
for result in super(TPUEstimator, self).predict(
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 623, in predict
with tf.compat.v1.train.MonitoredSession(
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1058, in __init__
super(MonitoredSession, self).__init__(
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 761, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1267, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1272, in _create_session
return self._sess_creator.create_session()
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 914, in create_session
self.tf_sess = self._session_creator.create_session()
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 673, in create_session
return self._get_session_manager().prepare_session(
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/session_manager.py", line 314, in prepare_session
sess, is_loaded_from_checkpoint = self._restore_checkpoint(
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/session_manager.py", line 233, in _restore_checkpoint
_restore_checkpoint_and_maybe_run_saved_model_initializers(
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/session_manager.py", line 71, in _restore_checkpoint_and_maybe_run_saved_model_initializers
saver.restore(sess, path)
File "/home/exx/git/tapas/venv8/lib/python3.8/site-packages/tensorflow/python/training/saver.py", line 1396, in restore
raise ValueError("The passed save_path is not a valid checkpoint: " +
ValueError: The passed save_path is not a valid checkpoint: /home/exx/git/tapas/apurva/tapas_dual_encoder_proj_256_large/model.ckpt-0
Hi, the code is looking for a checkpoint at step zero, model.ckpt-0, instead of model.ckpt.
You can rename the checkpoints by running:
import shutil
for suffix in ['.data-00000-of-00001', '.index', '.meta']:
shutil.copyfile(f'
Also, can you add the following code to make sure that the model is read at step 0?
with open('
Thanks, Syrine
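The two snippets above are cut off in this thread, so the following is only a sketch of the general idea and not the original code. It assumes the checkpoint files are named model.ckpt plus the three suffixes listed above, and that TensorFlow locates the checkpoint through a text file named checkpoint in the model directory. The function name and the directory names are hypothetical placeholders.

```python
import os
import shutil
import tempfile

SUFFIXES = (".data-00000-of-00001", ".index", ".meta")


def expose_checkpoint_as_step_zero(src_dir, dst_dir):
    """Copies model.ckpt* to model.ckpt-0* and writes a checkpoint state file.

    Hypothetical helper; src_dir and dst_dir are placeholders for wherever
    the downloaded model and the --model_dir actually live.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for suffix in SUFFIXES:
        shutil.copyfile(
            os.path.join(src_dir, f"model.ckpt{suffix}"),
            os.path.join(dst_dir, f"model.ckpt-0{suffix}"),
        )
    # Assumption: this text file tells TensorFlow which checkpoint is the latest one.
    with open(os.path.join(dst_dir, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "model.ckpt-0"\n')


# Demo on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "downloaded_model")
    dst = os.path.join(tmp, "model_dir")
    os.makedirs(src)
    for suffix in SUFFIXES:
        open(os.path.join(src, f"model.ckpt{suffix}"), "w").close()
    expose_checkpoint_as_step_zero(src, dst)
    created = sorted(os.listdir(dst))

print(created)
```

Copying rather than moving keeps the downloaded files untouched, so the step can be repeated if the model directory is deleted.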
Hi,
Really appreciate all the help so far!
I followed all the instructions and prepared the tf_examples using the following script and a subset of the provided data:
--input_interactions_dir="//content/data/interactions" \
--input_tables_dir="/content/drive/MyDrive/min_out/tables" \
--output_dir="data/tf_examples" \
--vocab_file="tapas_dual_encoder_proj_256_large/vocab.txt" \
--max_seq_length=512 \
--max_column_id=512 \
--max_row_id=512 \
--use_document_title
Please note that after running the inference code, I get the output file predict_result_0.tsv, but I also get the following exception:
File "/content/tapas/tapas/experiments/table_retriever_experiment.py", line 305, in main
output_dir=prediction_output_dir)
File "/content/tapas/tapas/experiments/table_retriever_experiment.py", line 171, in _predict_and_export_metrics
make_tables_unique=True)
File "/content/tapas/tapas/scripts/eval_table_retriever_utils.py", line 370, in eval_precision_at_k
retrieval_results_file_path)
File "/content/tapas/tapas/scripts/eval_table_retriever_utils.py", line 290, in process_predictions
similarities, neighbors = _retrieve(queries, index)
File "/content/tapas/tapas/scripts/eval_table_retriever_utils.py", line 235, in _retrieve
similarities, nns = index.neighbors(query_embeddings)
File "/content/tapas/tapas/scripts/eval_table_retriever_utils.py", line 61, in neighbors
-self._n_neighbors)[:, -self._n_neighbors:]
File "<__array_function__ internals>", line 6, in argpartition
File "/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py", line 830, in argpartition
return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
File "/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
return bound(*args, **kwds)
ValueError: kth(=-97) out of bounds (3)
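One possible reading of this message, which is not confirmed anywhere in the thread: argpartition is being asked for more neighbours than there are candidate tables. With only 3 table embeddings and 100 requested neighbours (the 100 is inferred from the numbers in the error), kth = -100 would be shifted by the array size to -100 + 3 = -97, which is out of bounds for size 3. The sketch below shows the idea of capping the number of neighbours at the number of candidates. It is plain Python using heapq, not the NumPy code from eval_table_retriever_utils.py, and all names in it are hypothetical.

```python
import heapq


def nearest_neighbors(query, table_embeddings, num_neighbors):
    """Returns up to num_neighbors (table_id, score) pairs ranked by inner product."""
    # Never ask for more neighbours than there are tables. heapq.nlargest would
    # tolerate a larger k, but the cap is written out because a partition-based
    # top-k cannot select more elements than exist.
    k = min(num_neighbors, len(table_embeddings))
    scored = (
        (table_id, sum(q * t for q, t in zip(query, embedding)))
        for table_id, embedding in table_embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Three toy tables, but 100 neighbours requested, as in the reading above.
tables = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [0.5, 0.5]}
top = nearest_neighbors([1.0, 0.2], tables, num_neighbors=100)
print([table_id for table_id, _ in top])
```

If this reading is right, evaluating on a larger subset of tables, or requesting fewer neighbours, should avoid the exception.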
I got all the help I needed and figured out all the problems. Thank you so much for your help. If anyone else runs into these issues, feel free to contact me.