google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

NameError when executing Dataflow pipeline to get NQ tables dataset #121

Closed by bogdankostic 3 years ago

bogdankostic commented 3 years ago

As described in the documentation, I tried to generate the NQ tables dataset by executing the following command:

python3 tapas/scripts/preprocess_nq.py \
  --input_path="gs://natural_questions/v1.0" \
  --output_path="gs://${GCP_BUCKET}/nq_tables" \
  --runner_type="DATAFLOW" \
  --gc_project="${GCP_PROJECT}" \
  --gc_region="us-west1" \
  --gc_job_name="create-intermediate" \
  --gc_staging_location="gs://${GCP_BUCKET}/staging" \
  --gc_temp_location="gs://${GCP_BUCKET}/tmp" \
  --extra_packages=dist/tapas-table-parsing-0.0.1.dev0.tar.gz

After some time, this results in the following error:

NameError: name 'beam' is not defined [while running 'Parse']

This is easily solved by adding the option --save_main_session, so I would suggest changing this command to:

python3 tapas/scripts/preprocess_nq.py \
  --input_path="gs://natural_questions/v1.0" \
  --output_path="gs://${GCP_BUCKET}/nq_tables" \
  --runner_type="DATAFLOW" \
  --gc_project="${GCP_PROJECT}" \
  --gc_region="us-west1" \
  --gc_job_name="create-intermediate" \
  --gc_staging_location="gs://${GCP_BUCKET}/staging" \
  --gc_temp_location="gs://${GCP_BUCKET}/tmp" \
  --extra_packages=dist/tapas-table-parsing-0.0.1.dev0.tar.gz \
  --save_main_session
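
For readers wondering why this flag helps: Beam's documentation describes --save_main_session as pickling the state of the main module's global namespace so that it can be loaded on the workers. The snippet below is only a simplified, standard-library analogy of that idea, not Beam's actual serialization code. The helper rebuild_on_worker is hypothetical, and json stands in for beam so the sketch runs without Beam installed.

```python
import builtins
import json
import types


def parse(line):
    # The global name `json` is looked up only when the function is called.
    return json.loads(line)


def rebuild_on_worker(func, worker_globals):
    # Hypothetical helper: rebuild `func` from its code object against another
    # global namespace, loosely imitating a function restored on a remote worker.
    return types.FunctionType(func.__code__, worker_globals, func.__name__)


# "Worker" without the main session: `import json` never happened there.
bare_parse = rebuild_on_worker(parse, {"__builtins__": builtins})
try:
    bare_parse('{"a": 1}')
    error_message = None
except NameError as err:
    error_message = str(err)

# "Worker" with the main session restored: the saved globals include `json`.
restored_parse = rebuild_on_worker(parse, dict(globals()))
result = restored_parse('{"a": 1}')

print(error_message)
print(result)
```

The first call fails because the rebuilt function cannot resolve a module-level import, which mirrors the NameError reported for the 'Parse' step. The second call succeeds once the original globals are available again.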

eisenjulian commented 3 years ago

Great point. We added the save_main_session option precisely for this reason, but we forgot to add it to the docs. We will add it for the next release.

eisenjulian commented 3 years ago

This has been updated, thanks a lot!