google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

About preparing pretraining data for tapas #151

Closed by BlankCheng 2 years ago

BlankCheng commented 2 years ago

Hello, I'm preparing the TAPAS pretraining data (.tfrecord) on Google Dataflow using the scripts provided in the README, but without the --extra_packages=dist/tapas-0.0.1.dev0.tar.gz flag. Running the pipeline fails with the following error:

  File "tapas/create_pretrain_examples_main.py", line 112, in <module>
    app.run(main)
  File "/home/zjcheng00000801/env/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/zjcheng00000801/env/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "tapas/create_pretrain_examples_main.py", line 108, in main
    beam_runner.run(pipeline)
  File "/home/zjcheng00000801/tapas/tapas/utils/beam_runner.py", line 86, in run
    return run_type(pipeline, FLAGS.runner_type)
  File "/home/zjcheng00000801/tapas/tapas/utils/beam_runner.py", line 71, in run_type
    return runners.DataflowRunner().run(pipeline, options=options)
  File "/home/zjcheng00000801/env/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 129, in run
    result.wait_until_finish()
  File "/home/zjcheng00000801/env/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1682, in wait_until_finish
    self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 290, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
ModuleNotFoundError: No module named 'tapas'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
    op.start()
  File "apache_beam/runners/worker/operations.py", line 710, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 311, in apache_beam.runners.worker.operations.Operation.start
  File "apache_beam/runners/worker/operations.py", line 317, in apache_beam.runners.worker.operations.Operation.start
  File "apache_beam/runners/worker/operations.py", line 659, in apache_beam.runners.worker.operations.DoOperation.setup
  File "apache_beam/runners/worker/operations.py", line 660, in apache_beam.runners.worker.operations.DoOperation.setup
  File "apache_beam/runners/worker/operations.py", line 292, in apache_beam.runners.worker.operations.Operation.setup
  File "apache_beam/runners/worker/operations.py", line 306, in apache_beam.runners.worker.operations.Operation.setup
  File "apache_beam/runners/worker/operations.py", line 799, in apache_beam.runners.worker.operations.DoOperation._get_runtime_performance_hints
  File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 294, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
ModuleNotFoundError: No module named 'tapas'

I assume this is caused by the missing --extra_packages=dist/tapas-0.0.1.dev0.tar.gz flag. Where can I find dist/tapas-0.0.1.dev0.tar.gz? It does not seem to be provided in the GitHub repo.

Many thanks.

BlankCheng commented 2 years ago

The tapas-0.0.1.dev0.tar.gz distribution is generated by building tapas, so it is not checked into the repo. I'll close the issue. Thank you for your great work!
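
For anyone hitting the same ModuleNotFoundError, here is a minimal sketch of the fix described above: build the source distribution locally, then pass the resulting tarball through --extra_packages so the Dataflow workers can install tapas. The helper names (build_sdist, find_sdist, extra_packages_flag) are hypothetical and not part of the tapas repo. The sketch also assumes the repo root contains a setup.py and that the standard setuptools sdist command writes the tarball to dist/.

```python
import glob
import os
import subprocess
import sys


def build_sdist(repo_dir):
    """Run the standard setuptools sdist build in repo_dir.

    Assumption: repo_dir contains a setup.py (e.g. a tapas checkout).
    """
    subprocess.run([sys.executable, "setup.py", "sdist"], cwd=repo_dir, check=True)


def find_sdist(repo_dir, package="tapas"):
    """Return the most recently modified dist/<package>-*.tar.gz, or None."""
    pattern = os.path.join(repo_dir, "dist", package + "-*.tar.gz")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def extra_packages_flag(repo_dir, package="tapas"):
    """Build the --extra_packages argument, failing early if no tarball exists."""
    tarball = find_sdist(repo_dir, package)
    if tarball is None:
        raise FileNotFoundError(
            "No %s sdist found under %s; build it first."
            % (package, os.path.join(repo_dir, "dist"))
        )
    return "--extra_packages=" + tarball
```

Calling build_sdist(".") from the repo root and then appending extra_packages_flag(".") to the create_pretrain_examples_main.py arguments should reproduce the README's flag without hard-coding the version string. Failing early on a missing tarball surfaces the problem locally instead of as an unpickling error on the workers.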