google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

Got AttributeError: 'GFile' object has no attribute 'readable' #68

Closed (sparshbhawsar closed this issue 4 years ago)

sparshbhawsar commented 4 years ago

Hello,

While creating the model data for the WTQ task, I got AttributeError: 'GFile' object has no attribute 'readable'.

COMMAND USED:

! python tapas/tapas/run_task_main.py \
  --task="WTQ" \
  --input_dir="{input_dir}" \
  --output_dir="{output_dir}" \
  --bert_vocab_file="/content/tapas_model/vocab.txt" \
  --mode="create_data"

ERROR:

`WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Creating interactions ...
I0928 05:29:55.827632 140189694424960 run_task_main.py:152] Creating interactions ...
I0928 05:29:55.827936 140189694424960 wtq_utils.py:152] Converting data from: training.tsv...
Traceback (most recent call last):
  File "tapas/tapas/run_task_main.py", line 782, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tapas/tapas/run_task_main.py", line 736, in main
    task_utils.create_interactions(task, FLAGS.input_dir, output_dir)
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/task_utils.py", line 110, in create_interactions
    wtq_utils.convert(input_dir, output_dir)
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/wtq_utils.py", line 241, in convert
    _convert_data(table_cache, input_dir, output_dir, train_file, version)
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/wtq_utils.py", line 168, in _convert_data
    table = _read_wtq_table(input_dir, wtq_table_id)
  File "/usr/local/lib/python3.6/dist-packages/tapas/utils/wtq_utils.py", line 96, in _read_wtq_table
    dtype='str',
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1880, in __init__
    src = TextIOWrapper(src, encoding=encoding, newline="")
AttributeError: 'GFile' object has no attribute 'readable'`
ghost commented 4 years ago

Thanks for reporting this.

This seems to be related:

https://stackoverflow.com/questions/60979955/ai-platform-gsutil-permissions-error-attributeerror-gfile-object-has-no-at

ghost commented 4 years ago

def _read_wtq_table(input_dir: Text, wtq_table_id: Text) -> pd.DataFrame:
  """Reads table file as pandas frame."""
  table_path = os.path.join(input_dir, wtq_table_id)
  with tf.io.gfile.GFile(table_path, 'r') as table_in:
    return pd.read_csv(
        table_in,
        delimiter=',',
        escapechar='\\',
        encoding='utf-8',
        dtype='str',
    )

This is the problematic code.

ghost commented 4 years ago

I can reproduce the issue with this:

import tensorflow.compat.v1 as tf
import pandas as pd
from typing import Text
import os
import csv

tf.get_logger().setLevel('ERROR')

def _read_wtq_table(input_dir: Text, wtq_table_id: Text) -> pd.DataFrame:
  """Reads table file as pandas frame."""
  table_path = os.path.join(input_dir, wtq_table_id)
  with tf.io.gfile.GFile(table_path) as table_in:
    return pd.read_csv(
        table_in,
        delimiter=',',
        escapechar='\\',
        encoding='utf-8',
        dtype='str',
      )

with open("/tmp/test.csv", "w") as output_file:
  writer = csv.writer(output_file)
  writer.writerow(["a", "b", "c"])
  writer.writerow(["1", "2", "3"])

print(_read_wtq_table("/tmp", "test.csv"))

ghost commented 4 years ago

The problem is caused by pd.read_csv wrapping the GFile object in a TextIOWrapper. As the traceback shows, TextIOWrapper then fails because GFile has no readable attribute.
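The failure can be reproduced without TensorFlow or pandas. Here is a minimal sketch, assuming a hypothetical stand-in class (ReadOnlyHandle, not part of any library) that, like the GFile in the traceback, offers read() but no readable():

```python
import io


class ReadOnlyHandle:
    """Hypothetical stand-in for GFile: provides read() but not readable()."""

    def __init__(self, text):
        self._text = text

    def read(self, n=-1):
        return self._text


def wrap_error(handle):
    """Returns the AttributeError message raised by TextIOWrapper, or None."""
    try:
        # Same kind of call as the one at the bottom of the traceback.
        io.TextIOWrapper(handle, encoding="utf-8", newline="")
    except AttributeError as err:
        return str(err)
    return None


print(wrap_error(ReadOnlyHandle("a,b,c\n1,2,3\n")))  # message names 'readable'
print(wrap_error(io.BytesIO(b"a,b,c\n1,2,3\n")))     # a standard buffer wraps fine
```

Any object handed to TextIOWrapper has to behave like a standard binary buffer, which is why avoiding the wrapping altogether is the simpler route.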

A potential fix is this:

import io

def _read_wtq_table(input_dir: Text, wtq_table_id: Text) -> pd.DataFrame:
  """Reads table file as pandas frame."""
  table_path = os.path.join(input_dir, wtq_table_id)
  with tf.io.gfile.GFile(table_path) as table_in:
    # Read the whole file into memory so pandas receives a standard text buffer.
    string_io = io.StringIO(table_in.read())
    return pd.read_csv(
        string_io,
        delimiter=',',
        escapechar='\\',
        encoding='utf-8',
        dtype='str',
    )

sparshbhawsar commented 4 years ago

I tried the potential fix, but it's not working; I got the same error.

ghost commented 4 years ago

That's odd, it worked for me. Anyway, a better fix is to remove encoding='utf-8':

def _read_wtq_table(input_dir: Text, wtq_table_id: Text) -> pd.DataFrame:
  """Reads table file as pandas frame."""
  table_path = os.path.join(input_dir, wtq_table_id)
  with tf.io.gfile.GFile(table_path) as table_in:
    return pd.read_csv(
        table_in,
        delimiter=',',
        escapechar='\\',
        dtype='str',
      )

Can you try that?

ghost commented 4 years ago

(The code will still handle UTF-8 correctly since that's the default for GFile.)
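To illustrate why dropping the encoding argument is safe: when the file handle itself decodes the bytes, the parser only ever sees str. This is a rough sketch of that idea using only the standard library, with the built-in open() standing in for GFile and the csv module standing in for pd.read_csv (both substitutions are assumptions made so the example runs without TensorFlow or pandas):

```python
import csv
import os
import tempfile


def read_rows_text_mode(path):
    """Parses a CSV from a handle that already yields decoded str."""
    # Decoding happens here, at the file layer; the parser gets no encoding.
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))


def demo():
    """Writes a small UTF-8 CSV with non-ASCII text and reads it back."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", newline="") as out:
            csv.writer(out).writerows([["city", "país"], ["Zürich", "Suiza"]])
        return read_rows_text_mode(path)
    finally:
        os.remove(path)


print(demo())
```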

ghost commented 4 years ago

Should be fixed with the latest push. Feel free to reopen if this is still not working for you.

sparshbhawsar commented 4 years ago

Hey @thomasmueller-google,

Thanks, it worked!