dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

[BUG] CPU Unittest (macos-python3.7) Failed for ArrowTypeError #1455

Closed: barry-jin closed 3 years ago

barry-jin commented 3 years ago

Description

The CPU unittest for macOS with Python 3.7.9 fails with pyarrow.lib.ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column label with type int64'). This is probably because numpy was upgraded to 1.20.0rc1 in the most recent CI runs. (Screenshot: Screen Shot 2020-12-07 at 1 49 12 PM) After I pin the numpy version to 1.19.4, the unittest for macOS with Python 3.7.9 passes (link).

Error Message

_______________________________ test_glue[copa] ________________________________

task = 'copa'

    @pytest.mark.remote_required
    @pytest.mark.parametrize('task', ["cb", "copa", "multirc", "rte", "wic", "wsc", "boolq", "record",
                                      'broadcoverage-diagnostic', 'winogender-diagnostic'])
    def test_glue(task):
        parser = prepare_glue.get_parser()
        with tempfile.TemporaryDirectory() as root:
            args = parser.parse_args(['--benchmark', 'superglue',
                                      '--tasks', task,
                                      '--data_dir', root])
>           prepare_glue.main(args)

tests/data_cli/test_glue.py:28: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/gluonnlp/cli/data/general_nlp_benchmark/prepare_glue.py:689: in main
    df.to_parquet(os.path.join(base_dir, '{}.parquet'.format(key)))
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pandas/util/_decorators.py:199: in wrapper
    return func(*args, **kwargs)
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pandas/core/frame.py:2372: in to_parquet
    **kwargs,
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pandas/io/parquet.py:276: in to_parquet
    **kwargs,
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pandas/io/parquet.py:101: in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
pyarrow/table.pxi:1394: in pyarrow.lib.Table.from_pandas
    ???
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pyarrow/pandas_compat.py:588: in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pyarrow/pandas_compat.py:588: in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pyarrow/pandas_compat.py:574: in convert_column
    raise e
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pyarrow/pandas_compat.py:568: in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
pyarrow/array.pxi:292: in pyarrow.lib.array
    ???
pyarrow/array.pxi:79: in pyarrow.lib._ndarray_to_array
    ???
pyarrow/array.pxi:67: in pyarrow.lib._ndarray_to_type
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column label with type int64')

pyarrow/error.pxi:107: ArrowTypeError
----------------------------- Captured stdout call -----------------------------
Downloading superglue to "/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmpdm5pl_ev". Selected tasks = copa
Processing copa...
Downloading /Users/runner/.mxnet/datasets/nlp/glue/superglue/copa.zip from https://dl.fbaipublicfiles.com/glue/superglue/data/v2/COPA.zip...
----------------------------- Captured stderr call -----------------------------

  0%|          | 0.00/44.0k [00:00<?, ?iB/s]
100%|██████████| 44.0k/44.0k [00:00<00:00, 535kiB/s]

To Reproduce

(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

Steps to reproduce

(Paste the commands you ran that produced the error.)

What have you tried to solve it?

Pinned the numpy version to 1.19.4 in the workflow. The root cause still needs to be found.

sxjscience commented 3 years ago

Need to fix this.

sxjscience commented 3 years ago

We may add a numpy dependency to our setup.py and pin it to be smaller than 1.20.0.
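
A minimal sketch of what such an upper bound could look like. The variable name `requirements` and the way it is wired into `setup()` are assumptions for illustration; the actual contents of gluon-nlp's setup.py are not shown in this thread.

```python
# Hypothetical fragment of a setup.py. Only the numpy entry is taken from
# the suggestion above; the variable name is an assumption.
requirements = [
    # Temporary upper bound until the numpy/pyarrow incompatibility
    # is understood and fixed upstream.
    'numpy<1.20.0',
]

# In setup.py this list would then be passed on as:
#   setup(..., install_requires=requirements, ...)
```

Under PEP 440, an exclusive bound such as `<1.20.0` does not match pre-releases of 1.20.0, so this specifier would also exclude 1.20.0rc1.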

leezu commented 3 years ago

That can only be a temporary solution. Please also reproduce the bug without gluonnlp and file a bug report upstream so the root cause can be addressed.
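
A standalone reproduction attempt might look like the sketch below. It assumes the failure can be triggered by any DataFrame with an int64 `label` column written via `to_parquet`, which is what the traceback suggests but has not been confirmed here. The function name, the row contents, and the file name are made up, and whether the error appears depends on the installed numpy and pyarrow versions.

```python
import os
import tempfile


def try_to_parquet():
    """Write a tiny DataFrame with an int64 'label' column to Parquet.

    Returns the numpy, pandas and pyarrow versions in use, which are
    worth including in an upstream report.
    """
    # Imported inside the function so the reported versions are the
    # ones actually exercised by the call below.
    import numpy as np
    import pandas as pd
    import pyarrow as pa

    # Made-up rows; only the int64 'label' column mirrors the traceback.
    df = pd.DataFrame({
        'premise': ['a', 'b'],
        'label': np.array([0, 1], dtype=np.int64),
    })
    with tempfile.TemporaryDirectory() as root:
        # Same pandas call that fails in prepare_glue.py in the traceback.
        df.to_parquet(os.path.join(root, 'copa.parquet'))
    return {
        'numpy': np.__version__,
        'pandas': pd.__version__,
        'pyarrow': pa.__version__,
    }


if __name__ == '__main__':
    print(try_to_parquet())
```

If the script raises the same ArrowTypeError in an environment with numpy 1.20.0rc1 and succeeds with numpy 1.19.4, that would support reporting it upstream without any gluonnlp code involved.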

sxjscience commented 3 years ago

Similarly, we have also triggered a bug in wikiextractor, which we should report to their repo.

sxjscience commented 3 years ago

Should we close this? @barry-jin

barry-jin commented 3 years ago

Yes, let's close this issue for now. I will find the root cause and report it to pyarrow. Once the issue is resolved, we can revert #1456.