apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python][CI] Fix deprecation warnings in the pandas nightly build #36412

Open: jorisvandenbossche opened this issue 1 year ago

jorisvandenbossche commented 1 year ago

The pandas nightly/upstream_devel builds show some warnings from pandas that we should address:

=============================== warnings summary ===============================
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_dataset.py::test_make_fragment
  /opt/conda/envs/arrow/lib/python3.10/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
    return bound(*args, **kwds)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_dataset.py::test_legacy_write_to_dataset_drops_null
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/parquet/core.py:3471: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
    for keys, subgroup in data_df.groupby(partition_keys):

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_feather.py::test_strings[1]
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_feather.py::test_strings[1]
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_feather.py::test_strings[2]
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_feather.py::test_strings[2]
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_feather.py:107: FutureWarning: Mismatched null-like values None and nan found. In a future version, pandas equality-testing functions (e.g. assert_frame_equal) will consider these not-matching and raise.
    assert_frame_equal(result, expected)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertStringLikeTypes::test_pandas_unicode
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertStringLikeTypes::test_bytes_to_binary
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:102: FutureWarning: Mismatched null-like values None and nan found. In a future version, pandas equality-testing functions (e.g. assert_frame_equal) will consider these not-matching and raise.
    tm.assert_frame_equal(result, expected, check_dtype=check_dtype,

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertListTypes::test_infer_lists
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:102: FutureWarning: Mismatched null-like values nan and None found. In a future version, pandas equality-testing functions (e.g. assert_frame_equal) will consider these not-matching and raise.
    tm.assert_frame_equal(result, expected, check_dtype=check_dtype,

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_strided_data_import
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_strided_data_import
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:137: FutureWarning: Mismatched null-like values None and nan found. In a future version, pandas equality-testing functions (e.g. assert_frame_equal) will consider these not-matching and raise.
    tm.assert_series_equal(pd.Series(result), expected, check_names=False)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_to_pandas_split_blocks
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:3565: FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert len(x._data.blocks) == number

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_convert_to_extension_array
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4070: FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert not isinstance(result._data.blocks[0], _int.ExtensionBlock)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_convert_to_extension_array
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4071: FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert result._data.blocks[0].values.dtype == np.dtype("int64")

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_convert_to_extension_array
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4072: FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert isinstance(result._data.blocks[1], _int.ExtensionBlock)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_convert_to_extension_array
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4079: FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert isinstance(result._data.blocks[0], _int.ExtensionBlock)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_convert_to_extension_array
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4091: FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert len(result._data.blocks) == 1

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_convert_to_extension_array
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4092: FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert not isinstance(result._data.blocks[0], _int.ExtensionBlock)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_conversion_extensiontype_to_extensionarray
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4118: FutureWarning: Series._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert isinstance(result._data.blocks[0], _int.ExtensionBlock)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_conversion_extensiontype_to_extensionarray
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4123: FutureWarning: DataFrame._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert isinstance(result._data.blocks[0], _int.ExtensionBlock)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::test_conversion_extensiontype_to_extensionarray
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:4137: FutureWarning: Series._data is deprecated and will be removed in a future version. Use public APIs instead.
    assert not isinstance(result._data.blocks[0], _int.ExtensionBlock)

From https://github.com/ursacomputing/crossbow/actions/runs/5418026924/jobs/9849676474
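The `observed` warning is raised by `data_df.groupby(partition_keys)` in pyarrow/parquet/core.py when a partition key is categorical. A minimal sketch, using hypothetical data rather than anything from the pyarrow tests, of what passing `observed` explicitly does: either value silences the warning, but they differ in whether unused categories produce a group. Which value pyarrow should pass is not decided in this thread.

```python
import pandas as pd

# Hypothetical data: a categorical partition key with an unused category "c".
df = pd.DataFrame({
    "part": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
    "value": [1, 2, 3],
})

# observed=False keeps the current default: every category gets a group,
# so the unused category "c" shows up with a sum of 0.
all_groups = df.groupby("part", observed=False)["value"].sum()

# observed=True adopts the future default: only categories present in the
# data are grouped, so "c" is absent.
seen_groups = df.groupby("part", observed=True)["value"].sum()

print(all_groups)
print(seen_groups)
```

For a write-to-dataset loop, `observed=True` would skip empty partitions, while `observed=False` would iterate over them as well.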
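The `DataFrame._data` / `Series._data` warnings come from tests that inspect pandas' internal blocks. A minimal sketch of two options, assuming a pandas version that provides the private `_mgr` attribute (the block manager behind `_data`). `get_blocks` still relies on private internals and may break in any release. `extension_columns` follows the warning's "use public APIs" advice, but it only checks column dtypes and cannot verify the block layout. Both helpers are hypothetical, not pyarrow's actual fix.

```python
import numpy as np
import pandas as pd


def get_blocks(obj):
    """Return the internal blocks of a DataFrame or Series (private API).

    Prefers ``_mgr`` and only falls back to the deprecated ``_data`` alias
    on old pandas versions that lack ``_mgr``.
    """
    mgr = obj._mgr if hasattr(obj, "_mgr") else obj._data
    return mgr.blocks


def extension_columns(df):
    """Names of columns backed by an ExtensionArray, using public API only."""
    return [
        col
        for col, dtype in df.dtypes.items()
        if isinstance(dtype, pd.api.extensions.ExtensionDtype)
    ]


df = pd.DataFrame({
    "a": pd.array([1, 2], dtype="Int64"),  # nullable extension dtype
    "b": np.array([1, 2], dtype="int64"),  # plain NumPy dtype
})

print(len(get_blocks(df)))    # an extension column and a NumPy column never share a block
print(extension_columns(df))
```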

jorisvandenbossche commented 1 year ago

Some warnings are still left:

=============================== warnings summary ===============================
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_dataset.py::test_make_fragment
  /opt/conda/envs/arrow/lib/python3.10/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
    return bound(*args, **kwds)

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:102: FutureWarning: Mismatched null-like values None and nan found. In a future version, pandas equality-testing functions (e.g. assert_frame_equal) will consider these not-matching and raise.
    tm.assert_frame_equal(result, expected, check_dtype=check_dtype,

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py::TestConvertMisc::test_category
  /opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_pandas.py:138: FutureWarning: Mismatched null-like values None and nan found. In a future version, pandas equality-testing functions (e.g. assert_frame_equal) will consider these not-matching and raise.
    tm.assert_series_equal(pd.Series(result), expected, check_names=False)
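The "Mismatched null-like values" warnings appear when two compared frames store a missing value in the same object column differently, one as `None` and the other as `np.nan`. A minimal sketch of one way to make such a comparison future-proof, assuming the expected frame is the side built with `np.nan`: normalize its nulls to `None` before calling `assert_frame_equal`. `nulls_as_none` and the example frames are hypothetical, not taken from pyarrow's test suite.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def nulls_as_none(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where every missing value is stored as None."""
    # Cast to object first so None can be stored, then keep the non-null
    # cells and replace all others with None.
    return df.astype(object).where(df.notna(), None)


# Hypothetical frames: a round-tripped result that uses None for nulls and
# an expected frame that was built with np.nan.
result = pd.DataFrame({"s": pd.Series(["a", None, "c"], dtype=object)})
expected = pd.DataFrame({"s": pd.Series(["a", np.nan, "c"], dtype=object)})

# Both sides now use the same null sentinel, so no mismatch is reported.
tm.assert_frame_equal(result, nulls_as_none(expected))
```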

nabelekt commented 1 year ago

I am doing some DataFrame manipulation with pandas and am also seeing the FutureWarning about 'DataFrame.swapaxes':

/usr/local/lib/python3.11/dist-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)

Versions: numpy 1.26.1, pandas 2.1.2, arrow 1.3.0, Python 3.11.5
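One plausible trigger for the swapaxes warning, not confirmed in this thread, is passing a DataFrame to `np.array_split`. NumPy's `array_split` calls `np.swapaxes`, which dispatches to the object's own `swapaxes` method via the `return bound(*args, **kwds)` line in numpy/core/fromnumeric.py, so pandas sees a `DataFrame.swapaxes` call. A minimal sketch, assuming that is the cause, that splits row-wise with `.iloc` instead. `split_frame` is a hypothetical helper that reproduces `array_split`'s chunk sizes.

```python
from itertools import accumulate

import pandas as pd


def split_frame(df: pd.DataFrame, n: int) -> list[pd.DataFrame]:
    """Split df row-wise into n chunks, sized the way np.array_split would.

    Slices positionally with .iloc, so DataFrame.swapaxes is never called.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    each, extras = divmod(len(df), n)
    # As in np.array_split, the first `extras` chunks get one extra row.
    sizes = [each + 1] * extras + [each] * (n - extras)
    bounds = [0, *accumulate(sizes)]
    return [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


df = pd.DataFrame({"a": range(10)})
chunks = split_frame(df, 3)
print([len(chunk) for chunk in chunks])
```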

jorisvandenbossche commented 5 months ago

Currently, the only remaining warnings in the pandas nightly builds are:

opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/test_pandas.py::TestConvertMetadata::test_empty_list_metadata
opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/test_pandas.py::TestConvertListTypes::test_empty_list_roundtrip
  /opt/conda/envs/arrow/lib/python3.11/site-packages/pandas/core/dtypes/missing.py:503: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
    return lib.array_equivalent_object(left, right)

opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/test_pandas.py: 2 warnings
opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/parquet/test_data_types.py: 3 warnings
opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/parquet/test_pandas.py: 2 warnings
opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/parquet/test_parquet_file.py: 6 warnings
  /opt/conda/envs/arrow/lib/python3.11/site-packages/pandas/core/dtypes/missing.py:504: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
    if not lib.array_equivalent_object(left[~mask], right[~mask]):

I assume these warnings actually originate in numpy but are triggered inside the pandas function we call. I would have to look in more detail to determine whether this should be fixed in pandas or in how we use the pandas function.
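In `test_empty_list_metadata` and `test_empty_list_roundtrip`, the compared object columns presumably hold empty arrays (converted from empty lists). An elementwise `x == y` on two such elements yields an empty array, and using that result in a boolean context triggers NumPy's deprecation. A minimal sketch of a comparison that avoids this by calling `np.array_equal`, which always returns a plain bool. `object_arrays_equivalent` is a hypothetical helper, not pandas' `array_equivalent_object`, and it deliberately omits pandas' NaN/None handling.

```python
import numpy as np


def object_arrays_equivalent(left: np.ndarray, right: np.ndarray) -> bool:
    """Compare two 1-D object arrays whose elements may themselves be arrays."""
    if left.shape != right.shape:
        return False
    for x, y in zip(left, right):
        if isinstance(x, np.ndarray) or isinstance(y, np.ndarray):
            # np.array_equal returns a single bool, also for empty arrays,
            # so there is no ambiguous truth value of an elementwise result.
            if not np.array_equal(x, y):
                return False
        elif x != y:
            # Plain scalars; missing-value handling is left out of this sketch.
            return False
    return True


# Build object arrays element by element so NumPy does not stack the
# inner arrays into a 2-D array.
left = np.empty(2, dtype=object)
left[0], left[1] = np.array([]), np.array([1, 2])
right = np.empty(2, dtype=object)
right[0], right[1] = np.array([]), np.array([1, 2])

print(object_arrays_equivalent(left, right))
```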

jorisvandenbossche commented 2 months ago

Opened an issue on the pandas side about this: https://github.com/pandas-dev/pandas/issues/59776