Open PCClimate opened 1 year ago
Could you first verify the pyarrow version you are using (check `pyarrow.__version__`)? Then could you also inspect the columns of the parquet file? You can use pyarrow for that (`table=pq.read_table('example.parquet')`) and then check the schema of the table object, `table.schema`.
I tried a couple of examples using the map type and none of them raise an error:
>>> import pyarrow as pa
>>> data = [[{'key': 'a', 'value': "1"}, {'key': 'b', 'value': "2"}], [{'key': 'c', 'value': "3"}]]
>>> map_type = pa.map_(pa.string(), pa.string())
>>> table = pa.table([pa.array(data, type=map_type)], names=["array_element"])
>>> table.schema
array_element: map<string, string>
child 0, entries: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
>>> table.to_pandas()
array_element
0 [(a, 1), (b, 2)]
1 [(c, 3)]
>>> table = pa.table([pa.array(data, type=pa.list_(map_type))], names=["array_element"])
>>> table.schema
array_element: list<item: map<string, string>>
child 0, item: map<string, string>
child 0, entries: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
>>> table.to_pandas()
array_element
0 [[(key, a), (value, 1)], [(key, b), (value, 2)]]
1 [[(key, c), (value, 3)]]
>>> inner = pa.array(data, type=map_type)
>>> array = pa.MapArray.from_arrays([0, 2], ['a', 'b'], inner)
>>> table = pa.table({'array_element': array})
>>> table.schema
array_element: map<string, map<string, string>>
child 0, entries: struct<key: string not null, value: map<string, string>> not null
child 0, key: string not null
child 1, value: map<string, string>
child 0, entries: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
>>> table.to_pandas()
array_element
0 [(a, [('a', '1'), ('b', '2')]), (b, [('c', '3'...
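As the outputs above show, a flat `map<string, string>` value comes back from `to_pandas()` as a list of `(key, value)` tuples. If dicts are easier to work with, a small post-processing step can convert them. This is only a sketch: `map_cell_to_dict` is a hypothetical helper name, it handles a single flat map value (not the nested cases above), and it assumes keys are unique within each map, since later duplicates would overwrite earlier ones.

```python
def map_cell_to_dict(cell):
    """Turn one converted map value, a list of (key, value) tuples, into a dict.

    Hypothetical helper for illustration; None (a null map) is passed through.
    """
    if cell is None:
        return None
    return dict(cell)


# Values shaped like the first `to_pandas()` output in this comment.
cells = [[("a", "1"), ("b", "2")], [("c", "3")]]
dicts = [map_cell_to_dict(cell) for cell in cells]
print(dicts)
```

On a pandas column the same helper could be applied with something like `df["array_element"].map(map_cell_to_dict)`.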
Duplicate of https://github.com/apache/arrow/issues/12396
pyarrow version is 12.0.1.
This is the schema:
`id: string
updatetime: string
version: int32
names: map<string, list<array_element: map<string, string ('array_element')>> ('names')>
child 0, names: struct<key: string not null, value: list<array_element: map<string, string ('array_element')>>> not null
child 0, key: string not null
child 1, value: list<array_element: map<string, string ('array_element')>>
child 0, array_element: map<string, string ('array_element')>
child 0, array_element: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
categories: struct<main: string, alternate: list<array_element: string>>
child 0, main: string
child 1, alternate: list<array_element: string>
child 0, array_element: string
confidence: double
websites: list<array_element: string>
child 0, array_element: string
socials: list<array_element: string>
child 0, array_element: string
emails: list<array_element: string>
child 0, array_element: string
phones: list<array_element: string>
child 0, array_element: string
brand: struct<names: map<string, list<array_element: map<string, string ('array_element')>> ('names')>, wikidata: string>
child 0, names: map<string, list<array_element: map<string, string ('array_element')>> ('names')>
child 0, names: struct<key: string not null, value: list<array_element: map<string, string ('array_element')>>> not null
child 0, key: string not null
child 1, value: list<array_element: map<string, string ('array_element')>>
child 0, array_element: map<string, string ('array_element')>
child 0, array_element: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
child 1, wikidata: string
addresses: list<array_element: map<string, string ('array_element')>>
child 0, array_element: map<string, string ('array_element')>
child 0, array_element: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
sources: list<array_element: map<string, string ('array_element')>>
child 0, array_element: map<string, string ('array_element')>
child 0, array_element: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
bbox: struct<minx: double, maxx: double, miny: double, maxy: double>
child 0, minx: double
child 1, maxx: double
child 2, miny: double
child 3, maxy: double
geometry: binary
-- schema metadata --
writer.time.zone: 'UTC'`
table.to_pandas()
Returns:
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_66740\1919897604.py in <module>
----> 1 table.to_pandas()
~\Anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
~\Anaconda3\lib\site-packages\pyarrow\table.pxi in pyarrow.lib.Table._to_pandas()
~\Anaconda3\lib\site-packages\pyarrow\pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
818 _check_data_column_metadata_consistency(all_columns)
819 columns = _deserialize_column_index(table, all_columns, column_indexes)
--> 820 blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
821
822 axes = [columns, index]
~\Anaconda3\lib\site-packages\pyarrow\pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
1166 # Convert an arrow table to Block from the internal pandas API
1167 columns = block_table.column_names
-> 1168 result = pa.lib.table_to_blocks(options, block_table, categories,
1169 list(extension_columns.keys()))
1170 return [_reconstruct_block(item, columns, extension_columns)
~\Anaconda3\lib\site-packages\pyarrow\table.pxi in pyarrow.lib.table_to_blocks()
~\Anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
ArrowNotImplementedError: Not implemented type for Arrow list to pandas: map<string, string ('array_element')>
Sorry, I cannot seem to create an example that reproduces the issue.
I tried with the dev version of pyarrow:
(pyarrow-dev) alenkafrim@Alenkas-MacBook-Pro python % python
Python 3.10.10 (main, Feb 16 2023, 02:46:59) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> pa.__version__
'14.0.0.dev42+g1af709ff9.d20230830'
>>> data = [[{'key': 'a', 'value': "1"}, {'key': 'b', 'value': "2"}], [{'key': 'c', 'value': "3"}]]
>>> map_type = pa.map_(pa.string(), pa.string())
>>> inner_map = pa.array(data, type=map_type)
>>> inner_list = pa.ListArray.from_arrays([0, 1, 2], inner_map)
>>> array = pa.MapArray.from_arrays([0, 1, 2], ["First", "Second"], inner_list)
>>> table = pa.table({'array_element': array})
>>> table.schema
array_element: map<string, list<item: map<string, string>>>
child 0, entries: struct<key: string not null, value: list<item: map<string, string>>> not null
child 0, key: string not null
child 1, value: list<item: map<string, string>>
child 0, item: map<string, string>
child 0, entries: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
>>> table.to_pandas()
array_element
0 [(First, [[('a', '1'), ('b', '2')]])]
1 [(Second, [[('c', '3')]])]
>>> table = pa.table({'array_element': inner_list})
>>> table.schema
array_element: list<item: map<string, string>>
child 0, item: map<string, string>
child 0, entries: struct<key: string not null, value: string> not null
child 0, key: string not null
child 1, value: string
>>> table.to_pandas()
array_element
0 [[(a, 1), (b, 2)]]
1 [[(c, 3)]]
Can you check which column is giving you the error? Also, does the code above work for you?
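One way to check which column is responsible is to convert the columns one at a time and record the ones that fail. The sketch below assumes `table` behaves like a `pyarrow.Table` (it uses `column_names`, `select` and `to_pandas`); `find_unconvertible_columns` is a made-up name. It catches the built-in `NotImplementedError`, which `pyarrow.ArrowNotImplementedError` subclasses.

```python
def find_unconvertible_columns(table):
    """Return (column name, error message) pairs for columns that fail to convert.

    `table` is expected to behave like a pyarrow.Table: it needs
    `column_names` and `select([...])`, and the selection needs `to_pandas()`.
    """
    failing = []
    for name in table.column_names:
        try:
            # Convert a one-column table so a failure points at this column.
            table.select([name]).to_pandas()
        except NotImplementedError as exc:
            # pyarrow.ArrowNotImplementedError subclasses NotImplementedError.
            failing.append((name, str(exc)))
    return failing
```

For a table read with `pq.read_table('example.parquet')`, calling `find_unconvertible_columns(table)` should list each column that raises, together with its message.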
The code you shared above does not work for me; I get the same error in both instances.
The columns `names`, `brand`, `addresses`, and `sources` each give the same error.
@AlenkaF Someone using version 10.x of pyarrow was able to reproduce the issue with the code you provided above.
I am getting the following error when trying to pull in data from a parquet file. Is this expected for this data structure, and is there a workaround using Arrow?
ArrowNotImplementedError: Not implemented type for Arrow list to pandas: map<string, string ('array_element')>
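A possible workaround, offered as a sketch and not as a confirmed fix from this thread, is to convert column by column and fall back to plain Python objects via `to_pylist()` for any column whose pandas conversion is not implemented. The function name `columns_with_fallback` is hypothetical, and `table` is assumed to behave like a `pyarrow.Table` (`column_names`, `column(name)`, with columns offering `to_pandas()` and `to_pylist()`).

```python
def columns_with_fallback(table):
    """Convert each column on its own, falling back to Python lists.

    Returns a dict mapping column name to either the result of
    `column.to_pandas()` or, if that is not implemented for the
    column's type, the result of `column.to_pylist()`.
    """
    columns = {}
    for name in table.column_names:
        column = table.column(name)
        try:
            columns[name] = column.to_pandas()
        except NotImplementedError:
            # pyarrow.ArrowNotImplementedError subclasses NotImplementedError;
            # to_pylist() builds plain Python objects instead.
            columns[name] = column.to_pylist()
    return columns
```

The resulting dict could then be handed to `pandas.DataFrame(...)`. The fallback columns hold ordinary Python objects, so they are likely to be slower and use more memory than natively converted columns.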
Full error:
Component(s)
Python