Open einsone opened 1 year ago
Package kind : python-wheel-manylinux2014
Arrow C++ library version : 12.0.0
Arrow C++ compiler : GNU 10.2.1
Arrow C++ compiler flags : -fdiagnostics-color=always
Arrow C++ git revision :
Arrow C++ git description :
Arrow C++ build type : release
Platform:
OS / Arch : Linux x86_64
SIMD Level : avx2
Detected SIMD Level : avx2
Memory:
Default backend : jemalloc
Bytes allocated : 0 bytes
Max memory : 0 bytes
Supported Backends : jemalloc, mimalloc, system
Optional modules:
csv : Enabled
cuda : -
dataset : Enabled
feather : Enabled
flight : Enabled
fs : Enabled
gandiva : -
json : Enabled
orc : Enabled
parquet : Enabled
Filesystems:
GcsFileSystem : Enabled
HadoopFileSystem : Enabled
S3FileSystem : Enabled
Compression Codecs:
brotli : Enabled
bz2 : Enabled
gzip : Enabled
lz4_frame : Enabled
lz4 : Enabled
snappy : Enabled
zstd : Enabled
AFAIK, there is no way for Arrow to consistently determine the correct order. In Arrow, columns are allowed to have duplicate names so something like this would be allowed:
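import pyarrow as pa

# A Python dict literal keeps only the last duplicate key, so the
# duplicate column names are passed explicitly via from_arrays.
tab1 = pa.Table.from_arrays(
    [pa.array([1, 2, 3, 4, 5]), pa.array([6, 7, 8, 9, 10])],
    names=["col", "col"],
)
tab2 = pa.Table.from_arrays(
    [pa.array([6, 7, 8, 9, 10]), pa.array([1, 2, 3, 4, 5])],
    names=["col", "col"],
)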
Two tables with different schemas can't be combined. You will need to normalize the schema in your code (or perhaps pandas) before providing it to Arrow.
Describe the usage question you have. Please include as many useful details as possible.
Why does a different column order result in a different schema?
The following code raises:
pyarrow.lib.ArrowInvalid: Schema at index 1 was different:
Component(s)
C++, Python