Open alanhdu opened 3 months ago
@alanhdu thanks for the issue. I think this is indeed something that from_pylist could support, and a good feature request.
In that case, the schema argument should probably be required (or it could be relaxed to just a list of column names).
The current implementation lives here:
(it's in a cython file, but essentially it's just pure python in this case)
I think it should be relatively straightforward to edit that to also support tuples (or to have a variant that supports tuples).
take
For now, we can regard from_pydict as a method for creating a Table from column-like data, since its input is a mapping of field name to array. We also have from_pylist, which can create a Table from row-like data. But from_pylist requires the data and the field names to be bound together, because it takes each row as a dict.
It seems we need to provide an API that supports creating a table from separate data and schema. I initially planned to add a new method, e.g. from_pytuple, to support this, but such a method could also process list data, not only tuples, so the name would be confusing. I have now decided to edit from_pylist to support this instead. Any suggestions? Thanks! @jorisvandenbossche
Describe the enhancement requested
I have a function that returns an iterator of tuples and would like to turn that into a PyArrow table. I have the column names separately and would like to use PyArrow's type inference for the actual types.
I can sort of get what I want with something like:
But this doesn't quite work, since Pandas will cast nullable integers to floats. I can obviously also do this "manually" (e.g. via
pa.Table.from_pylist([dict(zip(column_names, row)) for row in rows])
or something), but I'm wondering if there's a faster way to do this.
Component(s)
Python