Closed ingenieroariel closed 11 months ago
I also tried:
points = ga.point().from_geobuffers(None, table["latitude"].combine_chunks(), y=table["longitude"].combine_chunks())
and got
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 points = ga.point().from_geobuffers(None, table["latitude"].combine_chunks(), y=table["longitude"].combine_chunks())

File ~/tmp/lib/python3.11/site-packages/geoarrow/pyarrow/_type.py:289, in PointType.from_geobuffers(self, validity, x, y, z_or_m, m)
    280 def from_geobuffers(self, validity, x, y=None, z_or_m=None, m=None):
    281 buffers = [
    282 (0, "uint8", validity),
    283 (1, "double", x),
    (...)
    286 (4, "double", m),
    287 ]
--> 289 return self._from_geobuffers_internal(buffers)

File ~/tmp/lib/python3.11/site-packages/geoarrow/pyarrow/_type.py:94, in GeometryExtensionType._from_geobuffers_internal(self, args)
     92 continue
     93 else:
---> 94 builder.set_buffer_double(i, buf)
     96 carray = builder.finish()
     97 return pa.Array._import_from_c(carray._addr(), self)

File src/geoarrow/c/_lib.pyx:674, in geoarrow.c._lib.CBuilder.set_buffer_double()

TypeError: memoryview: a bytes-like object is required, not 'pyarrow.lib.DoubleArray'
That's a great point! I don't think we have a shortcut for creating points from chunked arrays yet. The workaround is:
import pyarrow as pa
import geoarrow.pyarrow as ga
tbl = pa.table([pa.array([0.0, 1.0]), pa.array([1.0, 2.0])], names=["x", "y"])
struct_chunks = []
for x_chunk, y_chunk in zip(tbl["x"].chunks, tbl["y"].chunks):
    struct_chunk = pa.StructArray.from_arrays([x_chunk, y_chunk], names=["x", "y"])
    struct_chunks.append(struct_chunk)
points = ga.point().wrap_array(pa.chunked_array(struct_chunks))
points
#> <pyarrow.lib.ChunkedArray object at 0x1247f1a80>
points.type
#> PointType(geoarrow.point)
It's not only chunked arrays, but pyarrow arrays in general that don't work with from_geobuffers. Another workaround for now is to convert each column to a numpy array:
points = ga.point().from_geobuffers(None, table["latitude"].to_numpy(), y=table["longitude"].to_numpy())
BTW, note that you should switch around the order of latitude and longitude! (geoarrow always uses x/y or lon/lat order, regardless of the coordinate reference system)
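The TypeError in the traceback is the message Python's built-in memoryview raises when it is given an object that does not export a buffer, which suggests (an assumption based on the error text, not on reading the geoarrow source) that set_buffer_double passes its argument to memoryview. A minimal standard-library sketch of that behavior, with array.array standing in for a numpy array and a plain list standing in for an object without buffer support; exposes_buffer is a hypothetical helper name:

```python
from array import array


def exposes_buffer(obj):
    """Return True if memoryview() accepts obj, i.e. it exports a buffer."""
    try:
        memoryview(obj)
    except TypeError:
        # Same error family as in the traceback:
        # "memoryview: a bytes-like object is required, not '...'"
        return False
    return True


plain_list = [0.0, 1.0]                  # stand-in: no buffer support
packed_doubles = array("d", [0.0, 1.0])  # stand-in for a numpy float64 array

print(exposes_buffer(plain_list))      # False
print(exposes_buffer(packed_doubles))  # True
```

This would explain why converting each column with to_numpy() gets past the error: numpy arrays export their data through the same buffer mechanism.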
This workaround worked for me, and it was super fast. This tech is magic.
Reopening because we should really have this helper in geoarrow.pyarrow!
I am trying to load a CSV into a geoarrow table manually using pyarrow, but I got an error.