Open veenstrajelmer opened 1 month ago
Thanks for the report! I can confirm that with the new default IO engine pyogrio, this indeed returns a string.
A workaround is to use the old engine that was default pre 1.0.
uhslc_gpd = gpd.read_file("https://uhslc.soest.hawaii.edu/data/meta.geojson", engine="fiona")
@brendan-ward will know more whether this is expected or something we need to process differently in pyogrio.
@martinfleis thanks a lot for this useful suggestion, this conveniently solves the issue I had at least on my side. However, the engine string seems to be case sensitive, so it should be engine='fiona'.
I'll keep this open and move it to pyogrio as we may want to look into that there.
It looks like there is a field type OFSTJSON that Fiona is using in this case to automatically convert to dict, and on write, automatically convert dict / list values when serializing.
On the Pyogrio side, we need to detect this subtype and carry through that info when deserializing / serializing fields. Serializing is likely to be harder because the numpy array dtype does not give us this info - so there may be a real performance penalty there (or we leave this the responsibility of the user).
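To illustrate why the write side is harder: an object-dtype column carries no marker that its values are JSON, so every value has to be inspected before writing. Below is a minimal sketch of what a user-side pre-serialization step could look like, assuming plain Python values. `serialize_json_values` is a hypothetical helper, not part of Pyogrio or GeoPandas.

```python
import json


def serialize_json_values(values):
    """Dump dict / list entries to JSON strings; leave everything else as-is.

    Hypothetical helper: it has to look at each value individually, which is
    the per-value cost mentioned above.
    """
    return [
        json.dumps(value) if isinstance(value, (dict, list)) else value
        for value in values
    ]
```

Applied to a column, this could be used as `serialize_json_values(gdf["some_field"].tolist())` before writing, where `some_field` stands in for whichever column holds dict / list values.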
For now, you could also manually parse applicable fields to dict and still get the speedups of Pyogrio:
import json
import geopandas as gpd
uhslc_gpd = gpd.read_file("https://uhslc.soest.hawaii.edu/data/meta.geojson")
uhslc_gpd["rq_span"] = uhslc_gpd["rq_span"].apply(json.loads)
Thanks for the suggestion. That would also work indeed, but "rq_span" is not the only field that requires conversion, so for my application I prefer the fiona approach for now.
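When several fields need converting, the manual approach can be generalized while keeping the Pyogrio engine. This is a rough sketch, assuming the JSON-typed fields arrive as strings that start with `{` or `[`. Both helper names are hypothetical, and the list of columns is left to the caller.

```python
import json


def maybe_parse_json(value):
    """Parse a string that looks like a JSON object or array.

    Non-strings, other strings, and strings that fail to parse are
    returned unchanged.
    """
    if not isinstance(value, str):
        return value
    text = value.strip()
    if not text or text[0] not in "{[":
        return value
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return value


def parse_json_columns(frame, columns):
    """Return a copy of a (Geo)DataFrame with the given columns parsed."""
    result = frame.copy()
    for column in columns:
        result[column] = result[column].apply(maybe_parse_json)
    return result
```

With the frame from above, this would be called as `uhslc_gpd = parse_json_columns(uhslc_gpd, ["rq_span"])`, adding whichever other column names need the same treatment.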
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of geopandas.
[ ] (optional) I have confirmed this bug exists on the main branch of geopandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
The above code raises
"TypeError: string indices must be integers, not 'str'"
in geopandas>=1.0.0. For older versions the code runs successfully. The issue is that the column now contains strings with dicts instead of plain dicts. It seems that something goes wrong with the parsing of the geojson.
Expected Output
A subset of the original column.
Output of geopandas.show_versions()