Closed FDenker closed 3 years ago
That is strange. I am not able to reproduce the issue, both ways result in a GeoDataFrame with 65 rows. Can you try updating your environments? It may have been fixed along the way, I see that you have some outdated dependencies.
I have the same issue and can reproduce @FDenker's issue. Package versions:
python       : 3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0]
executable   : /usr/bin/python3
machine      : Linux-5.4.0-80-generic-x86_64-with-Ubuntu-18.04-bionic

GEOS         : 3.8.0
GEOS lib     : /usr/lib/x86_64-linux-gnu/libgeos_c.so
GDAL         : 3.3.0
GDAL data dir: /home/nguyenlienviet/.local/lib/python3.6/site-packages/fiona/gdal_data
PROJ         : 7.2.1
PROJ data dir: /home/nguyenlienviet/.local/lib/python3.6/site-packages/pyproj/proj_dir/share/proj

geopandas    : 0.9.0
pandas       : 1.1.5
fiona        : 1.8.20
numpy        : 1.19.5
shapely      : 1.7.1
rtree        : 0.9.7
pyproj       : 3.0.1
matplotlib   : 3.2.2
mapclassify  : 2.4.3
geopy        : 2.2.0
psycopg2     : 2.9.1 (dt dec pq3 ext lo64)
geoalchemy2  : 0.9.3
pyarrow      : 5.0.0
pygeos       : 0.10.1
I tried to reproduce the issue on macOS and Ubuntu to no avail. It works every time, no matter what I do or how I set up the environment... I'd love to help but I'm not sure how.
@FDenker @nguyenlienviet can you export your environment to yml via conda env export -f environment.yml and share that?
@FDenker @nguyenlienviet This issue arises because GDAL's GeoJSON driver is used to read the file (as opposed to the from_features route that you showed working as expected). There is a GDAL configuration option, OGR_GEOJSON_MAX_OBJ_SIZE, that sets the maximum size of individual features (https://gdal.org/drivers/vector/geojson.html). Some of the features in the dataset you have here are sufficiently complex that they are bumping up against whatever that limit is set to on your system. I am able to reproduce the behavior you experience by setting that option to a lower value. For you, this should work:
import geopandas as gpd
import fiona

url = "https://github.com/FDenker/GeoPandas-Geojson-Issue/raw/main/geopandas_not_found.geojson"
with fiona.Env(OGR_GEOJSON_MAX_OBJ_SIZE=2000):
    no_longer_empty_gdf = gpd.read_file(url)
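As an alternative to wrapping the read in fiona.Env, the same GDAL configuration option can be set as a process environment variable before GDAL first reads it. A minimal sketch (per the GDAL GeoJSON driver docs, the value is in megabytes, the default is 200, and 0 is documented as removing the limit; treat the exact semantics as something to verify against your GDAL version):

```python
import os

# Set the GDAL config option via the environment before importing
# geopandas/fiona, so the GeoJSON driver picks it up on first use.
# "0" removes the per-feature size limit per the GDAL driver docs.
os.environ["OGR_GEOJSON_MAX_OBJ_SIZE"] = "0"

print(os.environ["OGR_GEOJSON_MAX_OBJ_SIZE"])
```

Note that GDAL caches some configuration at startup, so setting the variable after the first read may have no effect; setting it before the import (or in the shell that launches Python) is the safer route.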
I won't tell you how long it took me to get to the bottom of this one. :sweat_smile:
@jdmcbr Thanks a lot for getting to the bottom of this!
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of geopandas.
[x] (optional) I have confirmed this bug exists on the master branch of geopandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
When reading in a specific kind of GeoJSON (the output of an osmium-tool export, to be exact), the read_file function skips over specific elements. However, it does not return an error but rather an empty GeoDataFrame. It is important to mention that this only occurs for a small number of entries, and the GeoJSON I have linked above only includes entries for which the read_file function does not work. When I normally import GeoJSON files exported from the osmium-tool, about 99% of the entries are reflected in the GeoDataFrame.

At the same time, if I load the GeoJSON as plain JSON and then pass the 'features' to the from_features function, it returns a proper GeoDataFrame with all the data that is in the GeoJSON.

The error persists both on my local Windows machine (running Python 3.8.3) and on an Ubuntu 18.04 machine (running Python 3.7.10 and the GitHub version of geopandas). I have therefore posted the system info for both below.
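The from_features workaround described above amounts to parsing the file as plain JSON and handing the features array to GeoPandas. A minimal stdlib-only sketch of the parsing half, using a tiny inline FeatureCollection as a stand-in for the linked file (the GeoPandas call at the end is shown as a comment since it requires GeoPandas to be installed):

```python
import json

# Tiny inline GeoJSON FeatureCollection standing in for the linked file.
geojson_text = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "example"},
     "geometry": {"type": "Point", "coordinates": [13.4, 52.5]}}
  ]
}"""

# Plain-JSON parsing is not subject to GDAL's per-feature size limit.
data = json.loads(geojson_text)
features = data["features"]
print(len(features))  # -> 1

# With GeoPandas installed, the workaround is then:
# import geopandas as gpd
# gdf = gpd.GeoDataFrame.from_features(features)
```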
Expected Output
GeoDataFrame with 65 rows containing attributes and valid geometries.
Output of geopandas.show_versions()
Windows machine:
Linux machine: