InseeFrLab / pynsee

pynsee package contains tools to easily search and download french data from INSEE and IGN APIs
https://pynsee.readthedocs.io/en/latest/
MIT License
67 stars 8 forks source link

Add GeoPandas as dependency? #169

Open tfardet opened 1 year ago

tfardet commented 1 year ago

EDIT: given the current state of geopandas packaging via wheels, I propose adding it as a real dependency and removing GeoFrDataFrame


I see that geopandas was removed on purpose as a dependency from pynsee in from #6 Though I understand the rationale, I do wonder whether, in the long run, it would not make sense to make it an optional dependency such that GeoFrDataFrame is a subset of GeoDataFrame if geopandas is not available but is actually a GeoDataFrame if geopandas is available.

In terms of import, one would replace the import in pynsee/geodata/__init__.py by

try:
    from geopandas import GeoDataFrame as GeoFrDataFrame
except ImportError:
    from .GeoFrDataFrame import GeoFrDataFrame

This does not imply any breaking change as far as I can tell: we could keep the get_geom/translate/zoom functions that do not exist in geopandas (though it might be worth making a custom GeoFrSeries for improved compatibility).

Opinions on that?

hadrilec commented 1 year ago

So far I tried not to use geopandas as a dependency, as it is sometimes not easy to install. I am not really keen on having it as a dependency. What would be the advantages of having it as a dependency?

tfardet commented 1 year ago

Here it would be an optional dependency, i.e. it is not required to install pynsee, but if available, it is used. The advantage is that people that have geopandas installed directly get a GeoDataFrame with all its capabilities, without having to do the extra conversion step. It may seem somewhat minor but the extra maintenance cost is also zero for the minimal implementation.

This also comes from the fact that I would assume that most people getting a DataFrame with a "geometry" column expect it to be a GeoDataFrame or a subclass, and thus expect all the GeoDataFrame/GeoSeries methods to work. With this implementation, we do not break any expectations (people not expecting anything do not get less than they expected, people expecting the GeoDataFrame get it).

hadrilec commented 1 year ago

I would find it a bit confusing to have sometimes an output that is a geopandas.GeoDataFrame and sometimes an output that is pynsee.GeoFrDataFrame depending on the dependencies. Such feature would mean that if a bug happens it is not straightforward to know whether it is related to geopandas or not. As a maintainer I would prefer to not have to answer this question. I think the extra step is not a big one, at least it makes it explicit that we are not handling geopandas object.

In the long run, if this need is expressed by more users, why not considering it, but tbh for now I think there are already enough issues to tackle.

tgrandje commented 1 year ago

Hi @hadrilec , I'm not sure your statement about geopandas is still relevant. I found that geopandas installation was efficiently simplified due to the release of whl files for Fiona on pypi less than 6 months ago. It sure could be a pain before that (particularly on windows), but I don't think that's the case anymore.

The only problems I'm still encountering with geopandas are linked to conflicts between gdal (osgeo) and fiona (which is another GDAL binding for python). There are cases where gdal is mandatory (for instance, to activate geodjango) and depending on geopandas could indeed be problematic. (I still have to see the consequences of using both libraries...)

In any case, this is very specific and I'm not sure we need to concern ourselves with geopandas's installation anymore...

tfardet commented 1 month ago

I'm bumping this issue once more, pushing for the addition of geopandas because:

  1. it would give us a proper geographic object to return for geodata (which GeoFrDataFrame is not), so people can work with all geofunctions
  2. it would solve most of the issues associated to #204 and let us simplify the code
  3. it would simplify save_df and making it faster because we could pass the library instead and load geoparquet directly with geopandas, as GeoDataFrame, rather than having to load via pandas and then cast into another class.

@hadrilec could you please check again on the computer from the INSEE so we can move forward with this if you confirm there are no installation issues?

hadrilec commented 1 month ago

I will give it another try