Esri / arcgis-python-api

Documentation and samples for ArcGIS API for Python
https://developers.arcgis.com/python/
Apache License 2.0

GeoAccessor.from_df truncates rows when dataframe is longer than max batch size #1646

Closed cflann closed 1 year ago

cflann commented 1 year ago

Describe the bug The static method from_df returns only the final batch of results when a dataframe's addresses are put through a geocoder. For instance, if the geocoder has a max batch size of 100 and your dataframe has 127 rows (with addresses), the returned spatially enabled dataframe will have only 27 rows. The first 100 are silently dropped.

To Reproduce Steps to reproduce the behavior:

I've used the OR State geocoder below, but any geocoder service with a max batch size smaller than your input data will trigger the bug.

import arcgis
from arcgis.features import GeoAccessor
from arcgis.geocoding import Geocoder
import pandas as pd

# OR geocoder has a max batch size of 1000
geocoder = Geocoder('https://navigator.state.or.us/arcgis/rest/services/Locators/OregonAddress/GeocodeServer')

adds = pd.read_csv(r"<path_to>\or_addresses.csv")
print(len(adds)) # 1032

or_sdf = pd.DataFrame.spatial.from_df(
    adds,
    address_column='Address',
    geocoder=geocoder
)
print(len(or_sdf)) # 32 (should be 1032)

Looking at the code in my local install @ arcgis/features/geo/_accessor.py around line 2969, merge is being called on df. df is passed in as a parameter (the full input dataframe), but the loop starting on line 2956 appears to reuse the name df, so after the final iteration it refers only to the last batch. Thus, when merge is called, df contains only the final "piece" of the complete feature set.

Changing df in the loop to something like piece should avoid the naming conflict and fix the issue.
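
To make the naming conflict concrete, here is a minimal sketch of the pattern in plain Python. It is not the library's code: fake_batch_geocode, geocode_buggy, and geocode_fixed are hypothetical stand-ins, lists of dicts play the role of dataframes, and the final filter only approximates what the merge does. It assumes every address is unique.

```python
from typing import Dict, List

Row = Dict[str, str]


def fake_batch_geocode(batch: List[Row]) -> List[Row]:
    """Hypothetical stand-in for one geocoder request; adds a dummy location."""
    return [{**row, "location": "point for " + row["Address"]} for row in batch]


def split_batches(rows: List[Row], max_batch_size: int) -> List[List[Row]]:
    """Split the input into consecutive chunks of at most max_batch_size rows."""
    return [rows[i:i + max_batch_size] for i in range(0, len(rows), max_batch_size)]


def geocode_buggy(df: List[Row], max_batch_size: int) -> List[Row]:
    geocoded: List[Row] = []
    # The loop variable reuses the parameter name, so after the loop
    # `df` refers to the last batch, not the full input.
    for df in split_batches(df, max_batch_size):
        geocoded.extend(fake_batch_geocode(df))
    # Stand-in for the merge: keep geocoded rows whose address is in `df`.
    wanted = {row["Address"] for row in df}
    return [row for row in geocoded if row["Address"] in wanted]


def geocode_fixed(df: List[Row], max_batch_size: int) -> List[Row]:
    geocoded: List[Row] = []
    # A distinct loop name (`piece`) leaves `df` pointing at the full input.
    for piece in split_batches(df, max_batch_size):
        geocoded.extend(fake_batch_geocode(piece))
    wanted = {row["Address"] for row in df}
    return [row for row in geocoded if row["Address"] in wanted]


# Same sizes as the report: 1032 unique addresses, max batch size of 1000.
rows = [{"Address": str(i) + " Main St"} for i in range(1032)]
buggy_count = len(geocode_buggy(rows, 1000))
fixed_count = len(geocode_fixed(rows, 1000))
print(buggy_count, fixed_count)
```

With these sizes the buggy variant keeps only the 32 rows of the last batch, while the renamed variant keeps all 1032, which matches the counts in the reproduction above.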

error: no error, just incorrect results.

Expected behavior The method should return a spatially enabled data frame with the same number of rows as the input.

Additional context Attached is a randomly generated list of Oregon addresses to use with the OR State geocoder. or_addresses.csv

cflann commented 1 year ago

Note: the addresses attached were generated by the tool at https://happycattools.com/fake-address-generators/oregon/. They are randomly generated and not meant to be real places, though some may end up corresponding to real addresses by simple chance.

achapkowski commented 1 year ago

@cflann thanks for submitting an issue, we'll take a look and get back to you when we have an update.

achapkowski commented 1 year ago

This will be fixed in v2.2.0 which is the next release. Thank you for reporting this.

cflann commented 1 year ago

Thanks Andrew!