Esri / arcgis-python-api

Documentation and samples for ArcGIS API for Python
https://developers.arcgis.com/python/
Apache License 2.0

`to_featureset` does not preserve datetime #1301

Closed. cgpeltier closed this issue 2 years ago.

cgpeltier commented 2 years ago

Describe the bug: When appending data by converting features in a spatially enabled dataframe to a FeatureSet (using to_featureset()) and then calling edit_features to update an AGOL layer, datetime attributes are converted to Unix timestamps. If an AGOL layer contains date columns, that data is not appended to the hosted feature service.
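For context, ArcGIS REST services store date fields as Unix epoch time in milliseconds (UTC), which is what the FeatureSet serialization produces. A minimal stdlib sketch of that round trip (the helper names are mine, not part of the arcgis API):

```python
from datetime import datetime, timezone

# ArcGIS REST date fields are Unix epoch milliseconds (UTC).
# These helpers (illustrative only) show the conversion a datetime
# value goes through when a FeatureSet is serialized.
def to_epoch_ms(dt: datetime) -> int:
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

def from_epoch_ms(ms: int) -> datetime:
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

dt = datetime(2022, 6, 1, 12, 0, 0)
ms = to_epoch_ms(dt)
print(ms)                                                     # 1654084800000
print(from_epoch_ms(ms) == dt.replace(tzinfo=timezone.utc))   # True
```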

The code below is connected to another GitHub issue.

To Reproduce Steps to reproduce the behavior:

## pull layer to update
item_id = ## item_id
to_update = gis.content.get(itemid=item_id) 
to_update_layer = to_update.layers[0]

## add new data
format = "%Y-%m-%d %H:%M:%S"
today = datetime.today().strftime('%Y-%m-%d')  # 'today' is used below; defined here so the snippet runs standalone

df = pd.DataFrame([
    {'test' : 'update4',
     'longitude' : -117.15246191281217,
     'latitude' : 32.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
    },
     {'test' : 'update5',
     'longitude' : -115.15246191281217,
     'latitude' : 32.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
     },
     {'test' : 'update6',
     'longitude' : -117.15246191281217,
     'latitude' : 34.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
     },
    ])

## create timezone cols
df['date_datetime'] = pd.to_datetime(df['date'])
df['date_localized_utc'] = df['date_datetime'].dt.tz_localize('UTC')
df['date_localized_est'] = df['date_localized_utc'].dt.tz_convert('US/Eastern')
df['date_localized_pst'] = df['date_localized_utc'].dt.tz_convert('US/Pacific')
df['date_datetime_epoch'] = df['date_datetime'].map(pd.Timestamp.timestamp)  # note: epoch seconds, not the milliseconds ArcGIS uses

update_with = pd.DataFrame.spatial.from_xy(df=df, x_column='longitude', y_column='latitude', sr=4326)
print(update_with.dtypes)
print(list(update_with.dropna(how='all', axis=1).spatial.to_featureset()))

## update data
results = []
for i, row in update_with.iterrows():
    try: 
        featureset = update_with.iloc[i:i+1].dropna(how='all', axis=1).spatial.to_featureset()
        result = to_update.layers[0].edit_features(adds=featureset)
        results.append(result)
    except Exception as e:
        print(f'Feature {i} not added: {e}')
results
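edit_features returns a dict whose addResults list carries a success flag per feature. A small helper (hypothetical name, mine) for inspecting the results collected by the loop above might look like:

```python
# edit_features responses look like:
#   {'addResults': [{'objectId': 1, 'success': True}, ...], 'updateResults': [...], ...}
# This helper (illustrative) pulls out the entries that failed.
def failed_adds(result: dict) -> list:
    return [r for r in result.get('addResults', []) if not r.get('success')]

sample = {'addResults': [{'objectId': 1, 'success': True},
                         {'objectId': 2, 'success': False}]}
print(failed_adds(sample))  # [{'objectId': 2, 'success': False}]
```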

Screenshots: [screenshot attached]

Expected behavior: Datetime columns are maintained as dates and appended to AGOL successfully (this may just not be possible via to_featureset / edit_features; if so, please let me know! šŸ˜„).

Platform (please complete the following information):

nanaeaubry commented 2 years ago

Hi @cgpeltier !

I ran using the code in the other github issue you posted and this is my result from that:

[screenshot of result attached]

I then ran the code you added above and got an error related to the dtypes. As stated in the first issue, this seems to come from the arcpy side, so we will wait and see what they say.

Once they reply, I'll test this again if there is a fix and let you know what happens.

cgpeltier commented 2 years ago

Thanks Nanae! Sounds good.

nanaeaubry commented 2 years ago

@cgpeltier I am taking another look at this, and it's confirmed that arcpy doesn't support timezones yet, so that is one issue.

However, it seems the second issue is that the updates are not being added correctly: the columns have empty data. I noticed the column names in this df differ from the ones in your other issue; that can be a problem if the features cannot be matched up.

Another workaround is to create a filegdb from the dataframe and use the append method on the existing feature layer, passing in the filegdb:

fs_item.layers[fl_index].append(
    item_id=fgdb_item.id, upload_format="filegdb"
)

Let me know if this works for you

cgpeltier commented 2 years ago

Hi @nanaeaubry, thanks for the followup.

On the arcpy timezone part, thanks for the update! šŸ˜ So I'm guessing that if/when arcpy is updated to handle timezones, we'd be able to convert our SEDF datetime columns to UTC, push to AGOL, and have the datetimes stay the same?

On the column name issue -- good catch; when I use sedf.spatial.to_featurelayer('agol_dt_test_{}'.format(now), gis=gis), the column names in AGOL are different (truncated) from my SEDF. I don't think the column name truncation happens when I instead convert the update data into a featureset, which would explain the blanks for those columns when I update via edit_features. Are the column names also truncated when you run it on your side? And is that expected?

Btw, in case it's helpful, here's my full code:

import pandas as pd
from datetime import date, timedelta
from datetime import datetime
import requests
from pytz import timezone

import arcgis
from arcgis.gis import GIS
gis = GIS()
from arcgis.features import GeoAccessor
print(arcgis.__version__)

format = "%Y-%m-%d %H:%M:%S"
today = datetime.today().strftime('%Y-%m-%d')
now = datetime.now().strftime(format)

df = pd.DataFrame([
    {'test' : 'test1',
     'longitude' : -117.15246191281217,
     'latitude' : 32.70969794547845,
     'date' : str(today)},
     {'test' : 'test2',
     'longitude' : -115.15246191281217,
     'latitude' : 32.70969794547845,
     'date' : str(today)},
     {'test' : 'test3',
     'longitude' : -117.15246191281217,
     'latitude' : 34.70969794547845,
     'date' : str(today)},
    ])

## create timezone cols
df['date_datetime'] = pd.to_datetime(df['date'])
df['date_datetime_utc'] = pd.to_datetime(df['date'], utc=True)
df['date_loc_utc'] = df['date_datetime'].dt.tz_localize('UTC')
df['date_loc_est'] = df['date_loc_utc'].dt.tz_convert('US/Eastern')
df['date_loc_pst'] = df['date_loc_utc'].dt.tz_convert('US/Pacific')

sedf = pd.DataFrame.spatial.from_xy(df=df, x_column='longitude', y_column='latitude', sr=4326)

## push to AGOL
layer1 = sedf.spatial.to_featurelayer('agol_dt_test_{}'.format(now), gis=gis)

############################
# Update layer using edit_features
############################

## pull layer to update
item_id = layer1.id
to_update = gis.content.get(itemid = item_id) 
to_update_layer = to_update.layers[0]

## add new data
format = "%Y-%m-%d %H:%M:%S"

df = pd.DataFrame([
    {'test' : 'update4',
     'longitude' : -117.15246191281217,
     'latitude' : 32.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
    },
     {'test' : 'update5',
     'longitude' : -115.15246191281217,
     'latitude' : 32.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
     },
     {'test' : 'update6',
     'longitude' : -117.15246191281217,
     'latitude' : 34.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
     },
    ])

## create timezone cols
df['date_datetime'] = pd.to_datetime(df['date'])
df['date_localized_utc'] = df['date_datetime'].dt.tz_localize('UTC')
df['date_localized_est'] = df['date_localized_utc'].dt.tz_convert('US/Eastern')
df['date_localized_pst'] = df['date_localized_utc'].dt.tz_convert('US/Pacific')
df['date_datetime_epoch'] = df['date_datetime'].map(pd.Timestamp.timestamp)  # note: epoch seconds, not the milliseconds ArcGIS uses

update_with = pd.DataFrame.spatial.from_xy(df=df, x_column='longitude', y_column='latitude', sr=4326)
update_with

## look at feature set
list(update_with.dropna(how='all', axis=1).spatial.to_featureset())

## update data

results = []
for i, row in update_with.iterrows():
    try: 
        featureset = update_with.iloc[i:i+1].dropna(how='all', axis=1).spatial.to_featureset()
        result = to_update.layers[0].edit_features(adds=featureset)
        results.append(result)
    except Exception as e:
        print(f'Feature {i} not added: {e}')
results

############################
# Update data using append
############################

## add new data
format = "%Y-%m-%d %H:%M:%S"
today = datetime.today().strftime('%Y-%m-%d')

df = pd.DataFrame([
    {'test' : 'update1',
     'longitude' : -117.15246191281217,
     'latitude' : 32.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
     },
     {'test' : 'update2',
     'longitude' : -115.15246191281217,
     'latitude' : 32.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
     },
     {'test' : 'update3',
     'longitude' : -117.15246191281217,
     'latitude' : 34.70969794547845,
     'date' : today, #str(datetime.now().strftime(format))
     },
    ])

## create timezone cols
df['date_datetime'] = pd.to_datetime(df['date'])
df['date_datetime_utc'] = pd.to_datetime(df['date'], utc=True)
df['date_loc_utc'] = df['date_datetime'].dt.tz_localize('UTC')
df['date_loc_est'] = df['date_loc_utc'].dt.tz_convert('US/Eastern')
df['date_loc_pst'] = df['date_loc_utc'].dt.tz_convert('US/Pacific')

update_with = pd.DataFrame.spatial.from_xy(df=df, x_column='longitude', y_column='latitude', sr=4326)
update_with
## add new SEDF to AGOL as a feature layer
ofm_update = update_with.spatial.to_featurelayer('agol_test_update_{}'.format(now), gis = gis)
print(ofm_update.id)

## save new layer to AGOL as GDB 
ofm_update2 = ofm_update.export('test', 'File Geodatabase')
print(ofm_update2.id)

## append to existing feature layer
## get target OFM layer to update
to_update = gis.content.get(itemid=layer1.id) 
print(to_update.id)

to_update.layers[0].append(
    item_id = ofm_update2.id,
    upload_format = 'filegdb', 
    upsert = False,
)

ofm_update.delete()
ofm_update2.delete()
nanaeaubry commented 2 years ago

Yes, arcpy timezone support is on the near-term roadmap (we can post in the other issue when we know the exact release). Once that is implemented, there shouldn't be issues with what you are doing.

I am not sure which version of the API you are using, but to_featurelayer has a parameter called sanitize_columns; it used to default to True but now defaults to False. That is my best guess for the column truncation.

There are three things you can try to solve that:

  1. See if you can get the columns to match and try the original method; look into the sanitize_columns parameter.
  2. Skip the export step for the fgdb: when you call to_featurelayer, it already creates both a feature layer and a filegdb item. You can use related_items("Service2Data"), which returns a list of related items (there should be only one: that filegdb item). See example below.
  3. Call to_featurelayer, use the query() method to get the features you need, and then use the edit_features method. See example below.

Here are short examples based off your code above for 2 & 3:

2

## add new SEDF to AGOL as a feature layer
ofm_update = update_with.spatial.to_featurelayer('agol_test_update_{}'.format(now), gis = gis)
print(ofm_update.id)

## Get related item (related_items returns a list; the filegdb item should be its only entry)
ofm_update2 = ofm_update.related_items("Service2Data")[0]
print(ofm_update2.id)

## append to existing feature layer
## get target OFM layer to update
to_update = gis.content.get(itemid=layer1.id) 
print(to_update.id)

to_update.layers[0].append(
    item_id = ofm_update2.id,
    upload_format = 'filegdb', 
    upsert = False,
)

3

## add new SEDF to AGOL as a feature layer
ofm_update = update_with.spatial.to_featurelayer('agol_test_update_{}'.format(now), gis = gis)
print(ofm_update.id)

# Get the list of features to add
features = ofm_update.layers[0].query().features

# Perform the edit
to_update = gis.content.get(itemid=layer1.id)

to_update.layers[0].edit_features(adds=features)

Hopefully one of these three solutions works! šŸ‘šŸ»

cgpeltier commented 2 years ago

Thanks again, Nanae.

I was running arcgis 2.0.0, but just updated to 2.0.1 to make sure that wasn't the issue.

On method 2: After running

## Get related item
ofm_update2 = ofm_update.related_items("Service2Data")

I then ran the append code above, subsetting the list that is returned. However, to_featurelayer created a shapefile instead of a fgdb; just flagging that in case it's unexpected. I can use ofm_update2[0].type to check the item type before running append, so it shouldn't be an issue in that respect. And thanks for the FYI: it's good not to have to export an extra FGDB needlessly.

if ofm_update2[0].type == 'Shapefile':
    upload_format = 'shapefile'
elif ofm_update2[0].type == 'File Geodatabase':
    upload_format = 'filegdb'

to_update.layers[0].append(
    item_id=ofm_update2[0].id,
    upload_format=upload_format, 
    upsert=False,
)

By the way, on the column name truncation: I tried explicitly setting sanitize_columns=False, but it still truncated the column names. Given that it's creating a shapefile instead of a fgdb, I'm guessing that's the issue there. Is it possible to control whether the feature layer is created from a fgdb vs. a shapefile?

Both methods work well though, thank you!

nanaeaubry commented 2 years ago

Great to hear that it works! A shapefile gets created when arcpy is not in your environment. I assumed it was, which is why I talked about the fgdb; when arcpy is absent we use the Shapely library and create a shapefile instead. The workflow is the same, it's just that the result is a Feature Layer and a Shapefile rather than a filegdb.
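The arcpy-or-Shapely fallback described above can be checked locally before exporting. A minimal sketch (the selection logic here is illustrative of the behavior, not the API's actual internals):

```python
import importlib.util

# Probe which geometry engine is importable; the arcgis library prefers
# arcpy and falls back to Shapely (producing a shapefile) when arcpy is absent.
def geometry_engine() -> str:
    if importlib.util.find_spec("arcpy") is not None:
        return "arcpy"        # export produces a file geodatabase
    if importlib.util.find_spec("shapely") is not None:
        return "shapely"      # export produces a shapefile
    return "none"

print(geometry_engine())
```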

As for the column truncation, I am not sure why the shapefile does that. If I find anything out about it, or a way around it, I will let you know.

I will go ahead and close this issue, since we were able to resolve one problem and the other is linked to your other open issue. You can still post here if needed, but timezone updates will be on your other issue :) Thanks!

cgpeltier commented 2 years ago

Thanks, Nanae! I was using a conda environment for this; I'll make sure it has arcpy for my AGOL updates from now on.

Ray-Mangan commented 1 year ago

For the column truncation I am not sure why Shapefile does that. If I find anything out about it or a way around I will let you know.

@nanaeaubry - I know this issue is closed, but the column truncation is caused by the shapefile format: it uses .dbf as the attribute-table storage format, which limits field names to 10 characters. See the limitations section partway down the page here: https://en.wikipedia.org/wiki/Shapefile
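To make the effect concrete, here is a rough sketch of .dbf-style field-name truncation (my own illustrative helper; real shapefile drivers deduplicate colliding names with their own suffix schemes):

```python
def truncate_fields(names, limit=10):
    """Truncate field names to the .dbf 10-character limit,
    appending a numeric suffix when truncation causes a collision."""
    out, seen = [], set()
    for name in names:
        t = name[:limit]
        i = 1
        while t in seen:
            suffix = str(i)
            t = name[:limit - len(suffix)] + suffix
            i += 1
        seen.add(t)
        out.append(t)
    return out

print(truncate_fields(['date_localized_utc', 'date_localized_est']))
# ['date_local', 'date_loca1']
```

This is why `date_localized_utc` and friends in the dataframe above can no longer be matched by name after a shapefile round trip.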