Esri / arcgis-python-api

Documentation and samples for ArcGIS API for Python
https://developers.arcgis.com/python/
Apache License 2.0

Reading hosted table in AGOL using .query() returns a dataframe with exactly 1,000 fewer records than are in the AGOL hosted table #2139

Open theisenm12 opened 5 days ago

theisenm12 commented 5 days ago

I am trying to read a hosted AGOL table using the script below (.query). When the script runs, saves a csv and then publishes the table, it has exactly 1000 less records than the original hosted table. I don't have any other queries on the table and I don't think there are settings within the hosted table that would cause this. I am using this to compare the existing hosted table to a new table of AGOL members information so I can create an updated dashboard with removed accounts still present (hence, comparing the two tables, dataframes in this case).

Below is the code piece being used:

# Existing item ID
item_id = 'xyz'

# Load the existing table from AGOL to check for removed members
existing_table_item = gis.content.get(item_id)

# Initialize DataFrame either from the existing table or as an empty DataFrame with the same columns as df
existing_table_df = (existing_table_item.tables[0].query().df if existing_table_item else pd.DataFrame(columns=df.columns))

# Debugging: check the columns of existing_table_df
print("Columns in existing_table_df:", existing_table_df.columns.tolist())

# Save the DataFrame to a CSV file
file_SaveName = 'Test_Export_Existing_Table_DF.csv'
existing_table_df.to_csv(file_SaveName, index=False)

# Define metadata for the CSV file
csv_properties = {'title': 'Test Export Existing Table DF', 'type': 'CSV', 'tags': 'AGOL Export, CSV', 'description': 'CSV export of the existing table from AGOL'}

# Upload the CSV file to AGOL content
csv_item = gis.content.add(item_properties=csv_properties, data=file_SaveName)
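
For reference, here is a rough diagnostic sketch (using the same table object as above; return_count_only asks the service for a server-side count) to confirm the dataframe really is coming back smaller than the table:

# Diagnostic sketch: compare the server-side record count to what query() returns
table_obj = existing_table_item.tables[0]
server_count = table_obj.query(where="1=1", return_count_only=True)
pulled_count = len(table_obj.query(where="1=1", out_fields="*").features)
print("Server count:", server_count, "| records returned:", pulled_count)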

For what it's worth, the following code has the same result:

# Access the previous values table
table_url = "xyz"
previous_values_table = Table(table_url)

# Retrieve existing values from the previous values table
existing_records = previous_values_table.query(where="1=1", out_fields="*").features

# Convert the list of features to a list of dictionaries (attributes)
records_dict_list = [feature.attributes for feature in existing_records]

# Convert the list of dictionaries to a pandas DataFrame
existing_records_df = pd.DataFrame(records_dict_list)

# Save the DataFrame to a CSV file
file_SaveName = 'Test_Export_Existing_Table_DF_20241022.csv'
existing_records_df.to_csv(file_SaveName, index=False)

# Define metadata for the CSV file
csv_properties = {'title': 'Test Export Existing Table DF 20241022', 'type': 'CSV', 'tags': 'AGOL Export, CSV', 'description': 'CSV export of the existing table from AGOL'}

# Upload the CSV file to AGOL content
csv_item = gis.content.add(item_properties=csv_properties, data=file_SaveName)
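
As a possible workaround, here is a sketch that pages through the table manually instead of relying on a single query() call (result_offset and result_record_count are standard query parameters; the 1000 page size is just an assumption about the layer's maxRecordCount):

# Workaround sketch: page through the table manually and accumulate features
all_features = []
offset = 0
page_size = 1000  # assumed page size; adjust to the layer's maxRecordCount

while True:
    page = previous_values_table.query(
        where="1=1",
        out_fields="*",
        result_offset=offset,
        result_record_count=page_size,
    ).features
    if not page:
        break
    all_features.extend(page)
    offset += len(page)

existing_records_df = pd.DataFrame([f.attributes for f in all_features])
print("Total records paged:", len(existing_records_df))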

Error: there is no error message.

Expected behavior: I would expect this code to read the entire hosted table and create a DataFrame containing all of the records.

Platform (please complete the following information):

nanaeaubry commented 11 hours ago

@theisenm12 I will test this as well, but in the meantime can you make sure you are using the latest version (2.4.0) and use the Folder add method instead of the Content Manager add method?

So instead of gis.content.add(...), use folder.add(...); the Content Manager add method has been deprecated in favor of the Folder add method.

If you want to add to the root folder you can get this folder by doing: folder = gis.content.folders.get(). If you want a specific folder then you can specify the folder name in the get method.
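
For example, a minimal sketch of that pattern (the folder name "Project CSVs" is just a placeholder):

# Root folder of the signed-in user
root_folder = gis.content.folders.get()

# Or a specific folder by name (placeholder name)
project_folder = gis.content.folders.get("Project CSVs")

# Folder.add returns a job; call .result() to get the Item
csv_item = project_folder.add(csv_properties, file=file_SaveName).result()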

Please let us know if you can reproduce the issue with these conditions

theisenm12 commented 6 hours ago

@nanaeaubry Thank you for helping out and for letting me know gis.content.add() has been deprecated.

I am using an AGOL Notebook, so I thought it would automatically be updated to whatever the latest version is; if not, please let me know how I can go about updating. I am fairly new to this system.

Do I still put the item ID in the parentheses?

nanaeaubry commented 6 hours ago

@theisenm12 ArcGIS Online is currently a version behind and should be updating before the end of the year, so just hold tight and it will update on its own.

If you want to already test the latest version you can do a local install and use local notebooks or an IDE to run your script. Here is some information if that ever interests you: https://developers.arcgis.com/python/latest/guide/install-and-set-up/anaconda/
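
If you just want to confirm which version a given notebook or local environment is running, a quick check is:

# Print the installed version of the ArcGIS API for Python
import arcgis
print(arcgis.__version__)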

This is the part of the code that will be deprecated and I have put the updated code below it:

# Upload the CSV file to AGOL content
# Deprecated:
csv_item = gis.content.add(item_properties=csv_properties, data=file_SaveName)
# Updated:
root_folder = gis.content.folders.get()
csv_item = root_folder.add(csv_properties, file=file_SaveName).result()

In the meantime I will try to test and let you know if I find anything

theisenm12 commented 6 hours ago

@nanaeaubry Thank you! I appreciate it greatly. A coworker also had a problem with this .query function, but he got 1,000 extra duplicate features instead. Not sure how or why. His script still works, though, so I'm not going to push that too much.

nanaeaubry commented 5 hours ago

@theisenm12 I have tried your script above with 2.4.0 and cannot reproduce the issue.

I started with this number of features: [screenshot]

When getting the dataframe using query(), I have: [screenshot]

And then when adding and publishing the data back to AGO, I get: [screenshot]

We did some work on the query method for 2.4.0, so it might have been fixed through that work. The duplicates your coworker is getting are also odd! I will leave this issue open but encourage you to try with 2.4.0 when you can (either locally or when Online releases it). If you continue to get the same result, it would be helpful to have some sample data we can test with.

If you cannot provide the data through GitHub, then we would have to go through Esri Support. I'll make sure to post back here if I find anything else.

theisenm12 commented 5 hours ago

@nanaeaubry That's great that you are not getting the same issue. I hope 2.4.0 fixes it on my end too. I will get with my IT team this week to have the newest release installed (I don't have permissions on my computer).

Do you know why AGOL is so far behind updating to the newest API version?

I won't be able to share my data as it is sensitive. I have an open ticket with Esri Support where they are looking into it as well, but I know there is good support here too, so I wanted to cover both avenues.

nanaeaubry commented 5 hours ago

@theisenm12 Yes let's leave this open until you can test it on your end.

AGOL is 'behind' with the Python API because we follow the same release cycle as ArcGIS Enterprise and ArcGIS Pro. ArcGIS Online has its own separate release schedule since it is less bound by versioning.

theisenm12 commented 5 hours ago

@nanaeaubry So if I am using ArcGIS Online Notebook, it is on the same schedule as Enterprise and Pro?

nanaeaubry commented 5 hours ago

@theisenm12 No, ArcGIS Online Notebooks are released with ArcGIS Online and ArcGIS Enterprise Notebooks are released with ArcGIS Enterprise and Pro Notebooks are released with Pro. Lots of teams to make the magic happen :)

The only way you can control the version you are using yourself is by setting up a local environment with the Python API and running it in local notebooks. Otherwise the version of the Python API will depend on the ArcGIS environment you are in.

As soon as ArcGIS Online updates, the Python API will update in Online Notebooks as well.

theisenm12 commented 5 hours ago

@nanaeaubry Gotcha. I will get my version updated locally and then alter my script to run locally instead of from AGOL Notebooks.