Esri / arcgis-python-api

Documentation and samples for ArcGIS API for Python
https://developers.arcgis.com/python/
Apache License 2.0

Error in spatial.to_featurelayer: Temp zip file being used by another process #2054

Open roelofsaj opened 2 months ago

roelofsaj commented 2 months ago

Describe the bug

When using spatial.to_featurelayer on a moderately large (62,000 rows, 93 columns) spatially enabled data frame, I get the following error, and the feature layer is never created.

C:\Users\aheinlei\work\mi-flora-scripting\to_featurelayer_bug_demo.py:9: DtypeWarning: Columns (16,17,19,20,21,22,23,26,30,31,32,52,54,57,61) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv('demo_data_62k.csv')
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2023.3.2\plugins\python\helpers\pydev\pydevd.py", line 1534, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.2\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\Users\aheinlei\work\mi-flora-scripting\to_featurelayer_bug_demo.py", line 109, in <module>
    fl = spatial_df.spatial.to_featurelayer(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aheinlei\AppData\Local\ESRI\conda\envs\miflora_3_3_1\Lib\site-packages\arcgis\features\geo\_accessor.py", line 2912, in to_featurelayer
    result = content.import_data(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aheinlei\AppData\Local\ESRI\conda\envs\miflora_3_3_1\Lib\site-packages\arcgis\gis\__init__.py", line 8591, in import_data
    return _cm_helper.import_as_item(self._gis, df, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aheinlei\AppData\Local\ESRI\conda\envs\miflora_3_3_1\Lib\site-packages\arcgis\gis\_impl\_content_manager\_import_data.py", line 247, in import_as_item
    file_item, new_item = _create_file_item(gis, df, file_type, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aheinlei\AppData\Local\ESRI\conda\envs\miflora_3_3_1\Lib\site-packages\arcgis\gis\_impl\_content_manager\_import_data.py", line 156, in _create_file_item
    os.remove(file)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\aheinlei\\AppData\\Local\\Temp\\a694de98\\a86051.zip'
python-BaseException
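For readers unfamiliar with this failure mode: on Windows, os.remove raises PermissionError (WinError 32) while any process still holds the file open. The sketch below is not the library's code and does not fix the bug inside _create_file_item; it only illustrates a retry-on-lock pattern that a caller could use to clean up a leftover temp file afterwards. The helper name remove_with_retry and its parameters are hypothetical.

```python
import os
import time


def remove_with_retry(path, attempts=5, delay=0.5, remove=os.remove):
    """Try to delete `path`, retrying while something still holds it open.

    Returns True if the file is gone, False if it is still locked after
    all attempts. `remove` is injectable so the logic can be exercised
    without a real lock.
    """
    for attempt in range(attempts):
        try:
            remove(path)
            return True
        except FileNotFoundError:
            # Nothing left to delete.
            return True
        except PermissionError:
            # On Windows, WinError 32 surfaces as PermissionError.
            # Wait briefly in case the other handle is about to close.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

If the lock is held by the same Python process for the rest of the session, no amount of retrying will help, and the file can only be removed after the handle is closed or the process exits.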

To Reproduce

Run the Python script attached below (file extension changed to .txt to allow uploading). I'd rather not post the data file here publicly, so if someone from Esri takes a look at this, feel free to contact me and I'll email it. I've tried making a dummy file to post here instead of the actual one, but nothing I've generated so far has triggered this error. Also note that the file originally had 220k+ lines; I can reproduce the error with a subset of 62k lines or more, but with smaller files the error doesn't occur. This is independent of which portion of the records I use to create the smaller files.

Attachment: to_featurelayer_bug_demo.txt

Expected behavior

I expect to_featurelayer() to create my feature layer without throwing an error.

Additional context

This error started occurring after Python API version 2.0.1. (Until I recently upgraded to ArcGIS Pro 3.3.1, I was specifically installing Python API version 2.0.1 instead of more recent versions because it would successfully run my script without throwing this error.)

knoopum commented 1 month ago

The underlying issue here may be that to_featurelayer is not cleaning up after itself as expected. It is leaving its temporary geodatabase behind on the local file system in a locked state.

The expectation is that if one uses to_featurelayer, and then deletes the published feature layer (and related geodatabase) from ArcGIS Online, then one should be able to run to_featurelayer with the same parameters again.

Instead, the second attempt fails with an error message:

FileExistsError: [WinError 183] Cannot create a file when that file already exists...

If the second attempt to publish occurs in the same script/notebook session, then the lock files in the temporary geodatabase prevent one from manually deleting it as a workaround.
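One possible stopgap, sketched here under assumptions rather than confirmed as a supported fix: give every publish a service_name that has not been used before, so the locked leftover directory is never reused. The sketch assumes the temporary directory sits under the system temp directory and is named after service_name, which is what the traceback later in this comment suggests. Both helper names are hypothetical.

```python
import os
import tempfile
from uuid import uuid4


def unique_service_name(base, suffix_len=6):
    """Return `base` plus a short random hex suffix, e.g. 'My_Layer_3fa9c1'."""
    return f"{base}_{uuid4().hex[:suffix_len]}"


def leftover_temp_dir(service_name):
    """Return the path of a stale temp directory for `service_name`, or None.

    Assumption: the directory is <system temp dir>/<service_name>, inferred
    from the FileExistsError path in this thread, not from library docs.
    """
    path = os.path.join(tempfile.gettempdir(), service_name)
    return path if os.path.isdir(path) else None
```

A caller could check leftover_temp_dir(name) before publishing and switch to unique_service_name(name) when it returns a path. This sidesteps the collision but leaves the locked directory on disk until the session ends.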

The code below is another way to reproduce this issue. It was run in the standalone Jupyter notebook environment that comes with ArcGIS Pro 3.3.2, and includes version 2.3.0 of the ArcGIS Python API.

import arcgis
from arcgis import GIS
import arcpy

# ArcGIS API for Python version:
print('ArcGIS API for Python version:', arcgis.__version__)

# Connect to default GIS.
gis = GIS("home")  

# Prepare some test data as a dictionary for a feature set
d = {
    'features': [
        {
            'geometry': {
                'x': -4243783.810391866,
                'y': 4640950.609361529,
                'spatialReference': {
                    'wkid': 102100,
                    'latestWkid': 3857
                }
            },
            'attributes': {
                'OBJECTID': 1, 'Eight_Character': 'test'
            }
        }
    ],
    'objectIdFieldName': 'OBJECTID',
    'globalIdFieldName': '',
    'spatialReference': {
        'wkid': 102100, 
        'latestWkid': 3857
    },
    'geometryType': 'esriGeometryPoint',
    'fields': [
        {
            'name': 'OBJECTID',
            'type': 'esriFieldTypeOID',
            'alias': 'OBJECTID',
            'sqlType': 'sqlTypeOther',
            'domain': None,
            'defaultValue': None
        },
        {
            'name': 'Eight_Character',
            'type': 'esriFieldTypeString',
            'alias': 'Eight_Character',
            'sqlType': 'sqlTypeOther',
            'length': 8,
            'domain': None,
            'defaultValue': None
        }
    ]
}

# Convert dictionary to feature set
fs = arcgis.features.FeatureSet.from_dict(d)

# Convert feature set to spatial data frame
sdf = fs.sdf

# Publish feature set as spatial data frame to feature layer.
print("\nPublishing...")
item = sdf.spatial.to_featurelayer(
    title = 'Bug to_featurelayer not cleaning up',
    gis = gis,
    service_name = 'Bug_to_featurelayer_not_cleaning_up'
)

# Delete published feature layer and file geodatabase
print("\nDeleting...")
fgdb_item = item.related_items(
    rel_type = 'Service2Data'
)[0]
print("Delete item:", item.delete(permanent = True))
print("Delete item:", fgdb_item.delete(permanent = True))

# Once again, publish feature set as spatial data frame to feature layer.
print("\nPublishing again...")
item = sdf.spatial.to_featurelayer(
    title = 'Bug to_featurelayer not cleaning up',
    gis = gis,
    service_name = 'Bug_to_featurelayer_not_cleaning_up'
)

The resulting output in the notebook:


Publishing...

Deleting...
Delete item: True
Delete item: True

Publishing again...
---------------------------------------------------------------------------
FileExistsError                           Traceback (most recent call last)
~\AppData\Local\ESRI\conda\envs\work\Lib\site-packages\arcgis\gis\_impl\_content_manager\_import_data.py in _create_file_item(gis, df, file_type, **kwargs)
     84             # set up temporary zip to be used in directory
---> 85             os.makedirs(temp_dir)
     86             temp_zip = os.path.join(temp_dir, "%s.zip" % ("a" + uuid4().hex[:5]))

~\AppData\Local\ESRI\conda\envs\work\Lib\os.py in makedirs(name, mode, exist_ok)

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\knoop\\AppData\\Local\\Temp\\2\\Bug_to_featurelayer_not_cleaning_up'

During handling of the above exception, another exception occurred:

UnboundLocalError                         Traceback (most recent call last)
~\AppData\Local\Temp\2\ipykernel_95840\262642015.py in <cell line: 0>()
     78 # Once again, publish feature set as spatial data frame to feature layer.
     79 print("\nPublishing again...")
---> 80 item = sdf.spatial.to_featurelayer(
     81     title = 'Bug to_featurelayer not cleaning up',
     82     gis = gis,

~\AppData\Local\ESRI\conda\envs\work\Lib\site-packages\arcgis\features\geo\_accessor.py in to_featurelayer(self, title, gis, tags, folder, sanitize_columns, service_name, **kwargs)
   2910                 )
   2911 
-> 2912         result = content.import_data(
   2913             self._data,
   2914             folder=folder,

~\AppData\Local\ESRI\conda\envs\work\Lib\site-packages\arcgis\gis\__init__.py in import_data(self, df, address_fields, folder, item_id, **kwargs)
   8589         if _is_geoenabled(df) or (overwrite or insert):
   8590             # Item Workflow
-> 8591             return _cm_helper.import_as_item(self._gis, df, **kwargs)
   8592         else:
   8593             # Feature Collection Workflow

~\AppData\Local\ESRI\conda\envs\work\Lib\site-packages\arcgis\gis\_impl\_content_manager\_import_data.py in import_as_item(gis, df, **kwargs)
    245 
    246     # Create the file item, new item published from the file item, and the publish parameters
--> 247     file_item, new_item = _create_file_item(gis, df, file_type, **kwargs)
    248 
    249     # If not overwrite or insert, return the new item

~\AppData\Local\ESRI\conda\envs\work\Lib\site-packages\arcgis\gis\_impl\_content_manager\_import_data.py in _create_file_item(gis, df, file_type, **kwargs)
    153             shutil.rmtree(temp_dir, ignore_errors=True)
    154 
--> 155         if os.path.exists(file):
    156             os.remove(file)
    157     return file_item, new_item

UnboundLocalError: cannot access local variable 'file' where it is not associated with a value
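The second traceback masks the first: the cleanup code references the local variable file, but os.makedirs failed before that name was ever assigned. The sketch below reproduces the pattern in isolation and shows one common way to guard against it. It is a simplified stand-in, not the actual body of _create_file_item, whose full structure is not visible in this thread; the function names and the file name upload.zip are hypothetical.

```python
import os


def buggy_cleanup(temp_dir):
    """Mimics the failing pattern: cleanup reads a name that may be unbound."""
    try:
        os.makedirs(temp_dir)  # raises FileExistsError if temp_dir exists
        file = os.path.join(temp_dir, "upload.zip")
    finally:
        # If makedirs raised, `file` was never assigned, so this line
        # raises UnboundLocalError and hides the original error.
        if os.path.exists(file):
            os.remove(file)
    return temp_dir


def fixed_cleanup(temp_dir):
    """Same flow, but the name is bound before anything can fail."""
    file = None
    try:
        os.makedirs(temp_dir)
        file = os.path.join(temp_dir, "upload.zip")
        with open(file, "wb"):
            pass  # placeholder for writing the archive
    finally:
        # Safe even when makedirs raised: `file` is None in that case.
        if file is not None and os.path.exists(file):
            os.remove(file)
    return temp_dir
```

With the guard in place, a stale directory still causes a failure, but the caller sees the original FileExistsError instead of the less informative UnboundLocalError.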