R-ArcGIS / r-bridge

Bridge library to connect ArcGIS and R, including arcgisbinding R library.
Apache License 2.0

Can't run python code in R from Quarto when arcgisbinding is loaded #85

Closed: Nova-Scotia closed this issue 1 year ago

Nova-Scotia commented 1 year ago

I need to pull data from a web service using an API written for R, convert it to a fGDB, and use a python script to "overwrite" the data currently sitting in AGOL. So far I can get everything to work if I do it piece by piece, but trying to load arcgisbinding to create my fGDB in R, then switching to python code in another chunk, is not working.

Perhaps arcgisbinding and reticulate are using different versions of python? I don't have experience with environments/python - I've been trying to troubleshoot this but finding it difficult to pick out what solution might work in my case.
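
One way to check which interpreter a `{python}` chunk is actually running is to print it from inside the chunk. The sketch below is only a diagnostic, not a fix; `interpreter_report` is a hypothetical helper name, and it relies on the standard library alone (NumPy is reported only if it imports).

```python
import sys


def interpreter_report():
    """Collect details about the running interpreter and, if importable, NumPy.

    Hypothetical helper for illustration; not part of reticulate or arcgisbinding.
    """
    report = {
        # Under an embedding host, sys.executable may not point at python.exe,
        # so sys.prefix (the environment root) is reported as well.
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }
    try:
        import numpy
        report["numpy"] = numpy.__version__
    except ImportError as exc:
        # NumPy's C-extension failures surface as ImportError.
        report["numpy"] = "import failed: {}".format(exc)
    return report


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

Running it once before and once after `library(arcgisbinding)` would show whether the interpreter itself changes or only the NumPy import breaks.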

Running on Windows 10 Enterprise (with admin privs)
RStudio "Desert Sunflower" 2023.09.0 Build 463
64 bit R 4.2.3
arcgisbinding 1.0.1.305

This works - everything runs as expected (including the rest of the python code, which is not necessary for this minimal - hopefully reproducible - example). Running in a Quarto .qmd document.

---
title: "Which python version?"
format: html
editor: visual
---

```{r libraries}
library(reticulate)
use_python('C:\\Program Files\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3\\python.exe', required = T)
```

```{python}
import arcpy
```

But if I include arcgisbinding I get this error message after running my first line of python code:

---
title: "Which python version?"
format: html
editor: visual
---

```{r libraries}
library(reticulate)
use_python('C:\\Program Files\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3\\python.exe', required = T)
library(arcgisbinding)
arc.check_portal()
arc.check_product()
```

```{python}
import arcpy
```

ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.9 from "C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe"
  * The NumPy version is: "1.20.1"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: DLL load failed while importing _multiarray_umath: The specified module could not be found.

If I try not specifying the python location, this is the error I get:

---
title: "Which python version?"
format: html
editor: visual
---

```{r libraries}
library(reticulate)
library(arcgisbinding)
arc.check_portal()
arc.check_product()
```

```{python}
import arcpy
```

> reticulate::repl_python()
Error: C:/Users/NewtonEr/AppData/Local/r-miniconda/envs/r-reticulate/python38.dll - The specified module could not be found.

JosiahParry commented 1 year ago

@Nova-Scotia thanks for making this issue. I genuinely have no idea why the order of operations matters here. That's something we ought to dig into. BUT things do work if you load arcgisbinding after you make your first call to python via reticulate.

Here's an example where it works. Note that I use use_condaenv() here and I install Pro into a non-standard location.

---
title: "Which python version?"
format: html
editor: visual
---

```{r libraries}
library(reticulate)
use_condaenv('C:\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3', required = TRUE)
```

```{python}
import arcpy
import numpy as np
np.array([0,1])
```

```{r}
library(arcgisbinding)
arc.check_portal()
arc.check_product()
```

```{python}
np.array([1,0, 99])
```


Out of curiosity, would you be able to share a minimal reproducible example of the workflow that you're attempting? We may be able to accomplish this solely in R with the in-development `{arcgislayers}` package (https://github.com/R-ArcGIS/arcgislayers), which has functionality for writing to a FeatureService. A "real world" example would be helpful for a vignette or for ensuring that the API is designed to fit the need effectively.

JosiahParry commented 1 year ago

@scdub, do you happen to know why the interaction of arcgisbinding + Pro would break numpy in the conda env? Does arcgisbinding do some sort of locking / manipulation of the python environment at any point? I would assume not since it's written in C++

Nova-Scotia commented 1 year ago

@JosiahParry never have sweeter words been spoken. I've been banging my head against the wall experimenting with overwrites, truncates and appends for the last week so I would love it if your new package did that for me without having to use a python script. I'm not sure how to share a minimal reproducible example, since you'd only be able to overwrite a feature service that you own (or administer). If there is a way around that, I'm very happy to help out!

Ultimately, I'm using a slightly modified version of Esri's Truncate and Append script because I need multiple users to be able to update my feature class, not just the owner. ESRI told me that using their overwrite capability would only be possible for the feature service owner.
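
The core of a truncate-and-append update can be sketched in a few lines. This is not Esri's script, only the general idea, assuming the ArcGIS API for Python's `FeatureLayer.manager.truncate()` and `FeatureLayer.append()`; the function name `truncate_and_append` and the IDs and table name in the usage comments are hypothetical.

```python
def truncate_and_append(layer, gdb_item_id, table_name):
    """Empty a hosted feature layer, then reload it from an uploaded file GDB.

    `layer` is assumed to behave like arcgis.features.FeatureLayer.
    `gdb_item_id` is the portal item ID of an uploaded file geodatabase.
    `table_name` is the feature class inside that geodatabase.
    """
    # Remove every existing row but keep the service and its schema.
    layer.manager.truncate()
    # Load the replacement rows; upsert=False means plain inserts.
    return layer.append(
        item_id=gdb_item_id,
        upload_format="filegdb",
        source_table_name=table_name,
        upsert=False,
    )


# Usage sketch (needs a portal connection, so it is left commented out):
# from arcgis.gis import GIS
# gis = GIS("Pro")
# layer = gis.content.get("<feature-service-item-id>").layers[0]
# truncate_and_append(layer, "<uploaded-fgdb-item-id>", "<feature-class-name>")
```

Because the service itself is never replaced, this route avoids the overwrite operation that was described above as limited to the service owner.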

My ideal data workflow (that would work for myself and anyone else who is a feature layer administrator):

I tried your modified version of code. I can get further - I'm still using the same python location, unless you think I should change it (how? why?).

Now my code works until here:

>>> import arcpy
>>> import os, time, uuid
>>> from zipfile import ZipFile
>>> from arcgis.gis import GIS
>>> import arcgis.features

AttributeError: partially initialized module 'arcgis' has no attribute 'gis' (most likely due to a circular import)

Note - I connect to my account using gis = GIS("Pro") later in the script (I'm on a SAML/Enterprise account so can't easily use a username/password).

JosiahParry commented 1 year ago

Do you happen to have a file or folder with the name arcgis in your working directory? I cannot reproduce the error. Could the error be related to this SO question?

```{r libraries}
library(reticulate)
use_condaenv('C:\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3', required = TRUE)
```

```{python}
import arcpy
import numpy as np
np.array([0,1])
```

```{r}
library(arcgisbinding)
arc.check_portal()
arc.check_product()
```

```{python}
import os, time, uuid

from zipfile import ZipFile
from arcgis.gis import GIS
import arcgis.features

gis = GIS("Pro")
```

Nova-Scotia commented 1 year ago

I don't. But I wonder if it has something to do with my enterprise account...

JosiahParry commented 1 year ago

I don't think so. I think this is just a run of the mill python error :wink:

AttributeError: partially initialized module 'arcgis' has no attribute 'gis' (most likely due to a circular import)

Tells me that there's an error in the from arcgis.gis import GIS statement. Can you run this as a standalone python script from the same working directory?
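
A quick way to rule out name shadowing is to look for an `arcgis` file or folder in the directory the script runs from. The helper below is a minimal sketch; `find_shadowing` is a hypothetical name, and it only matches exact names, so treat a hit as a lead to investigate and not as proof.

```python
import os
import tempfile


def find_shadowing(package, directory):
    """Return entries in `directory` whose names could shadow `package` on import.

    Hypothetical helper: matches a module file (`<package>.py`) or a folder
    or file named exactly `<package>`.
    """
    candidates = {package, package + ".py"}
    return sorted(name for name in os.listdir(directory) if name in candidates)


# Demonstration in a throwaway directory containing a stray arcgis.py.
with tempfile.TemporaryDirectory() as workdir:
    open(os.path.join(workdir, "arcgis.py"), "w").close()
    print(find_shadowing("arcgis", workdir))  # ['arcgis.py']
```

Calling `find_shadowing("arcgis", os.getcwd())` from the Quarto document's working directory would perform the same check on the real project.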

Nova-Scotia commented 1 year ago

Yes - if I run in R:

library(reticulate)
use_python('C:\\Program Files\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3\\python.exe', required = T)

Then run a python script containing:

import arcpy
import os, time, uuid
from zipfile import ZipFile
from arcgis.gis import GIS
import arcgis.features

I have no issues.

Nova-Scotia commented 1 year ago

I keep wondering if I should stop banging my head against the wall and try arcgislayers ahaha...

JosiahParry commented 1 year ago

@Nova-Scotia here's a gist that walks through uploading an sf object as a feature service. It also updates fields and demonstrates how to delete every feature and add features again to the same feature service. There is an existing limitation, though, that for publish_layer() the CRS can only be 3857 right now. This is a bug I need to work out. However, if the feature service already exists and the CRS has been set you can use that one. The issue is in creating one with a different CRS. Feel free to start a discussion in arcgislayers https://github.com/R-ArcGIS/arcgislayers/discussions or email me directly jparry at esri dot com.

Regarding this python issue, I don't know if I can help much further without having a Quarto file I can use to try to reproduce the issue :(

Nova-Scotia commented 1 year ago

Good morning @JosiahParry , thanks for that, I'll give it a try.

I did finally get the python code working but it is extremely finicky. I have to load all my python libraries and connect to Pro before doing anything with arcgisbinding, then do my work in R, then call the libraries and connection back in again and run the python code. It's not pretty, but it works. I don't like it, though. I'm going to use your example and try to get that working but I guess I can fall back on this if I need to. For future reference, the below does not throw errors and successfully updates my feature class, using a version of ESRI's truncate and append script.

  ---
  title: "Which python version?"
  format: html
  editor: source
  ---

  ```{r libraries}
  library(reticulate)
  use_python('C:\\Program Files\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3\\python.exe', required = T)
  ```

  ```{python}
  # You must load all your python libraries and connect to Pro here, before calling arcgisbinding, 
  # then load them again in your python code. Trust me. It took me DAYS to figure this out. 
  import arcpy
  import os, time, uuid
  from zipfile import ZipFile
  from arcgis.gis import GIS
  import arcgis.features

  # Create GIS object
  print("Connecting to AGOL")
  gis = GIS("Pro")
  ```

  ```{r}
  library(tidyverse)
  library(arcgisbinding)
  library(sf)
  arc.check_portal()
  arc.check_product()

  # Do stuff in R code like creating fGDBs 
  ```

  ```{python}
  import arcpy
  import os, time, uuid
  from zipfile import ZipFile
  from arcgis.gis import GIS
  import arcgis.features

  # Create GIS object
  print("Connecting to AGOL")
  gis = GIS("Pro")

  # Run the python code to update & append here...
  ```

scdub commented 1 year ago

@Nova-Scotia Nice work finding a workaround! In terms of the issue when the NumPy import happens, the conda environment needs to be initialized so that the DLLs which are part of conda are loaded, e.g. by using use_condaenv. In terms of the broader issues, I definitely would like to look into it, that is too many rough edges. I'm not clear we can make this easy in all cases because the stack is complicated when mixing Quarto, Pro via conda and the full R stack, but it should be possible to make it manageable.

Nova-Scotia commented 1 year ago

Agreed - smoother would be better 😎. When you suggest using use_condaenv, is there anything else I should add to that call? Hoping to have time to try this out later this week.

JosiahParry commented 11 months ago

@nginer316 tagging you here since you're running into this very specific issue. @scdub confirmed can repro when using arcgisbinding in conjunction

nginer316 commented 11 months ago

To follow up, the order of operations below worked. From there, I was able to run a full workflow: data I/O, R functions, and a call to a GP tool.

# load libs
library(reticulate)
library(sf)

# get set up to use ArcPy within R
use_python('C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe', required = T)
arcpy <- import("arcpy")
np = import("numpy")

# load R-ArcGIS lib
library(arcgisbinding)
arc.check_product()

scw commented 11 months ago

After arcgisbinding is imported, reticulate will no longer correctly call into the dependencies for Python packages. Conda packages depend on DLLs being loaded from specific well-known locations such as <env>\Library\bin, but after arcgisscripting is imported, this location is no longer respected in resolving DLL imports and the import to MKL (BLAS library backing NumPy) fails. It isn't clear which package is at fault for this issue, but I can reproduce it including in a plain R session. I would recommend always using the reticulate-then-arcgisbinding approach until we can identify a better long term solution.
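
To see whether the environment's `Library\bin` directory is visible at all, one rough diagnostic is to compare it against `PATH` from inside the Python session. This is a sketch under assumptions: `missing_dll_dirs` is a hypothetical helper, the environment root is the default Pro location quoted earlier in this thread, and the thread does not establish that `PATH` is the mechanism that breaks, so a clean result here does not rule the problem out.

```python
import ntpath
import os


def missing_dll_dirs(env_root, path_value):
    """Return expected conda DLL directories that are absent from a PATH string.

    Hypothetical helper. Only <env>\\Library\\bin is checked, since that is the
    location named above. ntpath is used so Windows paths compare consistently.
    """
    expected = [ntpath.join(env_root, "Library", "bin")]

    def norm(p):
        # Normalize separators and case the way Windows path comparison does.
        return ntpath.normcase(ntpath.normpath(p))

    on_path = {norm(p) for p in path_value.split(";") if p}
    return [d for d in expected if norm(d) not in on_path]


# Assumed default install location; adjust for a non-standard Pro install.
env_root = r"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3"
print(missing_dll_dirs(env_root, os.environ.get("PATH", "")))
```

An empty list means the directory is on `PATH`; comparing the output before and after `library(arcgisbinding)` may help narrow down where the DLL lookup changes.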