Sage-Bionetworks / synapser

An R package providing programmatic access to Synapse
Apache License 2.0

synGet() hangs when retrieving files as non-root (macOS 10.15) #298

Closed afaranda closed 3 years ago

afaranda commented 3 years ago

Operating system

macOS Catalina 10.15.7 R version 4.0.3

Description of the problem

I'm unable to retrieve files using synGet() without errors. When I use the default path, as shown below, the call fails with a permission error:

> library('synapser')
> synapser::synLogin(
+   "my_user_name",
+   "top_secret_password"
+ )

> fh <- synGet("syn1906479", downloadFile=T)
   Error in value[[3L]](cond) : 
  [Errno 13] Permission denied: '/Users/myusername/.synapseCache/969'
> 

When I change the path, the file is successfully retrieved (and has the correct md5); however, the command gets "stuck" and never completes.

> fh <- synGet("syn1906479", downloadFile=T, downloadLocation="/Users/myusername/Desktop")

Deleting the file does not end the command or return the console prompt.

If I delete the file, and then send a single interrupt (either "ctrl-c" in the terminal, or the "stop" button in RStudio), the file is re-downloaded, but R remains stuck on the call to synGet().

Sending several interrupts in rapid succession kills the call to synGet() and returns the following error:

In RStudio:

> fh <- synGet("syn1906479", downloadFile=T, downloadLocation="/Users/adam/Desktop")

Error in value[[3L]](cond) : keyboard interrupt received
>

In a terminal R Session

> fh <- synGet("syn1906479", downloadFile=T, downloadLocation="/Users/adam/Desktop")
^C
^C
^C
^C
^C
Error in value[[3L]](cond) : keyboard interrupt received

I also observed a different error that I could not reproduce

> fh <- synGet("syn1906479", downloadFile=T, downloadLocation="/Users/adam/Desktop")
^C
^C
^C
Error in value[[3L]](cond) : 
  Could not obtain a lock on the file cache within timeout: 0:01:10  Please try again later
> 

Expected behavior

I expected the file to be downloaded and the file handle to be stored in an R object.

Actual behavior

The command gets stuck and, in some cases, the target file fails to download.

Output of sessionInfo()

 sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] synapser_0.9.77

loaded via a namespace (and not attached):
[1] compiler_4.0.3        R6_2.5.0              tools_4.0.3          
[4] codetools_0.2-18      pack_0.1-1            PythonEmbedInR_0.6.76

afaranda commented 3 years ago

Upgrading R to 4.0.5 and upgrading synapser did not resolve the issue. As a normal user, I experienced the same problem as above.

Starting R using "sudo" appears to resolve the issue:

me@my-MacBook-Pro ~ % sudo R

> library(synapser)

TERMS OF USE NOTICE:
  When using Synapse, remember that the terms and conditions of use require that you:
  1) Attribute data contributors when discussing these data or results from these data.
  2) Not discriminate, identify, or recontact individuals or groups represented by the data.
  3) Use and contribute only data de-identified to HIPAA standards.
  4) Redistribute data only under these same terms of use.

> synLogin('afaranda', 'adnaraf7')
Welcome, afaranda!NULL
> fh

Error: object 'fh' not found
> fh <- synGet('syn1906479', downloadLocation="/Users/adam/Desktop")
Downloading  [####################]100.00%   1.5kB/1.5kB (1.6MB/s) response(1).txt Done...    > 

Session Info

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] synapser_0.10.89

loaded via a namespace (and not attached):
[1] compiler_4.0.5        R6_2.5.0              tools_4.0.5           codetools_0.2-18      pack_0.1-1            PythonEmbedInR_0.7.80
>

afaranda commented 3 years ago

On a Linux machine, everything works as a regular user.

Downloading to the cache directory:

> file <- synGet('syn1906479')
Downloading  [####################]100.00%   1.5kB/1.5kB (455.9kB/s) response.txt Done...    > 
> file
File: response.txt (syn1906479)
  md5=4122b9531cb40f3d40268a25ff5ee81c
  fileSize=1508
  contentType=text/plain
  externalURL=None
  cacheDir=/home/abf/.synapseCache/969/32969
  files=['response.txt']
  path=/home/abf/.synapseCache/969/32969/response.txt
  synapseStore=True
properties:
  concreteType=org.sagebionetworks.repo.model.FileEntity
  createdBy=273979
  createdOn=2013-06-05T20:01:44.143Z
  dataFileHandleId=32969
  etag=593ddb8c-7ab2-11e9-98fa-026b0a0ad230
  id=syn1906479
  isLatestVersion=True
  modifiedBy=273979
  modifiedOn=2013-06-05T20:10:18.657Z
  name=response.txt
  parentId=syn1901850
  versionLabel=2
  versionNumber=2
annotations:

And downloading to my home directory

> file <- synGet('syn1906479', downloadLocation="/home/abf")
> list.files(pattern="response")
[1] "response.txt"

Session Info


> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora 31 (Server Edition)

Matrix products: default
BLAS:   /home/abf/bin/R/lib64/R/lib/libRblas.so
LAPACK: /home/abf/bin/R/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] synapser_0.10.89 R6_2.5.0        

loaded via a namespace (and not attached):
[1] BiocManager_1.30.12   compiler_4.0.5        tools_4.0.5          
[4] codetools_0.2-18      pack_0.1-1            PythonEmbedInR_0.7.80

jkiang13 commented 3 years ago

Hi @afaranda

synapser uses a cache directory which it automatically creates in the user's $HOME directory. It uses this cache to keep an index of the files it downloads and as the default path to download files to when a downloadLocation is not otherwise specified.

It seems that this path is not writable on your Mac. The hang on a synGet with a downloadLocation is a byproduct of that error. synapser should raise the error immediately rather than getting stuck in a long retry loop, as it currently does; we'll address that as a separate bug.

Can you see if the user has permission to create a directory at the path returned by Sys.getenv('HOME'), or if there is a $HOME/.synapseCache directory already, whether your user has permission to read and create files within it? Typically a user would have permission to write to their own $HOME directory, but that may not be so in your case.
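The permission check described above can also be run from a shell. This is a minimal sketch, not part of synapser: `check_synapse_cache` is a hypothetical helper name, and the only assumption is that the cache lives at $HOME/.synapseCache unless another path is passed.

```shell
# Hypothetical helper (not part of synapser): report whether the Synapse
# cache directory can be used by the current user.
# Usage: check_synapse_cache [cache_dir]   (default: $HOME/.synapseCache)
check_synapse_cache() {
    cache_dir="${1:-$HOME/.synapseCache}"
    parent_dir=$(dirname "$cache_dir")

    if [ -d "$cache_dir" ]; then
        # The cache exists: the user must be able to read it, enter it,
        # and create files inside it.
        if [ -r "$cache_dir" ] && [ -w "$cache_dir" ] && [ -x "$cache_dir" ]; then
            echo "ok: $cache_dir is readable and writable"
        else
            echo "problem: $cache_dir exists but is not readable and writable"
            return 1
        fi
    elif [ -w "$parent_dir" ]; then
        # The cache does not exist yet, so it must be creatable in its parent.
        echo "ok: $cache_dir does not exist yet, but $parent_dir is writable"
    else
        echo "problem: cannot create $cache_dir because $parent_dir is not writable"
        return 1
    fi
}
```

Running `check_synapse_cache` with no arguments as the regular (non-root) user checks the default location. A "problem" line means the cache cannot be used by that user.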

If you are unable or don't want to allow synapser to write to a $HOME/.synapseCache directory, you can work around it in one of the following ways:

  1. Create a symbolic link to another location that you will allow synapser to write to, e.g. mkdir -p <path_to_cache_dir>; ln -s <path_to_cache_dir> ~/.synapseCache
  2. Run the following command before synGet (this is a bit of a backdoor way to change the cache location): library(PythonEmbedInR); PythonEmbedInR::pyExec("syn=synapseclient.Synapse(skip_checks=True, cache_root_dir='<path_to_alternate_cache_dir>')")
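The symbolic-link workaround (option 1) can be wrapped in a small function that refuses to clobber an existing cache. This is only a sketch: `link_synapse_cache` and its optional second argument are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: make <home_dir>/.synapseCache a symbolic link to a
# directory the current user can write to.
# Usage: link_synapse_cache <absolute_target_dir> [home_dir]
link_synapse_cache() {
    target="$1"
    home_dir="${2:-$HOME}"
    link="$home_dir/.synapseCache"

    # Refuse to touch an existing cache directory or a stale link;
    # move or remove it by hand first.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "refusing to overwrite existing $link" >&2
        return 1
    fi

    # Use an absolute target: a relative one would be resolved relative
    # to the directory that contains the link.
    mkdir -p "$target" && ln -s "$target" "$link"
}
```

For example, `link_synapse_cache /Users/user/Desktop/alternate_synapse_cache` creates the target directory if needed and links ~/.synapseCache to it.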

We recently added support within the Python Synapse client for customizing the cache directory without needing the workarounds above, but that option is not currently exposed in synapser. We will add an issue for that for a future version.

afaranda commented 3 years ago

Thanks so much for helping me troubleshoot this. As it turns out, I cannot create directories in the current Synapse cache:

drwxr-xr-x    3 root  staff    96B Apr 28 20:03 .synapseCache

I suspect this is because I install R packages as root. I changed the ownership of the directory, but that did not work:

user@users-MacBook-Pro ~ % ls -lah | grep synapse
drwxr-xr-x    3 user  staff    96B Apr 28 20:03 .synapseCache
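One possible explanation, not confirmed anywhere in this thread, is that entries inside the directory were still owned by root after the top-level ownership change. A quick way to test that assumption is to list everything under the cache that the current user does not own (`list_foreign_owned` is a hypothetical helper name):

```shell
# Hypothetical helper: print every entry under the cache directory that is
# NOT owned by the current user. Empty output means ownership is consistent.
# Usage: list_foreign_owned [cache_dir]   (default: $HOME/.synapseCache)
list_foreign_owned() {
    cache_dir="${1:-$HOME/.synapseCache}"
    # 'id -u' prints the numeric user ID; find's -user primary accepts it.
    find "$cache_dir" ! -user "$(id -u)"
}
```

If this prints any paths, a top-level ownership change alone did not cover them.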

I deleted the existing .synapseCache directory and created the symbolic link as you suggested

Shell:

rm -rf ~/.synapseCache
ln -s /Users/user/Desktop/alternate_synapse_cache ~/.synapseCache

R (as regular user)

library(PythonEmbedInR); 
PythonEmbedInR::pyExec("syn=synapseclient.Synapse(skip_checks=True,cache_root_dir='/Users/user/Desktop/alternate_synapse_cache')")

The suggested process worked as expected.

I then deleted the symbolic link in my HOME directory, and replaced it with a regular folder. After making this change, synGet() is still working properly, even with "downloadLocation='~/Desktop'"

Also, I was wrong in my previous comment. After upgrading, synGet() was still hanging, but it was no longer getting stuck in the retry loop: a single interrupt killed it.

jkiang13 commented 3 years ago

Great. We'll address the hanging when this condition is encountered as a separate issue (while synapser needs to be able to read/write to its cache directory, it should error immediately rather than hang if it does not have permission to do so).
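The "error immediately rather than hang" behavior can be illustrated with a small lock-acquisition sketch. This is not synapser's implementation: the function name, the `.lock` directory name, and the retry count are all hypothetical. The point is only that a permission problem is reported before any retrying starts, while genuine lock contention is retried a bounded number of times.

```shell
# Hypothetical sketch (not synapser's actual code): acquire a lock inside a
# cache directory, failing fast when the directory is not writable.
# Usage: acquire_cache_lock <cache_dir> [max_tries]
# Returns 0 on success, 1 on lock timeout, 2 on a permission problem.
acquire_cache_lock() {
    cache_dir="$1"
    max_tries="${2:-5}"
    lock="$cache_dir/.lock"    # hypothetical lock name

    # Waiting cannot fix a permission problem, so report it immediately.
    if [ ! -d "$cache_dir" ] || [ ! -w "$cache_dir" ]; then
        echo "cannot write to cache directory $cache_dir" >&2
        return 2
    fi

    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # mkdir either creates the directory or fails, so only one
        # caller can hold the lock at a time.
        if mkdir "$lock" 2>/dev/null; then
            return 0
        fi
        tries=$((tries + 1))
        sleep 1
    done

    echo "could not obtain a lock on $cache_dir after $max_tries tries" >&2
    return 1
}
```

A caller would release the lock with `rmdir "$cache_dir/.lock"` once it has finished with the cache.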