Bioconductor / AnnotationHub

Client for the Bioconductor AnnotationHub web resource
15 stars 12 forks source link

permission denied on lock file when using AnnotationHub cache #31

Closed daler closed 2 years ago

daler commented 2 years ago

We typically set the AnnotationHub cache to be local to a project. This lets any user with access to the directory run the code and use the same cache. This was definitely working in 2.22.0.

In 3.2.0 (but possibly earlier versions, I haven't looked closely), this is no longer possible. I think it's due to a lockfile remaining around.

The quick test to reproduce the issue is this:

mkdir -p cache

# as user1
Rscript -e "AnnotationHub::AnnotationHub(cache='cache')"

# switch to user2, who is in the same group as user 1, and run the same thing
Rscript -e "AnnotationHub::AnnotationHub(cache='cache')"

The latter command gives the error:

Error in lock(.sql_lock_path(dbfile), exclusive = FALSE) :
  Cannot open lock file: Permission denied
Calls: <Anonymous> ... tryCatch -> tryCatchList -> .sql_connect_RO -> lock
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'conn' in selecting a method for function 'dbDisconnect': object 'info' not found
Calls: <Anonymous> ... .sql_disconnect -> dbDisconnect -> .handleSimpleError -> h
Execution halted

Inspecting the cache directory, I see:

-rw------- 1 user1 staff    0 Jan 16 16:32 BiocFileCache.sqlite.LOCK
-rw-rw---- 1 user1 staff 105M Jan 16 16:32 10491703f35c_annotationhub.sqlite3
-rw-r----- 1 user1 staff  20K Jan 16 16:32 BiocFileCache.sqlite
-rw-rw---- 1 user1 staff 1.7M Jan 16 16:32 104953737007_104953737007_hub_index.rds

So the lock file is still around and it's not group-readable which is likely giving the error.

I confirmed that in 2.22.0 the lock file does not remain and that the above test works without errors.

Any ideas on how to resolve this? Perhaps a temporary fix is to wrap AnnotationHub::AnnotationHub() in a new function that cleans up the lockfile before returning the ah.

I was unable to pin down where exactly this was happening, and pinging @lshep since this might be in BiocFileCache instead.

lshep commented 2 years ago

@daler Can you provide the sessionInfo() too so we know the version of BiocFileCache being utilized in the backend.

daler commented 2 years ago

Sure, here it is:

R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: /opt/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] KEGGREST_1.34.0               tidyselect_1.1.1
 [3] BiocVersion_3.14.0            purrr_0.3.4
 [5] vctrs_0.3.8                   generics_0.1.1
 [7] htmltools_0.5.2               stats4_4.1.1
 [9] BiocFileCache_2.2.0           yaml_2.2.1
[11] utf8_1.2.2                    interactiveDisplayBase_1.32.0
[13] blob_1.2.2                    rlang_0.4.12
[15] pillar_1.6.4                  later_1.2.0
[17] withr_2.4.2                   glue_1.5.0
[19] DBI_1.1.1                     rappdirs_0.3.3
[21] BiocGenerics_0.40.0           bit64_4.0.5
[23] dbplyr_2.1.1                  GenomeInfoDbData_1.2.7
[25] lifecycle_1.0.1               zlibbioc_1.40.0
[27] Biostrings_2.62.0             memoise_2.0.0
[29] Biobase_2.54.0                IRanges_2.28.0
[31] fastmap_1.1.0                 httpuv_1.6.3
[33] GenomeInfoDb_1.30.0           curl_4.3.2
[35] fansi_0.4.2                   AnnotationDbi_1.56.1
[37] Rcpp_1.0.7                    xtable_1.8-4
[39] promises_1.2.0.1              filelock_1.0.2
[41] BiocManager_1.30.16           cachem_1.0.6
[43] S4Vectors_0.32.0              XVector_0.34.0
[45] mime_0.12                     bit_4.0.4
[47] AnnotationHub_3.2.0           png_0.1-7
[49] digest_0.6.28                 dplyr_1.0.7
[51] shiny_1.7.1                   tools_4.1.1
[53] bitops_1.0-7                  magrittr_2.0.1
[55] RCurl_1.98-1.5                tibble_3.1.6
[57] RSQLite_2.2.8                 crayon_1.4.2
[59] pkgconfig_2.0.3               ellipsis_0.3.2
[61] assertthat_0.2.1              httr_1.4.2
[63] R6_2.5.1                      compiler_4.1.1
lshep commented 2 years ago

@daler Could you also try manually changing the permissions of the LOCK file to rw for the group to see if that allows access?

daler commented 2 years ago

Yes, I confirmed that changing permissions does allow access to user2. However the database itself (BiocFileCache.sqlite) is still read-only for group so I get an error when trying to access a resource.

E.g., as user2 I'm able to create an AnnotationHub object after fixing the LOCK file permissions but when trying to get a resource:

> ah[['AH5012']]
downloading 1 resources
retrieving 1 resource
Error: failed to load resource
  name: AH5012
  title: Chromosome Band
  reason: 1 resources failed to download
In addition: Warning message:
download failed
  hub path: ‘https://annotationhub.bioconductor.org/fetch/5012’
  cache resource: ‘AH5012 : 5012’
  reason: attempt to write a readonly database

So if I also change the permissions of the database itself, then everything works as expected for both users.

lshep commented 2 years ago

@daler I'm hesitant to apply the fix (in BiocFileCache) to happen automatically for both the LOCK and sqlite file. Reasoning: different organizations/institutions may not want all users of a group to be able to download resources at will and may want the more restrictive access of read-only. Do you think automatically changing the permissions of the LOCK file in BiocFileCache to allow read access to the sqlite database and adding a section in the AnnotationHub/BiocFileCache vignettes that explains updating the permissions manually for sqlite file to allow full group access for read-write is sufficient?

daler commented 2 years ago

Sure, that makes sense and I think that would be fine. Would you consider exposing permission configuration via AnnotationHub arguments? Regardless, a note in the vignette[s] would be helpful.

I guess I'm not clear on the LOCK file though -- in other contexts, lock files mean that someone else is using the resource but then it is cleaned up when they're done using it. In this instance, should user1's .sqlite.LOCK file remain even after they are done?

lshep commented 2 years ago

we use the filelock package to control the locking and based on our experience the file remains after unlocking and access controlled by who is locking at any given time.

daler commented 2 years ago

OK, reading up on the lockfile package it looks to be using fcntl on the special lock file, so it's not the presence/absence of the file itself that is determining the lock. Changing the permissions to be 660 should still allow locks to work.

The question is though whether the permissions 660 should always be applied. As you point out, different users will have different needs. The current 600 permissions are probably most broadly applicable. So perhaps the most general solution to my original issue would be to fix the file permissions of both the lock and the database myself, and make no changes to BiocFileCache.

lshep commented 2 years ago

It might be best for that. I can add the instructions for this case in the vignettes so its more readily available and known.