DataONEorg / rdataone

R package for reading and writing data at DataONE data repositories
http://doi.org/10.5063/F1M61H5X
36 stars 19 forks source link

Less-than-helpful output of archive() on edge cases #236

Closed amoeba closed 2 years ago

amoeba commented 5 years ago

I was happily archiving some content today and ran into weird output from archive().

When I don't have access to archive due to using the wrong token (aka not public but not properly auth'd), curl gives me:

❯ curl -X PUT -H "Authorization: Bearer $TOKEN" "https://neutral-cat.nceas.ucsb.edu/metacat/d1/mn/v2/archive/869928d3-459c-4854-b8c3-ca16cfae3354"
<?xml version="1.0" encoding="UTF-8"?><error detailCode="1820" errorCode="401" name="NotAuthorized">
    <description>CHANGE_PERMISSION not allowed on 869928d3-459c-4854-b8c3-ca16cfae3354 for subject[s]: public; </description>
</error>

while archive() gives me:

> dataone::archive(mn, "869928d3-459c-4854-b8c3-ca16cfae3354")
NULL
Warning message:
In .local(x, ...) : Error archiving 869928d3-459c-4854-b8c3-ca16cfae3354

When authenticated and I try to archive an object that doesn't exist:

curl:

❯ curl -X PUT -H "Authorization: Bearer $TOKEN" "https://neutral-cat.nceas.ucsb.edu/metacat/d1/mn/v2/archive/869928d3-459c-4854-b8c3-ca16cfae3354d"
<?xml version="1.0" encoding="UTF-8"?><error detailCode="1800" errorCode="404" name="NotFound">
    <description>No system metadata could be found for given PID: 869928d3-459c-4854-b8c3-ca16cfae3354d</description>
</error>

archive():

> dataone::archive(mn, "869928d3-459c-4854-b8c3-ca16cfae3354d")
NULL
Warning message:
In .local(x, ...) : Error archiving 869928d3-459c-4854-b8c3-ca16cfae3354d

I'll put this on the backburner unless anyone else wants to tackle it before then.

session info ``` > devtools::session_info() ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── setting value version R version 3.6.1 (2019-07-05) os macOS Mojave 10.14.5 system x86_64, darwin15.6.0 ui RStudio language (EN) collate en_US.UTF-8 ctype en_US.UTF-8 tz America/Juneau date 2019-07-26 ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ! package * version date lib source arcticdatautils * 0.6.4 2019-07-27 [1] Github (nceas/arcticdatautils@b1a440e) assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0) backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0) base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.6.0) callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0) cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0) commonmark 1.7 2018-12-01 [1] CRAN (R 3.6.0) crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0) curl 4.0 2019-07-22 [1] CRAN (R 3.6.0) dataone * 2.1.2 2019-01-29 [1] CRAN (R 3.6.0) datapack 1.3.1 2017-08-29 [1] CRAN (R 3.6.0) desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0) devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0) digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0) dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0) EML 2.0.0 2019-04-23 [1] CRAN (R 3.6.0) emld 0.2.0 2019-03-06 [1] CRAN (R 3.6.0) evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0) fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0) glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0) hash 2.2.6.1 2019-03-04 [1] CRAN (R 3.6.0) htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0) httr 1.4.0 2018-12-11 [1] CRAN (R 3.6.0) jqr 1.1.0 2018-10-22 [1] CRAN (R 3.6.0) jsonld 2.1 2019-02-05 [1] CRAN (R 3.6.0) jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.0) knitr 1.23 2019-05-18 [1] CRAN (R 3.6.0) lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0) magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0) memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0) parsedate 1.2.0 2019-05-08 [1] CRAN (R 3.6.0) pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0) pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0) pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0) pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0) plyr 1.8.4 2016-06-08 [1] CRAN (R 3.6.0) prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0) processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0) ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0) purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.0) R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0) V Rcpp 1.0.1 2019-07-25 [1] CRAN (R 3.6.0) redland 1.0.17-10 2018-07-20 [1] CRAN (R 3.6.0) remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0) rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0) rmarkdown 1.14 2019-07-12 [1] CRAN (R 3.6.0) roxygen2 6.1.1 2018-11-07 [1] CRAN (R 3.6.0) rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0) rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0) sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0) stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0) stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0) testthat 2.2.0 2019-07-22 [1] CRAN (R 3.6.0) tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0) tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0) usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0) uuid 0.1-2 2015-07-28 [1] CRAN (R 3.6.0) V8 2.3 2019-07-02 [1] CRAN (R 3.6.0) withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0) xfun 0.8 2019-06-25 [1] CRAN (R 3.6.0) XML 3.98-1.20 2019-06-06 [1] CRAN (R 3.6.0) xml2 1.2.0.9000 2019-07-24 [1] Github (HenrikBengtsson/xml2@618bc74) yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0) [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library V ── Loaded and on-disk version mismatch. ```
amoeba commented 5 years ago

So I narrowed this down to two things (only one is the main cause).

First, archive() doesn't even do anything with the response:

https://github.com/DataONEorg/rdataone/blob/e91e1fc2b8609a4c8059e4b2712d8935ac895e22/R/D1Node.R#L181-L183

But this is probably because the response is empty. archive() calls auth_put which calls auth_put_post_delete like:

https://github.com/DataONEorg/rdataone/blob/e91e1fc2b8609a4c8059e4b2712d8935ac895e22/R/auth_request.R#L163

If I remove the body=body part of that call, I get the response body and can get a full error response.

@gothub, do you remember anything tricky with this? I think the easiest thing to do is just to add a body="" to the call like:

response <- auth_put(url, node=x, body = "")

which makes it work:

> archive(mn, pid)
NULL
Warning message:
In .local(x, ...) :
  Error archiving urn:uuid:8727d227-3999-404a-a4e0-165ceaeb794a
CHANGE_PERMISSION not allowed on urn:uuid:8727d227-3999-404a-a4e0-165ceaeb794a for subject[s]: public; 
amoeba commented 5 years ago

Thought I'd just check and see what actually uses auth_put: There a four call sites,

response <- auth_put(
  url, 
  encode="multipart", 
  body=list(pid=pid, sysmeta=upload_file(sm_file, type='text/xml')), 
  node=x)
response <- auth_put(url, node=x)

So it's safe to either change the default behavior of auth_put from defaulting to body=as.list(NA) to body=NULL since all of the other call sites specify their body argument. body="" is the same as not specifying a body argument in the first place.

I did this and also used the same pattern for auth_post and auth_delete. All tests pass with the change. See diff:

diff --git a/R/D1Node.R b/R/D1Node.R
index db5917c..dd77146 100644
--- a/R/D1Node.R
+++ b/R/D1Node.R
@@ -177,9 +177,9 @@ setGeneric("archive", function(x, ...) {
 #' @rdname archive
 setMethod("archive", signature("D1Node"), function(x, pid) {
     url <- paste(x@endpoint, "archive", URLencode(pid, reserved=TRUE), sep="/")
-    response <- auth_put(url, node=x)
+    response <- auth_put(url, node=x, body = NULL)
     if(response$status_code != "200") {
-        warning(sprintf("Error archiving %s\n", pid))
+        warning(sprintf("Error archiving %s\n", pid), getErrorDescription(response))
         return(NULL)
     } else {
         # Comment out body handling because httr::PUT is not returning a response body at all
diff --git a/R/auth_request.R b/R/auth_request.R
index 164ceb5..9474049 100644
--- a/R/auth_request.R
+++ b/R/auth_request.R
@@ -144,13 +144,10 @@ auth_head <- function(url, nconfig=config(), node) {
 #' @param node The D1Node object that the request will be made to.
 #' @return the response object from the method
 #' @import httr
-auth_put_post_delete <- function(method, url, encode="multipart", body=as.list(NA), node) {
+auth_put_post_delete <- function(method, url, encode="multipart", body=NULL, node) {

   am <- AuthenticationManager()
   if(!missing(node) && isAuthValid(am, node)) {
-    if (is.na(body[1])) {
-      body = FALSE
-    }
     if(getAuthMethod(am, node) == "token") {
       # Authentication will use an authentication token.
       authToken <- getToken(am, node)
@@ -205,7 +202,7 @@ auth_put_post_delete <- function(method, url, encode="multipart", body=as.list(N
 #' @param node The D1Node object that the request will be made to.
 #' @return the HTTP response from the request
 #' @import httr
-auth_post <- function(url, encode="multipart", body=as.list(NA), node) {
+auth_post <- function(url, encode="multipart", body=NULL, node) {
     response <- auth_put_post_delete("post", url, encode, body, node)
     return(response)
 }
@@ -220,7 +217,7 @@ auth_post <- function(url, encode="multipart", body=as.list(NA), node) {
 #' @param node The D1Node object that the request will be made to.
 #' @return the HTTP response from the request
 #' @import httr
-auth_put <- function(url, encode="multipart", body=as.list(NA), node) {
+auth_put <- function(url, encode="multipart", body=NULL, node) {
     response <- auth_put_post_delete("put", url, encode, body, node)
     return(response)
 }
@@ -235,7 +232,7 @@ auth_put <- function(url, encode="multipart", body=as.list(NA), node) {
 #' @param node The D1Node object that the request will be made to.
 #' @return the HTTP response from the request
 #' @import httr
-auth_delete <- function(url, encode="multipart", body=as.list(NA), node) {
+auth_delete <- function(url, encode="multipart", body=NULL, node) {
     response <- auth_put_post_delete("delete", url, encode, body, node)
     return(response)
 }

@gothub what do you think?

jeanetteclark commented 2 years ago

@amoeba can you merge this in if you have it around? it sounds good to me

amoeba commented 2 years ago

Done, and added a test to boot. Thanks for the poke.