IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

CRAN checks failing at vignette when network resource unavailable #131

Closed kuriwaki closed 2 months ago

kuriwaki commented 9 months ago

Needs to be fixed by 2023-10-23 to prevent suspension from CRAN.

Detailed log currently at https://cran.r-project.org/web/checks/check_results_dataverse.html
using R Under development (unstable) (2023-10-04 r85267)
using platform: x86_64-pc-linux-gnu
R was compiled by
    Debian clang version 16.0.6 (15)
    GNU Fortran (Debian 13.2.0-4) 13.2.0
running under: Debian GNU/Linux trixie/sid
using session charset: UTF-8
checking for file ‘dataverse/DESCRIPTION’ ... OK
this is package ‘dataverse’ version ‘0.3.13’
package encoding: UTF-8
checking package namespace information ... OK
checking package dependencies ... OK
checking if this is a source package ... OK
checking if there is a namespace ... OK
checking for executable files ... OK
checking for hidden files and directories ... OK
checking for portable file names ... OK
checking for sufficient/correct file permissions ... OK
checking serialization versions ... OK
checking whether package ‘dataverse’ can be installed ... OK
See the [install log](https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/dataverse-00install.html) for details.
checking package directory ... OK
checking for future file timestamps ... OK
checking ‘build’ directory ... OK
checking DESCRIPTION meta-information ... OK
checking top-level files ... OK
checking for left-over files ... OK
checking index information ... OK
checking package subdirectories ... OK
checking R files for non-ASCII characters ... OK
checking R files for syntax errors ... OK
checking whether the package can be loaded ... [0s/1s] OK
checking whether the package can be loaded with stated dependencies ... [0s/0s] OK
checking whether the package can be unloaded cleanly ... [0s/0s] OK
checking whether the namespace can be loaded with stated dependencies ... [0s/0s] OK
checking whether the namespace can be unloaded cleanly ... [0s/1s] OK
checking loading without being on the library search path ... [0s/0s] OK
checking whether startup messages can be suppressed ... [0s/1s] OK
checking use of S3 registration ... OK
checking dependencies in R code ... OK
checking S3 generic/method consistency ... OK
checking replacement functions ... OK
checking foreign function calls ... OK
checking R code for possible problems ... [10s/12s] OK
checking Rd files ... [1s/1s] OK
checking Rd metadata ... OK
checking Rd line widths ... OK
checking Rd cross-references ... OK
checking for missing documentation entries ... OK
checking for code/documentation mismatches ... OK
checking Rd \usage sections ... OK
checking Rd contents ... OK
checking for unstated dependencies in examples ... OK
checking installed files from ‘inst/doc’ ... OK
checking files in ‘vignettes’ ... OK
checking examples ... [1s/1s] OK
checking for unstated dependencies in ‘tests’ ... OK
checking tests ... [5s/6s] OK
  Running ‘testthat.R’ [5s/6s]
checking for unstated dependencies in vignettes ... OK
checking package vignettes in ‘inst/doc’ ... OK
checking re-building of vignette outputs ... [6s/8s] ERROR
Error(s) in re-building vignettes:
  ...
--- re-building ‘A-introduction.Rmd’ using rmarkdown
--- finished re-building ‘A-introduction.Rmd’

--- re-building ‘B-search.Rmd’ using rmarkdown

Quitting from lines 21-24 [unnamed-chunk-1] (B-search.Rmd)
Error: processing vignette 'B-search.Rmd' failed with diagnostics:
Service Unavailable (HTTP 503).
--- failed re-building ‘B-search.Rmd’

--- re-building ‘C-download.Rmd’ using rmarkdown

Quitting from lines 46-50 [unnamed-chunk-3] (C-download.Rmd)
Error: processing vignette 'C-download.Rmd' failed with diagnostics:
Service Unavailable (HTTP 503).
--- failed re-building ‘C-download.Rmd’

SUMMARY: processing the following files failed:
  ‘B-search.Rmd’ ‘C-download.Rmd’

Error: Vignette re-building failed.
Execution halted
checking PDF version of manual ... [7s/9s] OK
checking HTML version of manual ... [2s/4s] OK
checking for non-standard things in the check directory ... OK
DONE
Status: 1 ERROR
wibeasley commented 9 months ago

@kuriwaki, tell me if you want to talk out anything related to this.

pdurbin commented 9 months ago

Same.

kuriwaki commented 9 months ago

The checks are passing again, so it appears the failure was triggered by temporary maintenance or an upgrade on the server side. The CRAN maintainers pointed to the CRAN policy that a package must not depend on network resources to run successfully. I think the tests are fine in this respect, but the vignettes did depend on a live server.
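One common way to keep a vignette from erroring when a server is down is to turn a failed live call into a message. Below is a minimal base R sketch of that idea, not the fix adopted in this package. `try_live()` is a hypothetical helper that does not exist in the dataverse package, and the failure is simulated so the example runs without network access.

```r
# Hypothetical helper (not part of the dataverse package): evaluate a live
# call and return NULL with a message, instead of an error, if it fails.
try_live <- function(expr) {
  tryCatch(
    expr,
    error = function(e) {
      message("Live resource unavailable: ", conditionMessage(e))
      NULL
    }
  )
}

# Simulate the failure from the CRAN log without touching the network.
result <- try_live(stop("Service Unavailable (HTTP 503)."))
is.null(result)

# A successful call passes its value through unchanged.
try_live(1 + 1)

# In a vignette chunk the same wrapper could guard a real call, for example
#   res <- try_live(dataverse::dataverse_search("Gary King"))
# and later chunks could be skipped with the knitr option eval = !is.null(res).
```

With this pattern the vignette still builds when the server returns a 503, at the cost of showing no output for the affected chunks.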

kuriwaki commented 2 months ago

The CRAN check hit an error again because a remote resource failed, so a fix needs to be implemented and approved by May 12 (per an email the CRAN maintainers sent me today).

Quitting from lines  at lines 21-24 [unnamed-chunk-1] (B-search.Rmd)
Error: processing vignette 'B-search.Rmd' failed with diagnostics:
Internal Server Error (HTTP 500). Failed to Exception running search for [Gary King] with filterQueries [dvObjectType:(dataverses OR datasets OR files), ] and paginationStart [0]: edu.harvard.iq.dataverse.search.SearchException: Internal Dataverse Search Engine Error org.apache.solr.client.solrj.SolrServerException org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://dvn-cloud-solr.lib.harvard.edu:8983/solr/collection1 java.net.SocketTimeoutException java.net.SocketTimeoutException: Read timed out .
--- failed re-building ‘B-search.Rmd’
pdurbin commented 2 months ago

I see, the test is searching Harvard Dataverse:

https://github.com/IQSS/dataverse-client-r/blob/v0.3.11/vignettes/B-search.Rmd#L22

The approach pyDataverse recently adopted is to spin up an instance of Dataverse in containers using GitHub Actions. If this is of interest, I'm sure @JR-1991 or I would be happy to walk you through https://github.com/gdcc/pyDataverse/blob/master/.github/workflows/test_build.yml

It could even be a topic for a future pyDataverse meeting: https://py.gdcc.io

Or a future container meeting: https://ct.gdcc.io

kuriwaki commented 2 months ago

@pdurbin @JR-1991 I'd like to explore this. I just started by pasting your script in as a workflow in our dev branch: https://github.com/IQSS/dataverse-client-r/actions/workflows/test_build.yml

kuriwaki commented 2 months ago

It looks like this workflow ^ is for creating a new Dataverse instance to upload to, using https://github.com/gdcc/dataverse-action. In this R-client test, I only need to search and download existing datasets on dataverse.harvard.edu. I will try to make a GitHub Actions workflow for that. @JR-1991 would you still recommend using one of your templates?

JR-1991 commented 2 months ago

@kuriwaki — Yes, the action spins up a local Dataverse instance available to the action's runner. In terms of consistency, I would recommend using the Dataverse Action instead of querying dataverse.harvard.edu. Partly because connections could be unavailable, as in this issue, but also because you can extend your range of tests.

For instance, you can create a collection and corresponding datasets using pyDataverse or cURL after setting up the Dataverse instance. This way, you can cover various cases (e.g., tabular data, zip files, etc.) and, in general, have more control over the tests themselves. I personally prefer it over using Demo or Harvard Dataverse, because it eliminates a layer of dependency.

I am happy to support you in setting up everything. PyDataverse already includes some functions we could employ in your Action to create datasets and collections.

kuriwaki commented 2 months ago

Thanks @JR-1991, that sounds cool. For now, to resolve the immediate issue, I have done a simpler thing: I moved the live code out of the vignettes (per CRAN policy) and into tests/*ghaction.R. I then added a step to the GitHub Actions workflow to run those files separately.

But when I close this, I will make a new issue to propose your pyDataverse solution.

      - name: Test live dataverse in vignettes
        run: |
          devtools::load_all()
          source("tests/B-search_ghaction.R")
          source("tests/C-download_ghaction.R")
        shell: Rscript {0}
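The contents of the sourced files are not shown in this thread. As a rough sketch only, a file like tests/B-search_ghaction.R could validate the live result and stop on anything unexpected, so that the workflow step fails visibly. The helper `check_search_result()` and the environment variable `DATAVERSE_LIVE_TESTS` are hypothetical, and the assumption that `dataverse_search()` returns a data frame with a `name` column should be checked against the package documentation.

```r
# Hypothetical check: a search result should be a data frame with at least
# one row and a "name" column; anything else raises an error.
check_search_result <- function(res) {
  stopifnot(
    is.data.frame(res),
    nrow(res) > 0,
    "name" %in% names(res)
  )
  invisible(TRUE)
}

# Only contact the server when explicitly requested (hypothetical variable),
# so the file can be sourced offline without error.
if (identical(Sys.getenv("DATAVERSE_LIVE_TESTS"), "true")) {
  res <- dataverse::dataverse_search("Gary King", server = "dataverse.harvard.edu")
  check_search_result(res)
}
```

Because the live call only runs when the variable is set, the same file does nothing during a CRAN check and exercises the server only in the GitHub Actions step that opts in.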