IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

Testing against a live Dataverse #40

Closed wibeasley closed 2 years ago

wibeasley commented 4 years ago

There needs to be a different approach to initiating the test suite. Right now, tests that should fail still pass, because testthat::test_check() currently won't run at all if the API key isn't found as an environment variable.
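One alternative (just a sketch, not what the package currently does) would be to skip individual tests when the key is missing instead of gating the entire suite, so genuinely broken tests still fail whenever the key is present. This assumes the key lives in the DATAVERSE_KEY environment variable:

library(testthat)

# Skip a single live-server test when no API key is available,
# rather than refusing to run test_check() at all.
skip_if_no_dataverse_key <- function() {
  if (identical(Sys.getenv("DATAVERSE_KEY"), "")) {
    skip("No Dataverse API key available; skipping live-server test.")
  }
}

test_that("a dataset can be retrieved from the demo server", {
  skip_if_no_dataverse_key()
  # ... live calls against demo.dataverse.org go here ...
})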

I'm open to ideas as always. Currently I'm thinking:

  1. Test only against demo.dataverse.org. (A few weeks ago @pdurbin advocated this in a phone call for several reasons, including that production Dataverse retrieval stats won't be misleading; one article currently gets hundreds of hits a month just from automated tests.)

  2. Create a (demo) Dataverse account dedicated to testing. At this point, I don't think it needs to be kept secret; it could even be set in tests/testthat.R.

    @pdurbin, will you please check my claim, especially from a security standpoint?

  3. If the above is safe, the API key might be kept in a YAML file in the inst/ directory.

  4. If the API key to the demo server needs to be protected,

    1. we could save it as Travis environment variables (ref 1 & ref 2), but

    2. that would prevent other people from testing the package on their own machines, so we'd get fewer quality contributions from others.


@skasberger, @rliebz, @tainguyenbui, and any others, I'd appreciate any advice from your experience with pyDataverse, dataverse-client-python, and dataverse-client-javascript. I'm not experienced with your languages, but it looks like pyDataverse doesn't pass an API key, while dataverse-client-python posts its API key to the demo server.


(This is different from #4 & #29, which involve the battery of tests/comparisons. Not the management of API keys or how testthat is initiated.)

pdurbin commented 4 years ago

@wibeasley my first thought is that you could create a one-off user for every run on the demo site like this:

curl -d @user-add.json -H "Content-type:application/json" "$SERVER_URL/api/builtin-users?password=$NEWUSER_PASSWORD&key=$BUILTIN_USERS_KEY"

That's from http://guides.dataverse.org/en/4.18.1/api/native-api.html#create-a-builtin-user

I just tested it on the demo server and it worked with these environment variables:

export SERVER_URL=https://demo.dataverse.org

export NEWUSER_PASSWORD=password1

export BUILTIN_USERS_KEY=burrito

You'd have to vary the JSON you send each time to avoid errors about non-unique usernames or email addresses. In the link above, this is the example we provide:

{
  "firstName": "Lisa",
  "lastName": "Simpson",
  "userName": "lsimpson",
  "affiliation": "Springfield",
  "position": "Student",
  "email": "lsimpson@mailinator.com"
}
tainguyenbui commented 4 years ago

Is there any chance that the real environments are not hit? You could create mocks and just make sure that the HTTP request is being made with the right parameters.
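For illustration, here is a minimal sketch of that mocking idea using the httptest package (my assumption; any HTTP-mocking package would do). No real request is made: without_internet() intercepts the call and expect_GET() only checks that the intended URL would be hit.

library(httptest)
library(testthat)

test_that("the client targets the expected endpoint", {
  without_internet({
    expect_GET(
      httr::GET("https://demo.dataverse.org/api/info/version"),
      "https://demo.dataverse.org/api/info/version"
    )
  })
})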

pdurbin commented 4 years ago

Oh, and to be clear: the JSON response you get back should include the API token, which you'd use for subsequent operations. Using jq, you could grab it like this:

jq '.data.apiToken'

But I assume you'd want to implement all of this "create user and assign the API token to a variable" stuff in R.
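A rough R translation of that "create user and grab the token" workflow might look like the sketch below. It assumes SERVER_URL, NEWUSER_PASSWORD, and BUILTIN_USERS_KEY are set as in the example above, and that httr is available; the response structure follows the builtin-users endpoint documented in the guides.

library(httr)

# Randomize the username/email so repeated runs don't collide.
stamp <- format(Sys.time(), "%Y%m%d%H%M%S")

new_user <- list(
  firstName   = "Lisa",
  lastName    = "Simpson",
  userName    = paste0("test-", stamp),
  affiliation = "Springfield",
  position    = "Student",
  email       = paste0("test-", stamp, "@mailinator.com")
)

response <- POST(
  url   = paste0(Sys.getenv("SERVER_URL"), "/api/builtin-users"),
  query = list(
    password = Sys.getenv("NEWUSER_PASSWORD"),
    key      = Sys.getenv("BUILTIN_USERS_KEY")
  ),
  body   = new_user,
  encode = "json"
)
stop_for_status(response)

# Equivalent of `jq '.data.apiToken'`: pull the token out of the JSON body
# and hand it to the package via the environment variable it already reads.
api_token <- content(response)$data$apiToken
Sys.setenv(DATAVERSE_KEY = api_token)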

wibeasley commented 4 years ago

During a meeting Friday, @pdurbin tentatively planned that

@tainguyenbui, right now I think mocks might be overkill for the current goals. I do appreciate that they could isolate problems in the client from problems in the server software. But the server seems pretty stable, and might require less maintenance than whatever mock I develop. Tell me if you think I'm overlooking something important.

skasberger commented 4 years ago

Some thoughts from me (pyDataverse).

I think it would be great to have a test instance somewhere with the latest Dataverse version, so the clients can be tested there before (or after) each release. Whether there should be one user for all clients or one per client, I don't know.

pyDataverse passes an API key if one is supplied when the Api() object is initialized.

pdurbin commented 4 years ago

Related is the idea of setting up a beta server that runs "develop" - https://github.com/IQSS/dataverse.harvard.edu/issues/20

Also, a new instance of Dataverse is spun up after every pull request is merged at https://jenkins.dataverse.org/job/IQSS-dataverse-develop/ but then it gets terminated a few hours later.

Finally, https://demo.dataverse.org always runs the latest release. It's the officially blessed server for testing releases: http://guides.dataverse.org/en/4.18.1/api/getting-started.html#servers-you-can-test-with

adam3smith commented 4 years ago

If I understand this correctly and the decision is made, could you publish the dataverse on demo? I don't like the idea of writing tests that will have to be rewritten.

pdurbin commented 4 years ago

@adam3smith I assume your question is for @wibeasley

You both have my blessing to publish whatever you want on https://demo.dataverse.org , especially if you're testing dataverse-client-r! 😄 🎉

adam3smith commented 4 years ago

Yes, this is referring to https://demo.dataverse.org/dataverse/dataverse-client-r, which is unpublished; sorry for the confusion.

kuriwaki commented 3 years ago

Are there (or can we make) datasets on demo.dataverse.org that are permanent and can be used for testing the data download functions?

The current get_file tests read from a DOI that no longer exists.

For testing, it would be good to have files that are Stata (.dta), SPSS (.sav), and .csv, as well as some non-tabular data (like an R script and PDFs).

kuriwaki commented 3 years ago

Update: I made one. Currently it has a Stata .dta and a .csv in a nested directory structure.

Kuriwaki, Shiro, 2020, "Example Dataverse for dataverse-client-r package", https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/PPKHI1, Demo Dataverse, UNF:6:GBfbc/ZU12xvVxHGRU4uMw== [fileUNF]

kuriwaki commented 3 years ago

Scratch that: I created the dataset in the dataverse-client-r dataverse mentioned in #65. We can put all the test data in https://demo.dataverse.org/dataverse/dataverse-client-r, with a separate dataset for each topic. For example, I put my Stata test data within "National Longitudinal Study of Young Women - Example Dataset" in dataverse-client-r.
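For illustration, a download test against that collection might look like the sketch below. It assumes the dataset is published (or that DATAVERSE_KEY is set), that the DOI cited earlier stays available on demo.dataverse.org (swap in whichever dataset ends up being canonical), and the exact field names of the returned objects may differ across package versions.

library(dataverse)
library(testthat)

demo_doi <- "doi:10.70122/FK2/PPKHI1"  # replace with the canonical test dataset

test_that("files can be listed and downloaded from the demo dataset", {
  files <- dataset_files(demo_doi, server = "demo.dataverse.org")
  expect_gt(length(files), 0)

  # Download the first file as a raw vector; a non-empty result is enough
  # here to confirm the retrieval path works.
  file_id <- files[[1]]$dataFile$id
  raw_content <- get_file(file_id, server = "demo.dataverse.org")
  expect_true(is.raw(raw_content) && length(raw_content) > 0)
})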

skasberger commented 3 years ago

Some general thoughts from me on how to move on:

  1. Work out a test strategy: which functionalities are critical, how to test them, etc. Again, this should be the same for each client.
  2. Define requirements: a) Create a gold standard for metadata and files. The idea is to have 3-4 different datasets with files attached that are representative and can be used for different testing purposes (unit, integration). These resources can then be used by all clients. b) Find out which Dataverse versions have which endpoints available, and test this. c) Determine which API responses can be expected (status codes and data, for exception handling).
  3. Implement the tests. This differs for each client!
  4. Set up Dataverse instances where the needed test data is available and can be created/manipulated. develop and the latest release would be nice.

I have done some work on 1 + 2 (develop branch of pyDataverse), and will do a total overhaul of all the points mentioned above in the next 2 months for a major release. Maybe a call to discuss the different things mentioned would be a great starting point, so the resources created (e.g. metadata JSON) are useful for all and can be shared. My strategy is to work with a Docker instance locally for development (with which I can switch easily from one Dataverse version to another) and work out full and minimal metadata plus representative test data. I have done some parts of this already, but the AUSSDA test data is so far not public (GDPR checks missing). We will also set up a Dataverse instance for pyDataverse testing, but the testing of different Dataverse versions remains an issue that is not easy to maintain. Maybe that does not need to be done regularly, though.

So, this is quite tricky and complex, and I have been thinking about it for a long time. From my point of view, a call to talk about this together would be the most efficient way forward, as more brains reduce errors and the amount of work. :) What do you think about that?

kuriwaki commented 2 years ago

From my point of view, a call to talk about this together would be the most efficient way forward, as more brains reduce errors and the amount of work. :)

@wibeasley, @skasberger, and I met in January 2021 to discuss this. The setup we have landed on for the current CRAN submissions seems stable to me: use a demo dataverse and run daily checks on it through GitHub Actions instead of CRAN (#96). We would still need to get better coverage on tests (#4) and perhaps consider Jenkins (#22).