IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

Add integration test to download a file by filename #2

Closed: pdurbin closed this issue 7 years ago

pdurbin commented 8 years ago

I'm opening this issue because I just opened an equivalent issue for the Python client: https://github.com/IQSS/dataverse-client-python/issues/29

What I'm really after is a way to address this issue that @monogan opened at https://github.com/IQSS/dataverse/issues/2700 in which he's trying to figure out what to write in a book about R with regard to how to download files.

Ideally (in my mind), the R package for Dataverse would provide some insulation between the readers of his book and the Dataverse APIs. The book would say, "Install the dataverse package from CRAN and download the file by..." That way, even if the APIs change a bit, future readers of his book will download the latest version of the R package from CRAN and it will still "just work".

I think that the only things users should need to download a file are the DOI of the dataset and the filename. The dataverse package can do the rest. :) It would be way cleaner than my hack: https://github.com/IQSS/dataverse/commit/812424a5c3a930518abfb946b96f76dcd81d24ab .

leeper commented 8 years ago

I just pushed a quick hack of this based on SWORD as f785083. Basically, it retrieves the SWORD statement, looks for the filename among the named files, retrieves the associated fileId, and then uses the data access API to download the file based on that ID.
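The SWORD-based lookup described above can be sketched in Python roughly as follows. This is a minimal sketch, not the code in f785083. It assumes that each Atom `<entry>` in the SWORD statement carries a `<content src=".../edit-media/file/<id>/<name>"/>` element. The sample statement (file ID 42, `data.csv`) and the function names are hypothetical. The download URL uses the Data Access API path `/api/access/datafile/<id>`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

ATOM = "{http://www.w3.org/2005/Atom}"


def file_id_from_statement(statement_xml: str, filename: str) -> int:
    """Return the file ID whose name matches `filename` in a SWORD statement.

    Assumption: each Atom <entry> has <content src=".../file/<id>/<name>"/>.
    """
    root = ET.fromstring(statement_xml)
    for entry in root.iter(f"{ATOM}entry"):
        content = entry.find(f"{ATOM}content")
        if content is None:
            continue
        # Keep the last three path segments: "file", "<id>", "<name>".
        parts = content.get("src", "").rstrip("/").split("/")
        if len(parts) >= 3 and parts[-3] == "file" and unquote(parts[-1]) == filename:
            return int(parts[-2])
    raise KeyError(f"{filename!r} not found in SWORD statement")


def access_url(server: str, file_id: int) -> str:
    """Data Access API URL for a numeric file ID."""
    return f"{server.rstrip('/')}/api/access/datafile/{file_id}"


# Hypothetical statement with a single file, for illustration only.
SAMPLE_STATEMENT = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="application/octet-stream"
      src="https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/42/data.csv"/>
  </entry>
</feed>"""

file_id = file_id_from_statement(SAMPLE_STATEMENT, "data.csv")
print(access_url("https://demo.dataverse.org", file_id))
# prints https://demo.dataverse.org/api/access/datafile/42
```

The filename is compared after URL-decoding, so a name containing spaces (percent-encoded in `src`) still matches.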

pdurbin commented 8 years ago

@monogan take note! Thanks, @leeper !

Oh, we've talked about setting up Travis for continuous integration over in Python land at https://github.com/IQSS/dataverse-client-python/issues/19

I also suggested using the Python client to test each deployment to https://apitest.dataverse.org - https://github.com/IQSS/dataverse-client-python/issues/10 . However, we could use the R client for this too. :)

If any of this is of interest, please let me know how I can help!

leeper commented 8 years ago

@pdurbin Travis would be great! I think I can set it up if I get admin access to the repo, or you could toggle the switch on the Travis website to allow builds. The repo .travis.yml is already configured for the package and to run code coverage from there.

Running tests through the package would be cool. We could probably set up a Travis webhook to automatically trigger builds on changes to the main dataverse repo. (I've never tried that, though.)

pdurbin commented 8 years ago

@leeper I just made you (by way of https://github.com/orgs/IQSS/teams/dataverse-client-r) an admin on this repo:

[Screenshot: collaborators, 2015-11-10 10:54:12]

If there's anything else you need from me, please let me know!

leeper commented 8 years ago

Excellent! Thanks!

Thomas J. Leeper http://www.thomasleeper.com

ghost commented 8 years ago

@leeper @pdurbin This all looks very encouraging. If you want me to try some "user testing," please let me know. Also, if you want some target data to play with, feel free to use mine: http://dx.doi.org/10.7910/DVN/ARKOTI

pdurbin commented 8 years ago

Running tests through the package would be cool. We could probably set up a Travis webhook to automatically trigger builds on changes to the main dataverse repo. (I've never tried that, though.)

@leeper all this sounds great. Would it be ok if we add an issue in this repo specifically about what our goals would be? Do we want to run the R client against every commit/build of Dataverse? Or is that too ambitious? Please see this related issue I just opened: https://github.com/IQSS/dataverse/issues/2746

leeper commented 8 years ago

@pdurbin Yes, let's open a new issue on this repo and then brainstorm best way to do tests. Every commit would probably be simplest to configure but might be overkill. Maybe we could just have daily builds?

pdurbin commented 8 years ago

@leeper maybe. When you have a chance, perhaps you could join me in http://chat.dataverse.org and we could brainstorm about this. I'm glad you're into it! :)

pdurbin commented 8 years ago

@leeper I was just talking to @izahn about this. He might be interested in at least kicking the tires on the Dataverse R package/client.

Also, some news on my end is that I've set up a new test server at http://phoenix.dataverse.org (so called because on every build I drop its database etc. and start fresh) that we could use for integration testing from the R client. I've been working on tests in Java as I mentioned on the mailing list, but the test suite is quite incomplete (and I'm kind of being pulled into other projects). Mostly it's hitting the SWORD API. I'm going to try to get other developers excited about writing API tests in a talk next week (I'm working on slides at http://bl.ocks.org/pdurbin/raw/814fd29916749523db9a and you're welcome to come @izahn), but it would be awesome to have a test suite (in R perhaps!) that exercises more of the Dataverse API.

Basically, what I'm suggesting is that we could set up a Jenkins job similar to https://build.hmdc.harvard.edu:8443/job/phoenix.dataverse.org-apitest-4.2.3/ that uses the R client rather than Java. It would hit the phoenix server (unreleased Dataverse code). And maybe another job that hits https://apitest.dataverse.org (stable, released Dataverse code).

leeper commented 8 years ago

@pdurbin This all looks awesome. I'm putting this all on my agenda for late next week (when I'll be trapped at home, due to impending London tube strike). I bet I can get this finished really quickly, if I just sit down with it for a few hours and crank through it.

pdurbin commented 8 years ago

@leeper fantastic. Fingers crossed that Jenkins will "just work" with the output of an R test suite (make fancy graphs and all the rest, trends) but please let me know if you need any plugins installed. Also, I'll need advice on how to execute the tests... something like https://github.com/IQSS/dataverse-client-python#testing please.

pdurbin commented 8 years ago

@leeper heads up that this just got a lot easier since the fix for https://github.com/IQSS/dataverse/issues/1837 made it into Dataverse 4.3. Now you can use DOIs in your requests:

curl 'https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI' | jq '.data.latestVersion.files[1]'

{
  "description": "National Survey of High School Biology Teachers. Citation: Berkman, Michael and Eric Plutzer. 2010. Evolution, Creationism, and the Battle to Control America's Classrooms. New York: Cambridge University Press.",
  "label": "BPchap7.tab",
  "version": 2,
  "datasetVersionId": 75170,
  "datafile": {
    "id": 2692295,
    "name": "BPchap7.tab",
    "contentType": "text/tab-separated-values",
    "filename": "14e664cd409-7a2dc0c380f9",
    "originalFileFormat": "application/x-stata",
    "originalFormatLabel": "Stata Binary",
    "UNF": "UNF:6:B3/HJbnzktaX5eEJA2ItiA==",
    "md5": "e8c62465ef6a1a8451a21a43ce7b264e",
    "description": "National Survey of High School Biology Teachers. Citation: Berkman, Michael and Eric Plutzer. 2010. Evolution, Creationism, and the Battle to Control America's Classrooms. New York: Cambridge University Press."
  }
}

Docs at http://guides.dataverse.org/en/4.3/api/native-api.html#datasets
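With the 4.3 native API, the "DOI plus filename" lookup from the start of this thread no longer needs SWORD parsing. Below is a minimal Python sketch, assuming the response shape shown above (`data.latestVersion.files[].label` and `.datafile.id`) and the Data Access API path `/api/access/datafile/<id>`. The function names are hypothetical and are not part of the dataverse R package.

```python
import json
import shutil
from urllib.parse import quote
from urllib.request import urlopen


def dataset_url(server: str, doi: str) -> str:
    """Native API URL for a dataset addressed by its DOI (Dataverse 4.3+)."""
    return f"{server.rstrip('/')}/api/datasets/:persistentId?persistentId={quote(doi, safe=':/')}"


def file_id_by_label(dataset: dict, filename: str) -> int:
    """Return datafile.id for the file labelled `filename` in the latest version."""
    for entry in dataset["data"]["latestVersion"]["files"]:
        if entry.get("label") == filename:
            return entry["datafile"]["id"]
    raise KeyError(f"{filename!r} not found in latest version")


def download_by_filename(server: str, doi: str, filename: str, dest: str) -> None:
    """Fetch the dataset JSON, resolve the file ID, then stream the file to `dest`."""
    with urlopen(dataset_url(server, doi)) as resp:
        dataset = json.load(resp)
    file_id = file_id_by_label(dataset, filename)
    url = f"{server.rstrip('/')}/api/access/datafile/{file_id}"
    with urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


# Offline check: the file entry shown above, wrapped in the
# data.latestVersion.files structure that the jq filter selects from.
sample = {"data": {"latestVersion": {"files": [
    {"label": "BPchap7.tab", "datafile": {"id": 2692295, "name": "BPchap7.tab"}},
]}}}
print(file_id_by_label(sample, "BPchap7.tab"))  # prints 2692295
```

A call such as `download_by_filename("https://dataverse.harvard.edu", "doi:10.7910/DVN/ARKOTI", "BPchap7.tab", "BPchap7.tab")` would then fetch the file that the jq example selects. That call needs network access.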

pdurbin commented 8 years ago

@leeper just a heads up that I mentioned https://apitest.dataverse.org several times in the comments above, but we are shutting this server down per https://github.com/IQSS/dataverse/issues/3345

To test the latest release of Dataverse, https://demo.dataverse.org should be used.

We can talk about maybe using http://phoenix.dataverse.org to test upcoming releases. Please ping me first. 😄

leeper commented 8 years ago

Sounds good. I saw your note and plan to write some tests using the "demo" server.