IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

Portability of test suite to clients in other languages #44

Open wibeasley opened 4 years ago

wibeasley commented 4 years ago

@pdurbin and I discussed possible ways to reduce the effort each Dataverse client developer spends creating and maintaining tests. It might be nice if

  1. there was a common bank of (sub)dataverses and files that covered a nice spread of scenarios for asserting that a client library (i.e., the two Pythons, one Java, one JavaScript, and R) downloads/uploads/searches/processes correctly. For a download test, the test suite confirms that the client returns a file that matches a specific pre-existing file. For a metadata test, the test suite confirms that the client returns a dataset that matches the pre-existing csv.
  2. a manifest file enumerates these files and certain expected characteristics (e.g., md5, approx file size). Currently, I think a csv adequately meets this non-hierarchical need, where each row represents a file that will be tested.
  3. a client's test suite doesn't code specifically for each file. It probably just loops over the manifest file. To add a new condition, only the manifest file and file bank are modified.
  4. the manifest file and the expected files are eventually stored somewhere central that is easily accessible by the client developers. When someone hits a weird case (e.g., the pyDataverse developer finds a problem when processing a csv with a "txt" extension), they'll add that case to the test banks.
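The manifest-driven loop in points 2–3 might look something like the sketch below. Everything here is hypothetical: the column names (`file_name`, `md5`, `approx_size_bytes`), the tolerance, and the "download" step (which a real suite would replace with a call to the client library under test).

```python
import csv
import hashlib
import io
import os
import tempfile

def md5_of(path):
    """Hex md5 of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

with tempfile.TemporaryDirectory() as bank:
    # Stand-in for one file in the shared test bank.
    path = os.path.join(bank, "example.csv")
    with open(path, "w", newline="") as f:
        f.write("a,b\n1,2\n")

    # Hypothetical manifest: one row per file, with expected characteristics.
    manifest_csv = "file_name,md5,approx_size_bytes\nexample.csv,{},{}\n".format(
        md5_of(path), os.path.getsize(path)
    )

    failures = []
    for row in csv.DictReader(io.StringIO(manifest_csv)):
        # A real suite would download row["file_name"] via the client here;
        # we just point at the local copy.
        downloaded = os.path.join(bank, row["file_name"])

        if md5_of(downloaded) != row["md5"]:
            failures.append(row["file_name"] + " (md5)")

        # Size is checked approximately (within 10%), since "approx file size"
        # is all the manifest promises.
        expected_size = int(row["approx_size_bytes"])
        if abs(os.path.getsize(downloaded) - expected_size) > 0.1 * expected_size:
            failures.append(row["file_name"] + " (size)")

print(failures)  # → []
```

Adding a new edge case (e.g., the csv-with-"txt"-extension problem) would then mean adding one file to the bank and one row to the manifest, with no new test code in any client.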

@skasberger, @rliebz, @tainguyenbui, and any others, please tell me if this isn't worth it, if there's a better approach, etc.


(This is different from #4 & #29, which involve the battery of tests/comparisons. #40 deals with the deployment of API keys used in testing.)

pdurbin commented 4 years ago
  1. a common bank of (sub)dataverses and files that covered a nice spread of scenarios

This email I just sent about the "sample data" repo feels related. Replies there or here are welcome! https://groups.google.com/d/msg/dataverse-community/_2Tm2B2sQhc/3qSuxnhyBwAJ

skasberger commented 4 years ago

Hi,

Sounds interesting. I don't know if I totally understand the approach. You want to create a central repository with test methods and test data, which would then be used by the different Dataverse clients, so that the results are the same everywhere and the knowledge is shared?

In general: I think working together on API client testing and documentation makes a lot of sense. I don't have a concrete idea of how yet, but let's see what we can work out.

Here is my approach so far: I wrote some tests for the API calls, and will hopefully complete the tests for all the other features (data models, utils, OAISTree) in Q2. The collected knowledge is so far documented in the function doc-strings, in the code itself, or in my local documentation files. What I would find great is to know the HTTP status you can expect, depending on what you do on which endpoint. Another difficulty is documenting this per version (endpoints and functionality change from release to release). And having a proper test dataset that works for all of us (Dataverse devs as well as client devs) would also be nice. I will have a look into this anyway, because I set up Jenkins locally this week and will work out some test data and basic tests for the upcoming Dataverse upgrade.
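The "which HTTP status to expect, per endpoint, per Dataverse release" idea could start as a plain lookup table that every client's test suite consults. A minimal sketch, where every endpoint, version number, and status code is made up for illustration:

```python
# Hypothetical table: Dataverse release -> (endpoint, action) -> expected
# HTTP status. All entries below are illustrative, not documented behavior.
EXPECTED_STATUS = {
    "4.18": {
        ("/api/datasets/:persistentId", "GET"): 200,
        ("/api/datasets/:persistentId", "DELETE_unauthorized"): 401,
    },
    "4.19": {
        ("/api/datasets/:persistentId", "GET"): 200,
        ("/api/datasets/:persistentId", "DELETE_unauthorized"): 401,
    },
}

def status_matches(version, endpoint, action, actual_status):
    """True when an observed status code matches the documented expectation
    for this Dataverse release."""
    return EXPECTED_STATUS[version][(endpoint, action)] == actual_status

# A test suite would feed in the status code of a real response:
print(status_matches("4.19", "/api/datasets/:persistentId", "GET", 200))
```

Keying the table by release is one way to handle the version-drift problem: when an endpoint's behavior changes, only the entry for the new release changes, and older clients can still test against the older expectations.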