IQSS / dataverse

Open source research data repository software
http://dataverse.org

Trying to set up or complete a harvesting client through the API crashes Dataverse #8290

Closed tjouneau closed 2 years ago

tjouneau commented 2 years ago

What steps does it take to reproduce the issue?
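The curl command itself is not shown above; a plausible form, based on the Harvesting Clients API (SERVER_URL and API_TOKEN are placeholders, and the POST verb and Content-Type header are assumptions here), would be:

# Create a harvesting client named "zenodo_lmops" from the client.json payload below.
# SERVER_URL and API_TOKEN stand in for the instance URL and the user's API token.
curl -H "X-Dataverse-key:$API_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST "$SERVER_URL/api/harvest/clients/zenodo_lmops" \
     --upload-file client.json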

where client.json is as follows (I removed only the information about the last harvests):


{
    "nickName": "zenodo_lmops",
    "dataverseAlias": "lmops",
    "type": "oai",
    "harvestUrl": "https://zenodo.org/oai2d",
    "archiveUrl": "https://zenodo.org",
    "archiveDescription": "Moissonné depuis la collection LMOPS de l'entrepôt Zenodo. En cliquant sur ce jeu de données, vous serez redirigé vers Zenodo.",
    "metadataFormat": "oai_dc",
    "set": "user-lmops",
    "schedule": "none",
    "status": "inActive",
  }

The response to the curl command was as follows, and Dataverse / Payara went down.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at 
 root@localhost to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

The server.log file did not show anything particularly relevant, just stopping at:

[2021-12-02T09:00:30.635+0100] [Payara 5.2020] [INFO] [] [edu.harvard.iq.dataverse.api.HarvestingClients] [tid: _ThreadID=89 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1638432030635] [levelValue: 800] [[
  retrieved Harvesting Client zenodo_lmops with the GetHarvestingClient command.]]

Which version of Dataverse are you using?
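For reference, the running Dataverse version can be read from the Info API; a minimal check, with SERVER_URL as a placeholder, is:

# Query the Info API for the running Dataverse version (no API token needed).
curl "$SERVER_URL/api/info/version"
# The response is a small JSON object containing the version and build number.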

Any related open or closed issues to this bug report?

landreev commented 2 years ago

This was discussed and the decision was made to make the Create/Edit/Delete APIs superuser-only (as implemented, a user with edit permission on the host collection was allowed to create and modify clients). From the Slack discussion:

kcondon: Would like help on sorting out the behavior of the harvesting client API. As tested, it allows collection admins to create and modify harvesting clients, just not delete them. In the UI, only superusers can do this. Is this what we want? A significant possible downside, without additional coding, is that two collections harvesting from the same source/set would collide and potentially get partial lists, since a dataset can only exist once in the app.

landreev: I can confirm that it's implemented like this :arrow_up: on purpose, but I have no recollection of why. (It's implemented at the command level, but in the UI only superusers can get to the harvesting dashboard.) Kevin has a point; it's a bit strange. IMO this is a bit out of scope, but it's not too much effort to make the API superuser-only. We were wondering if anyone else has thoughts. The rationale may have been as simple as "we allow people to add linked content to collections, why not allow them to harvest also", but Kevin's argument (what if two different collections decide to harvest from the same remote archive?) does show that it's impractical.

pdurbin: I’m fine with superuser only for all operations.

Julian: I agree about making the endpoints superuser-only. But do superuser-only endpoints conflict with the user story? If all three endpoints are made superuser-only, will someone want to create a new issue about letting non-superusers manage harvesting clients?

landreev: The more I think about it, the less I can think of any practical value in letting non-superusers create and/or mess with harvesting clients. And, to be clear, "superuser-only" here means that it will stay under /api/harvest/clients, so somebody with a superuser API token - like you - would be able to use it remotely; it's not going to be a localhost-only API. But for non-superuser, collection-level admins, the scenario should be: if they want some content harvested and showing up in their collection, they should ask support/a superuser admin to set up the harvest and get that content; then they can link it into their collection if they so desire. If anyone else wants these harvested datasets to show in their collection, they can link them too. This avoids the mess of two different collections trying to harvest the same archive (and both getting only parts of it, or maybe the one that harvests earlier in the day getting all the datasets, etc.).
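As a rough sketch of the workflow described above (a superuser sets up and runs the harvest, and collection admins then link harvested datasets into their own collections), assuming the Harvesting Clients and dataset-linking endpoints, with SERVER_URL, the API tokens, and DATASET_ID as placeholders:

# 1. A superuser creates the harvesting client (payload as in client.json above).
curl -H "X-Dataverse-key:$SUPERUSER_API_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST "$SERVER_URL/api/harvest/clients/zenodo_lmops" \
     --upload-file client.json

# 2. Once datasets have been harvested, a user with sufficient permissions links an
#    individual harvested dataset into the lmops collection.
curl -H "X-Dataverse-key:$API_TOKEN" \
     -X PUT "$SERVER_URL/api/datasets/$DATASET_ID/link/lmops"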