IQSS / dataverse

Open source research data repository software
http://dataverse.org

The "ListSets" command fails during the creation of a harvesting client for Zenodo #8289

Closed tjouneau closed 1 year ago

tjouneau commented 2 years ago

**What steps does it take to reproduce the issue?**
As a superuser, go to the harvesting clients section of the dashboard and try to create a new client. The base URL is https://zenodo.org/oai2d or https://www.zenodo.org/oai2d (only the first is given in the official documentation at https://developers.zenodo.org).

[2021-12-08T10:09:11.806+0100] [Payara 5.2020] [INFO] [] [edu.harvard.iq.dataverse.HarvestingClientsPage] [tid: _ThreadID=90 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1638954551806] [levelValue: 800] [[ 10 metadata formats total.]]

[2021-12-08T10:09:16.767+0100] [Payara 5.2020] [WARNING] [] [edu.harvard.iq.dataverse.HarvestingClientsPage] [tid: _ThreadID=90 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1638954556767] [levelValue: 900] [[ Failed to execute ListSets; com.lyncode.xoai.serviceprovider.exceptions.HttpException: Error querying service. Returned HTTP Status Code: 500]]


Important note: a curl command run on the same server (curl -X GET https://zenodo.org/oai2d/?verb=ListSets), or the same request made directly in a browser, retrieves a partial list of sets; the query error and the 500 response are not reproduced in these cases.
It would help to know exactly what request Dataverse sends, but I don't know of any way to check this on my side.
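One way to narrow this down from outside Dataverse is to send the plain OAI-PMH request yourself and record the status code and the start of the response body. The sketch below is only that: it does not show what Dataverse itself sends. The helper names `build_oai_url` and `probe` and the `User-Agent` value are made up for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_oai_url(base_url, verb, **params):
    """Build an OAI-PMH request URL such as <base>?verb=ListSets."""
    query = urllib.parse.urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


def probe(base_url, verb="ListSets", user_agent="oai-probe/0.1"):
    """Send one OAI-PMH request; return (HTTP status, first 500 bytes of body)."""
    req = urllib.request.Request(
        build_oai_url(base_url, verb),
        headers={"User-Agent": user_agent},  # hypothetical client identifier
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read(500).decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # A 4xx/5xx answer lands here; the body may explain the failure.
        return err.code, err.read(500).decode("utf-8", "replace")
```

Calling `probe("https://zenodo.org/oai2d")` from the Dataverse host, and again with a different `user_agent`, would show whether the server answers differently depending on the client headers, which is one possible (unconfirmed) explanation for curl succeeding where the application fails.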

**To whom does it occur (all users, curators, superusers)?**
You have to be a superuser to access the feature.

**What did you expect to happen?**
To see the set list populated, at least partially.

**Which version of Dataverse are you using?**
5.2

**Any related open or closed issues to this bug report?**
#8267 for being able to get around this limitation by filling the "set" field through the API.
#8290 for not being able to do so (makes Dataverse crash).

valentinapasquale commented 2 years ago

Hi,

We have also encountered this issue in Dataverse 5.6.

In our experience, the ListSets command was working (returning a partial list of sets) until November 16th 2021, then stopped working the following day, after some maintenance on the Zenodo side. We have not yet opened a ticket with Zenodo, because we cannot see from the Dataverse side which request is sent to Zenodo, and because a curl command run on the same server (curl -X GET https://zenodo.org/oai2d/?verb=ListSets) works perfectly, as @tjouneau also reported.

landreev commented 1 year ago

There's a good chance this has already been fixed; either during the XOAI update (like many other older harvesting issues), or even earlier. Also, we have recently fixed up the harvesting clients API that could be used to create a client when it cannot be done via the UI for whatever reason. I'll give it a 33, just in case. But it may end up being shorter/simpler.

mreekie commented 1 year ago

Priority Review with Stefano:

landreev commented 1 year ago

To confirm what I said back in January, this appears to be another harvesting issue that we already fixed as part of the major overhaul of the underlying oai library used by Dataverse (xoai). Note the error message cited in the original user report:

Failed to execute ListSets; com.lyncode.xoai.serviceprovider.exceptions.HttpException: Error querying service. Returned HTTP Status Code: 500]]

The package mentioned in it, com.lyncode.xoai, has since been replaced by the much improved and updated version of xoai that is now hosted by gdcc as io.gdcc.xoai.

The scenario described, creating a client to harvest from zenodo.org, now works, showing a very long list of sets to choose from in the pull-down menu. I'm assuming it was failing in the past for exactly that reason: the server was listing so many sets that the response had to be split across several resumption tokens, with something breaking in the process.
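For readers unfamiliar with resumption tokens: in OAI-PMH, a long ListSets response is returned in pages, and each page carries a `resumptionToken` that the client sends back to get the next one. A minimal sketch of that loop follows. It is not the xoai implementation; the function names are hypothetical, and `fetch` is injectable so the loop can be exercised without a network.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_fetch(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def list_all_sets(base_url, fetch=http_fetch):
    """Return every (setSpec, setName) pair, following resumption tokens."""
    sets = []
    params = {"verb": "ListSets"}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            raise RuntimeError(f"OAI error {error.get('code')}: {error.text}")
        for s in root.findall("oai:ListSets/oai:set", OAI_NS):
            sets.append((
                s.findtext("oai:setSpec", namespaces=OAI_NS),
                s.findtext("oai:setName", namespaces=OAI_NS),
            ))
        token = root.findtext(
            "oai:ListSets/oai:resumptionToken", namespaces=OAI_NS)
        # A missing or empty token marks the last page.
        if not token or not token.strip():
            return sets
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListSets", "resumptionToken": token.strip()}
```

A failure on any one page (an HTTP 500, a malformed token, a timeout) aborts the whole listing, which would be consistent with the behaviour reported here, although the actual cause was not identified.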

Although you can now successfully create the specific client described in the issue, we should assume that something may still go wrong during the interactive steps involved in creating a client via the GUI. That process, by design, relies on querying the server in real time, to ensure that it is responding and to get the lists of sets and metadata formats that it supports. If any of these requests fails, the client cannot be created.

Since this issue was opened, we have added (or rather, fixed) a working API for creating clients. One important thing about that API is that, unlike the GUI, the application does not try to validate the entered URL of the server or make any real-time OAI calls. This is by design: it gives an admin a way to create a client in the rare case where the ListSets or ListMetadataFormats exchanges fail with an otherwise valid OAI server, preventing a client from being created via the GUI. (This obviously requires that the admin really knows what they are doing, as they are responsible for supplying valid parameters to the API.)
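As an illustration of what such an API call might look like, here is a sketch that builds (without sending) the request for a Zenodo client. The endpoint path, the `X-Dataverse-key` header, and the JSON field names are assumptions that should be checked against the API guide for the installed version; the alias, set name, server URL, and token are placeholder values.

```python
import json
import urllib.request


def build_create_client_request(server_url, api_token, nickname, client):
    """Build (but do not send) a POST request that creates a harvesting client."""
    return urllib.request.Request(
        f"{server_url}/api/harvest/clients/{nickname}",  # assumed endpoint path
        data=json.dumps(client).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Dataverse-key": api_token,  # assumed header; superuser token
        },
        method="POST",
    )


# Field names are assumptions to verify against the guide.
zenodo_client = {
    "dataverseAlias": "harvested",   # hypothetical collection alias
    "harvestUrl": "https://zenodo.org/oai2d",
    "archiveUrl": "https://zenodo.org",
    "metadataFormat": "oai_dc",
    "set": "user-example",           # hypothetical set name
}

request = build_create_client_request(
    "http://localhost:8080", "xxxxxxxx-xxxx", "zenodo", zenodo_client)
```

Passing `request` to `urllib.request.urlopen` would send it. Because nothing is checked against the OAI server at creation time, the harvest URL, metadata format, and set name have to be verified by hand beforehand.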

This is not explicitly spelled out in the guide, I realized. I will add that and make a PR closing this issue. But aside from that, I don't think there's anything we need to do here. 🎉