IQSS / dataverse-client-python

Python library for writing clients that use APIs from Dataverse
http://guides.dataverse.org/en/latest/api
Apache License 2.0

Get dataset by doi gives empty response #43

Open mdehollander opened 7 years ago

mdehollander commented 7 years ago

I am experiencing this issue when using the python API:

>>> from dataverse import Connection
>>> host = 'demo.dataverse.org'
>>> token = '<my token>'
>>> connection = Connection(host, token)
>>> dataverse = connection.get_dataverse('renatis2017')
>>> dataverse
<dataverse.dataverse.Dataverse object at 0x7f92c5a11b00>
>>> dataverse.get_dataset_by_doi("doi:10.5072/FK2/PVH0HO")

A Dataverse object is obtained, but the get_dataset_by_doi call returns nothing. A direct API request to https://demo.dataverse.org/api/datasets/:persistentId?persistentId=doi:10.5072/FK2/PVH0HO returns valid JSON.
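For comparison, here is a minimal sketch of that direct request in Python, using only the standard library. The function names `persistent_id_url` and `fetch_dataset` are hypothetical helpers made up for this illustration and are not part of dataverse-client-python; the endpoint path is the one from the URL above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def persistent_id_url(host, doi):
    """Build the Native API URL for a dataset identified by its DOI.

    Hypothetical helper, not part of dataverse-client-python.
    """
    # urlencode percent-encodes ':' and '/' in the DOI for the query string.
    query = urlencode({"persistentId": doi})
    return f"https://{host}/api/datasets/:persistentId?{query}"


def fetch_dataset(host, doi, timeout=10):
    """Fetch and parse the dataset JSON; needs network access to the host."""
    with urlopen(persistent_id_url(host, doi), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_dataset('demo.dataverse.org', 'doi:10.5072/FK2/PVH0HO')` should return the parsed JSON as a dict, assuming the dataset is public and the server is reachable.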

mdehollander commented 7 years ago

As discussed with @pdurbin and @andrewSC on irc: http://irclog.iq.harvard.edu/dataverse/2017-08-23#i_56203

pdurbin commented 7 years ago

Here's a one-liner to list files by DOI using curl and jq:

curl "https://demo.dataverse.org/api/datasets/:persistentId?persistentId=doi:10.5072/FK2/PVH0HO" | jq '.data.latestVersion.files[].dataFile.filename' -r

Fritz1.JPG
fritz2.JPG
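
The same extraction as the jq filter can be sketched in Python on an already parsed response. `list_filenames` is a hypothetical name, and `sample` is a hand-written, heavily trimmed stand-in that only mimics the shape the jq filter relies on; it is not real API output.

```python
def list_filenames(payload):
    """Return the filenames in the latest version of a dataset response.

    Mirrors the jq path .data.latestVersion.files[].dataFile.filename
    """
    files = payload["data"]["latestVersion"]["files"]
    return [entry["dataFile"]["filename"] for entry in files]


# Hand-written sample for illustration only; a real response has many more keys.
sample = {
    "data": {
        "latestVersion": {
            "files": [
                {"dataFile": {"filename": "Fritz1.JPG"}},
                {"dataFile": {"filename": "fritz2.JPG"}},
            ]
        }
    }
}

print(list_filenames(sample))  # ['Fritz1.JPG', 'fritz2.JPG']
```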

mdehollander commented 7 years ago

It seems that the XML that is returned does not contain any entry tag; it is actually quite empty ;)

>>> dataverse = connection.get_dataverse('testing-journal-dataverses')
>>> dataverse.get_datasets()
https://demo.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/testing-journal-dataverses
b'<feed xmlns="http://www.w3.org/2005/Atom"><title type="text">Testing-journal-dataverses Dataverse</title><dataverseHasBeenReleased xmlns="http://purl.org/net/sword/terms/state">true</dataverseHasBeenReleased><generator uri="http://www.swordapp.org/" version="2.0"/></feed>'
[]
[]

Above I am printing the variables used in get_datasets:

    def get_datasets(self, refresh=False, timeout=None):
        print(self.collection.get('href'))
        collection_info = self.get_collection_info(refresh, timeout=timeout)
        print(collection_info)
        entries = get_elements(collection_info, tag='entry')
        print(entries)
        return [Dataset.from_dataverse(entry, self) for entry in entries]
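
The empty list can be reproduced outside the client. The sketch below parses the feed printed above with the standard library's ElementTree instead of the client's own `get_elements` helper (whose internals are not shown here), and checks for Atom `entry` elements.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# The SWORD collection feed exactly as printed in the session above.
feed = (
    b'<feed xmlns="http://www.w3.org/2005/Atom">'
    b'<title type="text">Testing-journal-dataverses Dataverse</title>'
    b'<dataverseHasBeenReleased xmlns="http://purl.org/net/sword/terms/state">'
    b'true</dataverseHasBeenReleased>'
    b'<generator uri="http://www.swordapp.org/" version="2.0"/>'
    b'</feed>'
)

root = ET.fromstring(feed)

# Look for direct <entry> children in the Atom namespace.
entries = root.findall(f"{{{ATOM_NS}}}entry")

# Strip the "{namespace}" prefix to list the tags that are actually present.
child_tags = [child.tag.split("}")[-1] for child in root]

print(len(entries))  # 0
print(child_tags)    # ['title', 'dataverseHasBeenReleased', 'generator']
```

The feed carries only a title, the release flag, and the generator, so any lookup of `entry` elements has nothing to return.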

mdehollander commented 7 years ago

The above uses the https://demo.dataverse.org/dvn/api/ URL and returns XML, but a direct call to https://demo.dataverse.org/api/datasets/:persistentId?persistentId=doi:10.5072/FK2/PVH0HO returns JSON. Has there been a major change to the API, so that the dvn in the URL is no longer valid and JSON has replaced the XML output? @pdurbin, can you say something about this?

@rliebz, since you did most of the work for this python api client, do you have time to get things working again?

mdehollander commented 7 years ago

Looking at the docs at http://guides.dataverse.org/en/latest/api/intro.html I realize that the Python client uses the SWORD API, which returns XML, and that the link we used to get the files belongs to the Native API. Which API is recommended for retrieving information (not depositing): SWORD or the Native API?

rliebz commented 7 years ago

@mdehollander I likely won't have time in the near future to do any debugging/fixing here—it's been a couple years since I've done any Dataverse work.

As for the recommended API, the client was mostly written back when SWORD was the only option, but I would recommend sticking to the native API wherever possible. Ideally, this project would migrate over to the native API completely (I think there's a little bit of native functionality already in here), but the code is pretty tightly coupled to the structure of the XML that the SWORD API uses, so it might be a bit of a challenge.

mdehollander commented 7 years ago

@rliebz, thanks for letting us know and giving more information on the choice of APIs.

There was already a first attempt three years ago to write a Python client using the native API: https://github.com/astrofrog/pyverse. It seems to work but does not yet cover all functionality.

I will see if I can get it working for my use case, and it would be great if others in the community would like to contribute to it as well.

pdurbin commented 7 years ago

Interesting. I see @astrofrog and I talked about pyverse at http://irclog.iq.harvard.edu/dataverse/2015-04-01#i_17806 but I completely forgot about it! @mdehollander if that's a good starting point, I say go for it.