gdcc / pyDataverse

Python module for Dataverse Software (dataverse.org).
http://pydataverse.readthedocs.io/
MIT License

Posting JSON broken on Dataverse 5.9 #143

Closed Jeija closed 1 month ago

Jeija commented 2 years ago

Creating a new dataset does not work anymore with Dataverse 5.9:

>>> resp = api.create_dataset("my_dataverse_name", json.dumps({"datasetVersion": dataset_description}))

with some valid dataset_description object returns

>>> resp.json()
{'status': 'ERROR', 'message': 'Validation Failed: Author Name is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), Contact E-mail is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), Description Text is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), Subject is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]), Title is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ]).'}

The same code worked for older Dataverse versions.

For some reason, Dataverse now needs a Content-Type: application/json header to understand my JSON. I have been able to fix this issue temporarily by replacing this line in post_request with

headers = {}
headers["Content-Type"] = "application/json"
resp = post(url, data=data, params=params, files=files, headers=headers)

but I don't think that's the proper way to do it (may cause issues when posting files).
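The concern about file uploads can be sketched as follows. This is only an illustration, not pyDataverse's actual code: `extra_post_headers` is a hypothetical helper that adds the JSON content type only when no files are attached, so that a multipart upload can still get its own Content-Type (including the boundary) from the HTTP library.

```python
def extra_post_headers(files=None):
    """Hypothetical helper: pick extra headers for a POST request.

    With files, return no Content-Type so the HTTP client can generate the
    multipart/form-data header (including its boundary) on its own.
    """
    if files:
        return {}
    return {"Content-Type": "application/json"}


# JSON-only request: the header is set explicitly.
print(extra_post_headers())
# File upload: nothing is forced, the client chooses the multipart header.
print(extra_post_headers(files={"file": b"dummy bytes"}))
```

The result would then be passed as `headers=` to the existing `post(...)` call, assuming that call accepts a headers argument as in the snippet above.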

pdurbin commented 2 years ago

@Jeija my guess is that the change (enforcement of Author Name and other required fields) occurred with Dataverse 5.7. I left a comment about this here: https://github.com/IQSS/dataverse/issues/8210#issuecomment-1064392959

Jeija commented 2 years ago

Thanks for digging into this. I also just wanted to mention that there is a patch for pyDataverse that should fix this issue out there now: https://github.com/JR-1991/pyDataverse/commit/0fcfcd3fbc6bf1aec869899f715a51dca25e91be (I haven't tested it or looked into it more closely at this point)

pdurbin commented 2 years ago

@Jeija interesting. We're discussing the content type header here (please feel free to join in):

jggautier commented 1 year ago

A depositor I was helping might've run into this issue when using the create_dataset function. Until the function is updated, I thought it would be helpful to share a workaround.

After reading in the dataset.json file (the one at https://github.com/gdcc/pyDataverse/blob/master/tests/data/user-guide/dataset.json) as a string, if I use the json library to do a json.loads, then json.dumps, then pass that variable into the create_dataset function, the dataset is created. I tested this a few times on the Harvard repository:

import json

# Read the file, parse it, then re-serialize it before passing it on.
with open(path_to_json_file, 'r') as f:
    ds = f.read()
    ds = json.loads(ds)
    ds = json.dumps(ds)

resp = api.create_dataset('dataverse_alias', ds)

pdurbin commented 1 year ago

@jggautier interesting fix. Maybe at that point Python knows it's JSON instead of a string and sends the application/json content type header?

Anyway, hopefully #145 will fix it.

Also, in https://github.com/IQSS/dataverse/pull/8676 we updated the API Guide to say that the application/json content type header is now required to create a dataset.
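For anyone who wants to see what "sending the header" looks like outside pyDataverse, here is a minimal sketch using only the standard library. It builds the request without sending it. The function name `build_create_dataset_request`, the server URL, the token, and the metadata are placeholders of mine; the endpoint path and the `X-Dataverse-key` header follow the Dataverse native API as I understand it, so please check them against the API Guide for your version. Note that `json.dumps` returns an ordinary `str`, so the content type has to be stated in the headers.

```python
import json
import urllib.request


def build_create_dataset_request(base_url, dataverse_alias, metadata, api_token):
    """Build (but do not send) a POST request that creates a dataset."""
    url = f"{base_url}/api/dataverses/{dataverse_alias}/datasets"
    # json.dumps returns a plain str; nothing in it marks the body as JSON,
    # so the content type is declared explicitly below.
    body = json.dumps(metadata).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Dataverse-key": api_token,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_create_dataset_request(
    "https://demo.dataverse.org",  # placeholder server
    "dataverse_alias",             # placeholder collection alias
    {"datasetVersion": {}},        # placeholder metadata, not a valid dataset
    "xxxxxxxx",                    # placeholder API token
)
print(req.get_method(), req.full_url)
# urllib stores header names capitalized as "Content-type".
print(req.get_header("Content-type"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out here on purpose.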

skasberger commented 1 year ago

Update: I left AUSSDA, so my funding for pyDataverse development has stopped.

I want to get some basic funding to implement the most urgent updates (PRs, bug fixes, maintenance work). If you can support this, please reach out to me (www.stefankasberger.at). The same goes for feature requests.

Another option would be for someone else to help with development and/or maintenance. If you are interested, get in touch with me (or comment here).

jmurugan-fzj commented 7 months ago

@pdurbin I see the same problem with the latest pyDataverse version. When creating datasets in a local Dataverse setup, calling api.create_dataset fails with the validation error. The same error is easy to reproduce in Postman by simply not setting 'Content-Type' in the headers. My guess is that without 'Content-Type' the server fails to parse the 'metadatablock' in the request body...

demo.dataverse.org, on the other hand, does not seem to have this problem: it works even without 'Content-Type' in the headers. Is there some API-level setting on demo.dataverse.org that makes it behave differently, or could it be a software version difference? Just curious!
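Since the symptom is easy to mistake for genuinely incomplete metadata, a small check on the response can help when debugging. This is a heuristic sketch based only on the error message quoted at the top of this issue; `hints_missing_content_type` is a hypothetical name, not part of pyDataverse or Dataverse, and the sample message is shortened.

```python
def hints_missing_content_type(resp_json):
    """Heuristic only: True if the response looks like the error in this issue.

    Several "is required" complaints at once, for metadata that is known to be
    complete, suggest the server did not parse the body as JSON.
    """
    message = resp_json.get("message", "")
    return (
        resp_json.get("status") == "ERROR"
        and message.startswith("Validation Failed")
        and message.count("is required") >= 2
    )


# Shortened version of the error message reported in this issue.
sample = {
    "status": "ERROR",
    "message": "Validation Failed: Author Name is required. (...), Title is required. (...).",
}
print(hints_missing_content_type(sample))
```

If this returns True for metadata you know to be complete, checking the request's 'Content-Type' header is a reasonable next step.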

pdurbin commented 7 months ago

@jmurugan-fzj yes, this is expected because the following pull request (or equivalent) hasn't been merged yet:

The backward incompatibility isn't pyDataverse's fault. It was introduced upstream in https://github.com/IQSS/dataverse/commit/509746ca6091858a1ae9a8786f8a0e634d10c9d0 in 2021 (v5.6).

The demo server should be the same as your local env. I'm not sure why you're seeing a difference. 🤔

jmurugan-fzj commented 7 months ago

@pdurbin The default installation of dataverse-docker spins up an old version of the Dataverse container (coronawhy/dataverse: 5.13.allclouds), which was the reason for the validation error. Once I updated docker-compose to use the latest version (6.0) of the Dataverse container, the error no longer occurred!

pdurbin commented 7 months ago

@jmurugan-fzj great! When you say latest 6.0 container, are you talking about the alpha image from https://hub.docker.com/r/gdcc/dataverse ? If you'd ever like to talk containers, we have a weekly meeting: https://ct.gdcc.io

pdurbin commented 1 month ago

The backward incompatibility isn't pyDataverse's fault. It was introduced upstream in IQSS/dataverse@509746c in 2021 (v5.6).

We documented this in our new API changelog of breaking changes:

JR-1991 commented 1 month ago

@Jeija thanks for submitting this issue. We have resolved this with the switch to httpx and the latest commit https://github.com/gdcc/pyDataverse/commit/929e5c949589909ebcf8d50de194de5573869d9a. As shown in the attached screenshot, POST and PUT requests now send the correct header upon dispatch.

image