Closed by kbrueckmann 1 month ago
Forgot to mention: I cannot change the date to something like "01.01.1594", because Dataverse won't accept that in this field. If I try, I get this error message: Time Period Start Date is not a valid date. "yyyy" is a supported format.
Interesting. Indeed I can enter these values fine at https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/RN55IT
Yes, entering them is no problem. What happens if you now try to load this dataset via the load_dataset() function?
@kbrueckmann, thank you for bringing up this issue! It's a known limitation of Python's date type (datetime.date) when used with pydantic: it requires a full date and doesn't support year-only entries.
There’s an open PR (#27) that resolves this by reverting to a str input. Due to the variety of date formats in Dataverse, using the date module has become impractical. I’ll be reviewing and merging the open PRs over the next two weeks for the upcoming release, which will include all the new features as well as fixes.
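To illustrate why a strict date type struggles with year-only values, here is a minimal sketch using only the standard library. datetime.date.fromisoformat rejects "1594", whereas a plain string with a light format check accepts it. The helper name check_dataverse_date and the accepted patterns (yyyy, yyyy-MM, yyyy-MM-dd) are assumptions for illustration; only "yyyy" is confirmed by the error message in this thread, and this is not easyDataverse's actual implementation.

```python
import re
from datetime import date

# Assumed patterns for illustration; only "yyyy" is confirmed in this thread.
_PATTERNS = (
    re.compile(r"\d{4}"),
    re.compile(r"\d{4}-\d{2}"),
    re.compile(r"\d{4}-\d{2}-\d{2}"),
)


def is_full_iso_date(value: str) -> bool:
    """Return True only if the standard library can parse value as a date."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_dataverse_date(value: str) -> str:
    """Hypothetical helper: keep the value as a string if it matches an assumed pattern."""
    if any(pattern.fullmatch(value) for pattern in _PATTERNS):
        return value
    raise ValueError(f"unsupported date format: {value!r}")


print(is_full_iso_date("1594"))      # year-only input for a strict date type
print(check_dataverse_date("1594"))  # the same input kept as a string
```

Keeping the value as a string defers the interpretation of the date to Dataverse, which is the component that actually knows which formats each field accepts.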
I have merged the PR, and the fix is now available on the main branch. You can install the updated version with the following command:
pip install git+https://github.com/gdcc/easyDataverse.git
Here is a colab notebook that uses the current version and assigns the time period via strings. Loading the dataset now also works:
Thanks for your quick replies and help, @pdurbin and @JR-1991! I just tested the fix (after the pip install, of course), but I'm still having difficulties. The rich.print() in my code below never runs, because the dataset loading fails with the same ValidationError as before. I think the difference from the shared colab might be that I'm not setting the time period values but just loading a dataset in which they were previously entered via the GUI (or somehow the update to the fix didn't work, but I got no error messages indicating that).
Here is what I'm doing:
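import rich
from easyDataverse import Dataverse

# api_token and pid are defined elsewhere (see below)
dataverse = Dataverse(
    server_url="https://heidata.uni-heidelberg.de/",
    api_token=api_token,
)

# Load only the metadata, without downloading the files
dataset = dataverse.load_dataset(
    pid=pid,
    download_files=False,
)

rich.print(dataset.citation)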
The pid is the string "https://doi.org/10.11588/data/DVU14P". I can't share my API token, but at least for fetching data this one should work: 637c97c7-042e-4f00-b597-3736f07fe8a4 .
@kbrueckmann thanks for sharing! I have tested your case, and the issue stems from the wrong pid format. Dataverse expects the DOI in the form shown on your dataset page, not as a link. You can find it within the Citation metadata block:
When using doi:10.11588/data/DVU14P, the code no longer fails and the dataset is printed as expected. I have also tested it with your API token, and it worked as well. I would suggest regenerating your token to prevent any malicious use.
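If the pid may arrive as a resolver link, a small conversion step before calling load_dataset() can help. The following is only a sketch: normalize_pid is a hypothetical helper, not part of easyDataverse, and it assumes the link points at doi.org or dx.doi.org.

```python
from urllib.parse import urlparse


def normalize_pid(pid: str) -> str:
    """Hypothetical helper: turn a DOI resolver URL into the 'doi:...' form."""
    pid = pid.strip()
    if pid.lower().startswith("doi:"):
        # Already in the expected form
        return pid
    parsed = urlparse(pid)
    if parsed.scheme in ("http", "https") and parsed.netloc.lower() in ("doi.org", "dx.doi.org"):
        # The URL path is the DOI itself, prefixed with a slash
        return "doi:" + parsed.path.lstrip("/")
    raise ValueError(f"not a recognised DOI: {pid!r}")


print(normalize_pid("https://doi.org/10.11588/data/DVU14P"))
```

Raising on anything unrecognised, rather than passing it through, makes a wrong pid format show up before the request is sent.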
Hope that helped. Please let me know if there are any other issues, happy to help 🙌
After changing the pid to the required format, I still had the same problem. To make sure it wasn't connected to any issues with the update, I set up a new venv and did a fresh install of the necessary packages, and now it's working perfectly. Thank you so much, @JR-1991!
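For anyone unsure whether the update actually took effect in their environment, a quick check along these lines may help. It is a sketch under assumptions: describe_install is a hypothetical helper, the distribution name "easyDataverse" is assumed, and the direct_url.json record is only present when pip installed the package from a URL or a git repository.

```python
import json
from importlib import metadata


def describe_install(dist_name: str):
    """Return version and install origin for a distribution, or None if it is absent."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # pip records the origin of URL/VCS installs in direct_url.json (PEP 610)
    raw = dist.read_text("direct_url.json")
    return {
        "version": dist.version,
        "direct_url": json.loads(raw) if raw else None,
    }


# Distribution name is an assumption; adjust it if your environment differs.
print(describe_install("easyDataverse"))
```

If direct_url comes back as None, the package in the active environment was most likely installed from an index rather than from the main branch.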
I'm updating files in a dataset without touching anything else. The dataset has a set "time period" in its metadata with these values:
Start Date: 1594
End Date: 1636
When loading the dataset they apparently lead to a ValidationError (I assume because only a year is given):
  File "venv/lib/python3.12/site-packages/easyDataverse/dataverse.py", line 315, in load_dataset
    self._construct_block_classes(blocks, dataset)
  File "venv/lib/python3.12/site-packages/easyDataverse/dataverse.py", line 416, in _construct_block_classes
    dataset.metadatablocks[name] = metadatablock.class.model_validate(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.12/site-packages/pydantic/main.py", line 596, in model_validate
    return cls.__pydantic_validator__.validate_python(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Citation
time_period_covered.0.start
  Datetimes provided to dates should have zero time - e.g. be exact dates [type=date_from_datetime_inexact, input_value='1594', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/date_from_datetime_inexact
time_period_covered.0.end
  Datetimes provided to dates should have zero time - e.g. be exact dates [type=date_from_datetime_inexact, input_value='1636', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/date_from_datetime_inexact
Is there any way to change that behavior?