gdcc / easyDataverse

🪐 - Lightweight Dataverse interface in Python to upload, download and update datasets found in Dataverse installations.
MIT License

Dynamic metadata block generation and DVUploader integration #16

JR-1991 closed this pull request 6 months ago

JR-1991 commented 7 months ago

Overview

This pull request introduces dynamic generation of metadata blocks for Dataverse versions >= 5.14, based on https://github.com/IQSS/dataverse/pull/9213. The metadata schemas are retrieved from a Dataverse instance and converted into metadata block objects that can be filled with metadata. On upload, these objects are transformed into compliant Dataverse JSON and sent through pyDataverse's standard dataset creation and update methods. In addition, this pull request integrates python-dvuploader as the file upload solution, enabling parallel native and direct uploads to Dataverse. File downloads are parallelized as well.
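To illustrate the schema-to-object-to-JSON round trip described above, here is a minimal standard-library sketch. It is not easyDataverse's implementation: the schema dictionary is a hand-written, trimmed stand-in modeled loosely on the `data` payload of Dataverse's `GET /api/metadatablocks/{name}` endpoint, `build_block_class` and `to_dataverse_json` are hypothetical helpers, and compound fields (such as authors) are left out. The output follows the `metadataBlocks` layout of Dataverse's dataset JSON (`typeName`, `multiple`, `typeClass`, `value`).

```python
from dataclasses import field, make_dataclass
from typing import Any, Dict, List

# Hypothetical, trimmed schema modeled on the "data" payload of
# GET /api/metadatablocks/citation; real responses contain many more fields.
CITATION_SCHEMA: Dict[str, Any] = {
    "name": "citation",
    "displayName": "Citation Metadata",
    "fields": {
        "title": {"name": "title", "type": "TEXT", "multiple": False,
                  "isControlledVocabulary": False},
        "subject": {"name": "subject", "type": "TEXT", "multiple": True,
                    "isControlledVocabulary": True},
    },
}


def build_block_class(schema: Dict[str, Any]) -> type:
    """Create a dataclass with one attribute per (non-compound) schema field."""
    attributes = []
    for name, spec in schema["fields"].items():
        if spec["multiple"]:
            # Repeatable fields start as an empty list
            attributes.append((name, List[Any], field(default_factory=list)))
        else:
            attributes.append((name, Any, field(default=None)))
    return make_dataclass(schema["name"].capitalize() + "Block", attributes)


def to_dataverse_json(block: Any, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize filled attributes into one Dataverse metadataBlocks entry."""
    dv_fields = []
    for name, spec in schema["fields"].items():
        value = getattr(block, name)
        if value is None or value == []:
            continue  # skip fields the user never filled
        type_class = "controlledVocabulary" if spec["isControlledVocabulary"] else "primitive"
        dv_fields.append({
            "typeName": name,
            "multiple": spec["multiple"],
            "typeClass": type_class,
            "value": value,
        })
    return {schema["name"]: {"displayName": schema["displayName"], "fields": dv_fields}}


# Build the class at runtime, fill it, and wrap it in a dataset payload
CitationBlock = build_block_class(CITATION_SCHEMA)
citation = CitationBlock()
citation.title = "My dataset"
citation.subject = ["Other"]
payload = {"datasetVersion": {"metadataBlocks": to_dataverse_json(citation, CITATION_SCHEMA)}}
```

Generating the class at runtime means a new or customized metadata block on the server needs no client-side code changes. A real dataset creation request would also need the block's required fields (authors, contacts, descriptions), which this sketch omits.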

TLDR

Example

from easyDataverse import Dataverse

# Connect to a Dataverse installation
dataverse = Dataverse(
  server_url="https://demo.dataverse.org",
  api_token="MY_API_TOKEN",
)

# Initialize a dataset
dataset = dataverse.create_dataset()

# Fill metadata blocks
dataset.citation.title = "My dataset"
dataset.citation.subject = ["Other"]
dataset.citation.add_author(name="John Doe")
dataset.citation.add_dataset_contact(name="John Doe", email="john@doe.com")
dataset.citation.add_ds_description(value="This is a description of the dataset")

# Add files or directories (they are uploaded together with the dataset)
dataset.add_file(local_path="./my.file", dv_dir="some/dir")
dataset.add_directory(dirpath="./my_directory", dv_dir="some/dir")

# Upload to the dataverse instance
dataset.upload("my_dataverse_id")
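The overview notes that file downloads are now parallelized. The following is a minimal thread-pool sketch of that idea, not the code from this pull request: `download_files` and `access_url` are hypothetical helpers, and the mapping of datafile IDs to relative paths is an assumed input. Only the Data Access endpoint `/api/access/datafile/{id}` and the `X-Dataverse-key` token header come from Dataverse's documented API. The `fetch` parameter lets a fake downloader be swapped in for offline testing.

```python
import concurrent.futures
import urllib.request
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def access_url(server_url: str, file_id: int) -> str:
    """Build a Data Access API URL for one datafile (GET /api/access/datafile/{id})."""
    return f"{server_url.rstrip('/')}/api/access/datafile/{file_id}"


def fetch_bytes(url: str, api_token: str) -> bytes:
    """Download one file, authenticating with the X-Dataverse-key header."""
    request = urllib.request.Request(url, headers={"X-Dataverse-key": api_token})
    with urllib.request.urlopen(request) as response:
        return response.read()


def download_files(
    server_url: str,
    api_token: str,
    files: Dict[int, str],  # hypothetical input: datafile ID -> relative target path
    dest: Path,
    fetch: Callable[[str, str], bytes] = fetch_bytes,
    max_workers: int = 4,
) -> List[Path]:
    """Download several files concurrently and write them below `dest`."""

    def download_one(item: Tuple[int, str]) -> Path:
        file_id, rel_path = item
        target = dest / rel_path
        # Recreate the Dataverse directory structure locally
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(access_url(server_url, file_id), api_token))
        return target

    # Threads suit this I/O-bound work; map() yields results in input order
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download_one, files.items()))
```

Because `pool.map` preserves input order, the returned paths line up with the entries of `files`, even though the downloads themselves run concurrently.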