dials / data

DIALS Regression Data Manager
https://pypi.python.org/pypi/dials_data
BSD 3-Clause "New" or "Revised" License

Automated Zenodo uploader #148

Open graeme-winter opened 4 years ago

graeme-winter commented 4 years ago

Description

I propose adding an automated Zenodo data uploader, which could also generate the appropriate JSON text for the new data set. Zenodo has a REST API which appears to work simply enough. It will require the user to generate an upload token using the instructions at:

https://zenodo.org/account/settings/applications/tokens/new/

What I Did

import os
import pprint
import sys

import requests

# Get yourself an access token from:
#
# https://zenodo.org/account/settings/applications/tokens/new/

ACCESS_TOKEN = "aaaaaaaaa"

# Create a new, empty deposition to hold the files.
headers = {"Content-Type": "application/json"}
r = requests.post(
    "https://zenodo.org/api/deposit/depositions",
    params={"access_token": ACCESS_TOKEN},
    json={},
    headers=headers,
)
print(r.status_code)
r.raise_for_status()  # stop here if the deposition could not be created
print(r.json())

d_id = r.json()["id"]

# Upload every regular file in each directory named on the command line.
for directory in sys.argv[1:]:
    for filename in os.listdir(directory):
        path = os.path.join(directory, filename)
        if not os.path.isfile(path):
            continue  # skip subdirectories, which cannot be opened as files
        print(filename)
        data = {"name": filename}
        # Use a with block so each file handle is closed after its upload.
        with open(path, "rb") as fh:
            r = requests.post(
                "https://zenodo.org/api/deposit/depositions/%s/files" % d_id,
                params={"access_token": ACCESS_TOKEN},
                data=data,
                files={"file": fh},
            )
        pprint.pprint(r.json())

As an example, this allows automated upload of every file in a directory. The token can be given permission to complete the upload and publish, but I did not test that in my test case; I just used it to upload 3,450 files.
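For the "generate the appropriate JSON text" part, a minimal sketch could look something like the following. It only gathers the name, size and SHA-256 hash of each file. The function name `describe_directory` and the JSON field names are hypothetical placeholders and are not the format dials_data actually expects, which would need to be substituted in.

```python
import hashlib
import json
import os


def describe_directory(directory, deposition_id):
    """Return JSON text listing each regular file with its size and SHA-256.

    The layout ("deposition_id", "files", "name", "size", "sha256") is a
    hypothetical placeholder, not a format defined by dials_data or Zenodo.
    """
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            # Hash in 1 MiB chunks so large images are not read in one go.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        entries.append(
            {
                "name": name,
                "size": os.path.getsize(path),
                "sha256": digest.hexdigest(),
            }
        )
    return json.dumps({"deposition_id": deposition_id, "files": entries}, indent=2)
```

This could be called once after the upload loop, e.g. `print(describe_directory(directory, d_id))`, using the `d_id` from the script above.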

graeme-winter commented 4 years ago

Turns out that this jams up and starts returning HTTP 500 errors after ~1,000 files. I am chatting to the Zenodo developers about this. We should probably work out a way to upload data sets incrementally.
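One possible shape for incremental upload, as a rough sketch only: keep a local manifest of the file names already sent, and upload a limited batch of the remaining files on each run. Everything here is an assumption for illustration. `upload_incrementally`, `upload_one`, the manifest file and the batch size are all hypothetical, and `upload_one` stands in for a wrapper around the `requests.post` call in the script above. Whether smaller batches actually avoid the HTTP 500 errors is untested.

```python
import json
import os


def upload_incrementally(directory, upload_one, manifest_path, batch_size=100):
    """Upload at most batch_size not-yet-uploaded files; return their names.

    upload_one(path) is a caller-supplied (hypothetical) function that sends
    one file and raises an exception on failure.
    """
    # Names recorded by earlier runs; empty on the first run.
    done = []
    if os.path.exists(manifest_path):
        with open(manifest_path) as fh:
            done = json.load(fh)
    already = set(done)

    pending = sorted(
        name
        for name in os.listdir(directory)
        if name not in already and os.path.isfile(os.path.join(directory, name))
    )

    uploaded = []
    try:
        for name in pending[:batch_size]:
            upload_one(os.path.join(directory, name))
            uploaded.append(name)
    finally:
        # Record progress even if an upload raised part-way through the batch.
        with open(manifest_path, "w") as fh:
            json.dump(done + uploaded, fh)
    return uploaded
```

The manifest should live outside the data directory, otherwise it would be picked up as a file to upload. Re-running the function continues from where the previous run stopped, so a failed run only costs the file that was in flight.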

graeme-winter commented 4 years ago

Ah, they know they have some n² loops or something in the way they handle things and are looking to add an explicit limit to the number of files in a data set. May need a better way to do this.
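If the number of files per data set does end up capped, one option would be to bundle the files into a small number of archives before uploading. The sketch below uses the standard-library `tarfile` module. The function name `pack_directory`, the `part_NNNN.tar` naming and the default of 500 files per archive are all made up for illustration, and whether archives would suit how dials_data fetches individual files is an open question.

```python
import os
import tarfile


def pack_directory(directory, output_dir, files_per_archive=500):
    """Bundle the regular files in directory into numbered tar archives.

    Returns the list of archive paths written, in order.
    """
    names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    archives = []
    for start in range(0, len(names), files_per_archive):
        archive_path = os.path.join(
            output_dir, "part_%04d.tar" % (start // files_per_archive)
        )
        # Uncompressed tar: the aim is fewer files, not smaller ones.
        with tarfile.open(archive_path, "w") as tar:
            for name in names[start : start + files_per_archive]:
                tar.add(os.path.join(directory, name), arcname=name)
        archives.append(archive_path)
    return archives
```

With 3,450 files and the default setting this would produce 7 archives, which could then be uploaded with the same loop as the script above.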

One idea which occurs to me is whether we can make data public in iCAT? 🤔