artefactual-labs / umcu-uploader

UMCU Uploader
Apache License 2.0

Add AIP UUID to Dataverse Metadata #122

Closed Diogenesoftoronto closed 1 year ago

Diogenesoftoronto commented 1 year ago

Fixes #99

AIP UUIDs will now appear under dataSources in Dataverse.

mcantelon commented 1 year ago

Seems to generate an error when I try to export an AIP to Dataverse:

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/repos/umcu-uploader/uploader/Dataverse/jobs.py", line 101, in run
    dv_json = json.load(dv_json_file)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
io.UnsupportedOperation: not readable
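
For context, a "not readable" error from `json.load` is what Python raises when the file handle was opened write-only. The thread does not show how `dv_json_file` was opened, so the sketch below assumes mode `"w"`; the file name is hypothetical and the real path in `jobs.py` may differ.

```python
import io
import json
import os
import tempfile

# Hypothetical path for illustration only.
path = os.path.join(tempfile.mkdtemp(), "dataverse.json")

# A handle opened with "w" is write-only, so json.load() cannot read from it.
try:
    with open(path, "w") as dv_json_file:
        json.load(dv_json_file)
except io.UnsupportedOperation as exc:
    error_message = str(exc)

# Write some JSON, then reopen in read mode before loading it.
with open(path, "w") as dv_json_file:
    json.dump({"datasetVersion": {"metadataBlocks": {}}}, dv_json_file)

with open(path) as dv_json_file:
    dv_json = json.load(dv_json_file)
```

Note that opening an existing file with `"w"` also truncates it, so any metadata already in the file would be lost before it could be read.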
mcantelon commented 1 year ago

This seems to work:

        dv_md = json.loads(read_file(metadata_filepath))

        # Add the AIP UUID to the Dataverse metadata
        dv_md["datasetVersion"]["metadataBlocks"]["citation"]["fields"].append(
            {
                "typeName": "dataSources",
                "multiple": True,
                "typeClass": "primitive",
                "value": [
                    self.uuid,
                ],
            }
        )
        ds.from_json(json.dumps(dv_md), validate=False)
mcantelon commented 1 year ago

Getting error now:


  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/repos/umcu-uploader/uploader/Dataverse/jobs.py", line 101, in run
    dv_json["metadataBlocks"]["citation"]["fields"].append(
KeyError: 'metadataBlocks'
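
The `KeyError` is consistent with `metadataBlocks` being nested under `datasetVersion`, as in the working snippet earlier in this thread. A minimal sketch, assuming that nesting; the JSON document and the UUID value here are placeholders, not real project data.

```python
import json

# Minimal stand-in for the Dataverse metadata file (structure assumed
# from the snippet in this thread).
dv_json = json.loads(
    '{"datasetVersion": {"metadataBlocks": {"citation": {"fields": []}}}}'
)

# "metadataBlocks" is not a top-level key, so a direct lookup fails.
try:
    dv_json["metadataBlocks"]
except KeyError as exc:
    missing_key = exc.args[0]

# Going through "datasetVersion" first reaches the citation fields.
fields = dv_json["datasetVersion"]["metadataBlocks"]["citation"]["fields"]
fields.append(
    {
        "typeName": "dataSources",
        "multiple": True,
        "typeClass": "primitive",
        # Placeholder UUID; the real code uses self.uuid.
        "value": ["00000000-0000-0000-0000-000000000000"],
    }
)
```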
mcantelon commented 1 year ago

Almost there!

It's generating an exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/repos/umcu-uploader/uploader/Dataverse/jobs.py", line 112, in run
    ds.from_json(dv_json, validate=False)
  File "/root/repos/umcu-uploader/venv/lib/python3.8/site-packages/pyDataverse/models.py", line 836, in from_json
    assert isinstance(json_str, str)
AssertionError

...But this change makes it work:

-            json.dump(dv_json, dv_json_file)
-            ds.from_json(dv_json, validate=False)
+            ds.from_json(json.dumps(dv_json), validate=False)
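
The traceback shows `from_json` asserting that its argument is a `str`, which is why passing the dict fails and passing `json.dumps(dv_json)` works. The sketch below reproduces only that check; `from_json_like` is a hypothetical stand-in written for illustration, not pyDataverse code.

```python
import json


def from_json_like(json_str, validate=True):
    """Hypothetical stand-in that mirrors the assertion in the traceback."""
    assert isinstance(json_str, str)
    return json.loads(json_str)


dv_json = {"datasetVersion": {"metadataBlocks": {"citation": {"fields": []}}}}

# Passing the dict itself trips the isinstance check.
try:
    from_json_like(dv_json, validate=False)
    dict_accepted = True
except AssertionError:
    dict_accepted = False

# Serializing to a string first satisfies it.
parsed = from_json_like(json.dumps(dv_json), validate=False)
```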
Diogenesoftoronto commented 1 year ago

This seems to work:

        dv_md = json.loads(read_file(metadata_filepath))

        # Add the AIP UUID to the Dataverse metadata
        dv_md["datasetVersion"]["metadataBlocks"]["citation"]["fields"].append(
            {
                "typeName": "dataSources",
                "multiple": True,
                "typeClass": "primitive",
                "value": [
                    self.uuid,
                ],
            }
        )
        ds.from_json(json.dumps(dv_md), validate=False)

Hey, just something to note about this change: since we are already loading the JSON in, does it make sense to call `dumps` here? `dumps` writes straight to disk. I was just worried about this being a bit redundant. Is there a way to avoid so many read/write operations?

mcantelon commented 1 year ago

I think `dumps` just returns a string, while `dump` writes to a file.

https://www.geeksforgeeks.org/json-dump-in-python/
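
A quick way to check the difference, using an in-memory buffer in place of a real file so nothing touches the disk. The `dv_md` dict is a made-up example value.

```python
import io
import json

dv_md = {"typeName": "dataSources", "value": ["example"]}

# json.dumps returns the serialized JSON as a str; no file is involved.
as_string = json.dumps(dv_md)

# json.dump writes to a file-like object and returns None.
buffer = io.StringIO()
result = json.dump(dv_md, buffer)
```

So calling `json.dumps` before `ds.from_json` adds an in-memory serialization step, not an extra write to disk.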

npoppelier commented 1 year ago

"Hey, just something to note about this change: since we are already loading the JSON in, does it make sense to call `dumps` here? `dumps` writes straight to disk. I was just worried about this being a bit redundant. Is there a way to avoid so many read/write operations?" The answer can be found in the documentation of the Python standard library (the `json` module). :-)