SciCatProject / pyscicat

A python client library for interacting with the SciCat data catalog backend.

Attributes of raw dataset disappear #30

Open zyzzyxdonta opened 2 years ago

zyzzyxdonta commented 2 years ago

Hi,

I was playing with the library and noticed the following: when I create a Dataset object with type=DatasetType.raw and then call its .dict() method, the principalInvestigator and creationLocation fields are missing from the result. I think this might be related to #25?

Here is some example code:

from datetime import datetime
from pathlib import Path
from pprint import pprint

from pyscicat._version import get_versions
from pyscicat.model import Dataset, DatasetType, Ownable

ownable = Ownable(ownerGroup="abc", accessGroups=["abc", "def"])

file = Path("/data/some-dataset/foo.txt")
file_metadata = {"foo": "bar", "fizz": 3, "buzz": 5, "fizzbuzz": [3, 5]}

dataset = Dataset(
    path=str(file),
    size=33,  # in bytes
    owner="John Doe",
    contactEmail="j.doe@localhost",
    creationLocation="computer",
    creationTime=str(datetime.now()),
    type=DatasetType.raw,
    dataFormat="txt",
    principalInvestigator="John Doe",
    sourceFolder=str(file.parent),
    scientificMetadata=file_metadata,
    keywords=["playground", "test"],
    **ownable.dict(),
)

pprint(get_versions())
print()
print(f'{"principalInvestigator" in dataset.dict() = }')
print(f'{"creationLocation" in dataset.dict() = }')
print()
pprint(dataset.dict())

Executing this with Python 3.10.6 yields:

{'date': '2022-08-10T09:07:24-0700',
 'dirty': False,
 'error': None,
 'full-revisionid': '5a182945d8caeaa9e143e48b92b7ecb015ac21a9',
 'version': '0.2.3'}

"principalInvestigator" in dataset.dict() = False
"creationLocation" in dataset.dict() = False

{'accessGroups': ['abc', 'def'],
 'classification': None,
 'contactEmail': 'j.doe@localhost',
 'createdAt': None,
 'createdBy': None,
 'creationTime': '2022-09-14 15:50:56.077144',
 'datasetName': None,
 'description': None,
 'history': None,
 'instrumentId': None,
 'isPublished': False,
 'keywords': ['playground', 'test'],
 'license': None,
 'numberOfFiles': None,
 'numberOfFilesArchived': None,
 'orcidOfOwner': None,
 'owner': 'John Doe',
 'ownerEmail': None,
 'ownerGroup': 'abc',
 'packedSize': None,
 'pid': None,
 'sharedWith': None,
 'size': 33,
 'sourceFolder': '/data/some-dataset',
 'sourceFolderHost': None,
 'techniques': None,
 'type': <DatasetType.raw: 'raw'>,
 'updatedAt': None,
 'updatedBy': None,
 'validationStatus': None,
 'version': None}

As you can see, the mentioned attributes are gone. Passing this dataset to ScicatClient.upload_raw_dataset() then causes the following error:

pyscicat.client.ScicatCommError: Error creating raw dataset {'statusCode': 422, 'name': 'ValidationError', 'message': "The `RawDataset` instance is not valid. Details: `principalInvestigator` can't be blank (value: undefined); `creationLocation` can't be blank (value: undefined).", 'details': {'context': 'RawDataset', 'codes': {'principalInvestigator': ['presence'], 'creationLocation': ['presence']}, 'messages': {'principalInvestigator': ["can't be blank"], 'creationLocation': ["can't be blank"]}}}
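
The 422 payload already names the fields the backend rejected, so they can be read out directly instead of scanning the message string. A minimal sketch: the `error` dict is copied from the message above, and `missing_fields` is a hypothetical helper, not part of pyscicat.

```python
# Payload copied from the ScicatCommError message above.
error = {
    "statusCode": 422,
    "name": "ValidationError",
    "details": {
        "context": "RawDataset",
        "codes": {
            "principalInvestigator": ["presence"],
            "creationLocation": ["presence"],
        },
    },
}


def missing_fields(payload):
    """Hypothetical helper: names of fields reported with a 'presence' failure."""
    codes = payload.get("details", {}).get("codes", {})
    return sorted(name for name, reasons in codes.items() if "presence" in reasons)


print(missing_fields(error))  # ['creationLocation', 'principalInvestigator']
```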

zyzzyxdonta commented 2 years ago

I think I may have used the library wrong. I blindly copy-pasted the code from your guide without realising it refers to an ancient version of the library. Maybe you could update the page?
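
For anyone hitting the same thing, here is a rough sketch of the likely mechanism. It assumes that the base Dataset model does not declare principalInvestigator or creationLocation, that a raw-specific subclass (called RawDataset here, matching the name in the server error) does, and that undeclared keyword arguments are silently dropped rather than rejected. The `Model` class below is a stdlib-only stand-in written for illustration. It is not pyscicat's or pydantic's actual implementation.

```python
from typing import Any, Dict, Optional, get_type_hints


class Model:
    """Illustrative stand-in: keeps declared fields, silently drops the rest."""

    def __init__(self, **data: Any) -> None:
        # get_type_hints() on a class also collects fields from parent classes.
        for name in get_type_hints(type(self)):
            setattr(self, name, data.get(name))

    def dict(self) -> Dict[str, Any]:
        return {name: getattr(self, name) for name in get_type_hints(type(self))}


class Dataset(Model):
    # Assumed: the base model has no raw-specific fields.
    owner: Optional[str] = None
    sourceFolder: Optional[str] = None


class RawDataset(Dataset):
    # Assumed: the raw-specific fields live on the subclass.
    principalInvestigator: Optional[str] = None
    creationLocation: Optional[str] = None


fields = dict(
    owner="John Doe",
    sourceFolder="/data/some-dataset",
    principalInvestigator="John Doe",
    creationLocation="computer",
)

base = Dataset(**fields)    # extra keys vanish without an error
raw = RawDataset(**fields)  # same input, fields are kept

print("principalInvestigator" in base.dict())  # False
print("principalInvestigator" in raw.dict())   # True
```

If that assumption holds for the installed version, building the object from the raw-specific class instead of the base class should keep both fields in the .dict() output.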