NHMDenmark / DaSSCo-Integration

This Repo will include integration of dassco storage from northtec

Missing parent guid #125

Open Baeist opened 1 month ago

Baeist commented 1 month ago

When assets are created through the pipeline, half of the derivatives are missing the parent guid when their get asset endpoint is called. The parent guid is returned as part of the response body when the asset is created. You can also still open the derivative's share and get the parent files using the parent guid, even though it does not show in the metadata (or in the DaSSCo UI). The derivatives that miss the parent guid differ from the ones that keep it in only a few metadata values: the file format is TIF instead of JPEG, and the payload type is image instead of thumbnail. The requested file proxy size differs too: 1011 MB vs. ~650 MB. I can't replicate this issue manually through Postman; there the parent guid shows up fine after creation. As far as I can tell, this is a newish bug. @bhsi-snm

Baeist commented 1 month ago

Example guids: dev-ucloud-112, dev-ucloud-112_400, dev-ucloud-112_72. dev-ucloud-112_400 is the derivative missing the parent guid.

Baeist commented 6 days ago

Example from today: dev-ucloud-710_400 is missing its parent guid, but dev-ucloud-710_72 has it. The internal log output for their creation shows that the response has the parent guid set for both. The response body data starts at pid:

1011
{'asset_pid': 'INSERT_FOR_TESTING_PURPOSES', 'asset_guid': 'dev-ucloud-710_400', 'parent_guid': 'dev-ucloud-710', 'status': 'WORKING_COPY', 'multi_specimen': False, 'specimens': [{'institution': 'NHMD', 'collection': 'Vascular plants', 'barcode': '00937897', 'specimen_pid': '', 'preparation_type': 'sheet'}], 'funding': 'DaSSCo', 'subject': 'specimen', 'payload_type': 'image', 'file_formats': ['TIF'], 'asset_locked': False, 'restricted_access': [], 'audited': False, 'date_asset_taken': '2024-04-10T14:09:59+02:00', 'institution': 'NHMD', 'collection': 'Vascular plants', 'pipeline': 'PIPEHERB0001', 'workstation': 'WORKHERB0001', 'digitiser': 'Sara Stenz', 'tags': {'metadataTemplate': 'v2_1_0'}}
pid='INSERT_FOR_TESTING_PURPOSES' guid='dev-ucloud-710_400' status='WORKING_COPY' multi_specimen=False specimens=[SpecimenModel(institution='NHMD', collection='Vascular plants', barcode='00937897', pid='', preparation_type='sheet')] funding='DaSSCo' subject='specimen' payload_type='image' file_formats=['TIF'] asset_locked=False restricted_access=[] institution='NHMD' collection='Vascular plants' pipeline='PIPEHERB0001' digitiser='Sara Stenz' parent_guid='dev-ucloud-710' audited=False internal_status='METADATA_RECEIVED' tags={'metadataTemplate': 'v2_1_0'} http_info=HTTPInfoModel(path='/assetfiles/NHMD/Vascular plants/dev-ucloud-710_400/', hostname='https://dassco.dk', total_storage_mb=299999, cache_storage_mb=200, remaining_storage_mb=194068, allocated_storage_mb=1011, allocation_status_text=None, http_allocation_status='SUCCESS')
total amount in system: 6530/45000
total amount in system: 6530/45000
im alive
total amount in system: 6530/45000
631
{'asset_pid': 'INSERT_FOR_TESTING_PURPOSES', 'asset_guid': 'dev-ucloud-710_72', 'parent_guid': 'dev-ucloud-710', 'status': 'WORKING_COPY', 'multi_specimen': False, 'specimens': [{'institution': 'NHMD', 'collection': 'Vascular plants', 'barcode': '00937897', 'specimen_pid': '', 'preparation_type': 'sheet'}], 'funding': 'DaSSCo', 'subject': 'specimen', 'payload_type': 'thumbnail', 'file_formats': ['JPEG'], 'asset_locked': False, 'restricted_access': [], 'audited': False, 'date_asset_taken': '2024-04-10T14:09:59+02:00', 'institution': 'NHMD', 'collection': 'Vascular plants', 'pipeline': 'PIPEHERB0001', 'workstation': 'WORKHERB0001', 'digitiser': 'Sara Stenz', 'tags': {'metadataTemplate': 'v2_1_0'}}
pid='INSERT_FOR_TESTING_PURPOSES' guid='dev-ucloud-710_72' status='WORKING_COPY' multi_specimen=False specimens=[SpecimenModel(institution='NHMD', collection='Vascular plants', barcode='00937897', pid='', preparation_type='sheet')] funding='DaSSCo' subject='specimen' payload_type='thumbnail' file_formats=['JPEG'] asset_locked=False restricted_access=[] institution='NHMD' collection='Vascular plants' pipeline='PIPEHERB0001' digitiser='Sara Stenz' parent_guid='dev-ucloud-710' audited=False internal_status='METADATA_RECEIVED' tags={'metadataTemplate': 'v2_1_0'} http_info=HTTPInfoModel(path='/assetfiles/NHMD/Vascular plants/dev-ucloud-710_72/', hostname='https://dassco.dk', total_storage_mb=299999, cache_storage_mb=200, remaining_storage_mb=193666, allocated_storage_mb=631, allocation_status_text=None, http_allocation_status='SUCCESS')
total amount in system: 7161/45000
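
For anyone trying to reproduce this, a rough sketch of how the create response could be compared against what the get asset endpoint returns afterwards. The function name `missing_or_changed_fields` is made up for illustration, and the `fetched` dict is a hypothetical stand-in: I have not captured the actual get asset body here, so whether the field comes back as null or is omitted entirely is an assumption (the helper treats both the same).

```python
def missing_or_changed_fields(created: dict, fetched: dict, fields=("parent_guid",)) -> dict:
    """Return {field: (value_at_creation, value_from_get)} for every field that differs.

    A field that is absent from either dict is treated as None.
    """
    diffs = {}
    for field in fields:
        sent = created.get(field)
        got = fetched.get(field)
        if sent != got:
            diffs[field] = (sent, got)
    return diffs


# Values taken from the creation log above (trimmed to the relevant keys).
created = {
    "asset_guid": "dev-ucloud-710_400",
    "parent_guid": "dev-ucloud-710",
    "payload_type": "image",
}

# HYPOTHETICAL get asset result: the real body was not captured in this issue.
fetched = {
    "asset_guid": "dev-ucloud-710_400",
    "parent_guid": None,
    "payload_type": "image",
}

report = missing_or_changed_fields(created, fetched, fields=("parent_guid", "payload_type"))
print(report)
```

Passing more names in `fields` makes it easy to check whether anything besides the parent guid is lost between creation and retrieval.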

bhsi-snm commented 6 days ago

We just checked and this issue still exists. @Grand666, @Baeist got this error today for asset_guid: dev-ucloud-710_400. It still only happens when creating derivatives of a certain type, and we have investigated on our side: it is not caused by us.

Baeist commented 1 day ago

Did a batch with dev-ucloud-811 ... dev-ucloud-850, and the derivatives ending in _400 are missing the parent guid. Then switched the order we create the derivatives in for 851-860, and the _72 derivatives are now missing the parent guid instead. Just to make it clear: the derivative that is created first will be missing the parent guid. @Grand666 @bhsi-snm
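
A small sketch of how the "first created derivative loses its parent guid" pattern could be checked over a whole batch. This is only an illustration: `first_created_missing_parent` is a made-up helper, and the records below are hand-written to mirror what was observed for two of the batches, not pulled from the API. Each record is (asset guid, parent guid sent at creation, parent guid seen on the get asset call), listed in creation order.

```python
from collections import defaultdict


def first_created_missing_parent(records):
    """Check, per parent, whether exactly the first created derivative lacks its parent guid.

    records: iterable of (asset_guid, expected_parent_guid, fetched_parent_guid),
             given in the order the derivatives were created.
    Returns {expected_parent_guid: bool}.
    """
    by_parent = defaultdict(list)
    for guid, expected_parent, fetched_parent in records:
        by_parent[expected_parent].append((guid, fetched_parent))

    result = {}
    for parent, derivatives in by_parent.items():
        # Derivatives whose fetched parent guid is empty or None.
        missing = [guid for guid, fetched in derivatives if not fetched]
        first_guid = derivatives[0][0]
        result[parent] = missing == [first_guid]
    return result


# Illustrative records following the observed pattern (not real API output).
records = [
    ("dev-ucloud-811_400", "dev-ucloud-811", None),              # created first
    ("dev-ucloud-811_72", "dev-ucloud-811", "dev-ucloud-811"),   # created second
    ("dev-ucloud-851_72", "dev-ucloud-851", None),               # order switched: _72 first
    ("dev-ucloud-851_400", "dev-ucloud-851", "dev-ucloud-851"),
]

pattern = first_created_missing_parent(records)
print(pattern)
```

If every parent in a batch maps to True, the failure depends on creation order rather than on file format or payload type.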