Open ErnestaP opened 1 month ago
Code snippet:
from sqlalchemy import extract
from invenio_pidstore.models import PersistentIdentifier
from invenio_records.api import Record
# Assumed import path; the original snippet omitted this import.
from invenio_workflows.models import ObjectStatus, WorkflowObjectModel

# Workflows created in 2024 that ended in the ERROR state.
workflows_created_in_2024_with_error = WorkflowObjectModel.query.filter(
    extract('year', WorkflowObjectModel.created) == 2024,
    WorkflowObjectModel.status == ObjectStatus.ERROR
).all()

e4 = 'MultipleResultsFound: Multiple rows were found for one_or_none()'
filtered_records_workflows = [
    e for e in workflows_created_in_2024_with_error
    if e4 in e.extra_data.get('_error_msg', '')
]

# Fields that are expected to differ and are ignored in the comparison.
ignored_keys = ("acquisition_source", "submission_number")

for i in filtered_records_workflows:
    pid = PersistentIdentifier.get("recid", i.data["control_number"])
    # Compare copies so neither the workflow nor the record is modified.
    r1 = dict(i.data)
    r2 = dict(Record.get_record(pid.object_uuid))
    for key in ignored_keys:
        r1.pop(key, None)
        r2.pop(key, None)
    if r1 != r2:
        print(i.data["control_number"])
What should we do with these workflows? There are 602 workflows, which correspond to 92 individual records, because some records were updated/harvested more than once.
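To see which records account for the repeated workflows, the control numbers could be tallied. This is only a minimal sketch: it assumes each workflow's data is a dict with a "control_number" key, the helper name `count_workflows_per_record` is hypothetical, and the sample data is made up for illustration.

```python
from collections import Counter


def count_workflows_per_record(workflow_data):
    """Map each control_number to the number of workflows that reference it."""
    return Counter(
        d["control_number"] for d in workflow_data if "control_number" in d
    )


# Made-up sample: three workflows touching two records.
sample = [{"control_number": 1}, {"control_number": 1}, {"control_number": 2}]
counts = count_workflows_per_record(sample)

# Records that were updated/harvested more than once.
repeated = sorted(recid for recid, n in counts.items() if n > 1)
```

With the real data, `len(counts)` would be the number of individual records and `sum(counts.values())` the number of workflows.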
Most of the updated/reharvested records are identical to their workflow data. A programmatic check (record == workflow_record) flags 7 of them as different: 85318, 85311, 82321, 85574, 85055, 84691, 85649.
However, after checking these 7 records manually, one by one, only 3 have minor differences, such as different version ids of files: 85318, 82321, 85055.
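To narrow down where a flagged record differs without inspecting it by hand, the comparison could report the top-level keys whose values differ. This is a rough sketch, not what was run above: `differing_keys` is a hypothetical helper, it only looks at top-level keys, and the `_files` / `version_id` fields in the sample are illustrative.

```python
IGNORED_KEYS = ("acquisition_source", "submission_number")


def differing_keys(workflow_data, record, ignored=IGNORED_KEYS):
    """Return the sorted top-level keys whose values differ between two dicts."""
    keys = (set(workflow_data) | set(record)) - set(ignored)
    missing = object()  # sentinel so a key present on one side only counts as different
    return sorted(
        k for k in keys
        if workflow_data.get(k, missing) != record.get(k, missing)
    )


# Made-up sample data: only the file version id differs.
workflow_data = {
    "titles": [{"title": "x"}],
    "_files": [{"version_id": "a"}],
    "acquisition_source": {"method": "hepcrawl"},
}
record = {
    "titles": [{"title": "x"}],
    "_files": [{"version_id": "b"}],
}
changed = differing_keys(workflow_data, record)
```

Printing `differing_keys(r1, r2)` next to each control number would show whether a difference is limited to file metadata or touches other fields.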