cern-sis / issues-scoap3


Error state records: MultipleResultsFound: Multiple rows were found for one_or_none() #330

Open ErnestaP opened 1 month ago

ErnestaP commented 1 month ago

What should we do with these workflows? There are 602 workflows, covering 92 individual records. Some records were updated/harvested more than once.
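A minimal sketch of how the workflows-per-record count could be obtained. The helper name and the sample control numbers are hypothetical; in the real check the values would come from each errored workflow's `data["control_number"]`.

```python
from collections import Counter


def count_workflows_per_record(control_numbers):
    """Return a Counter mapping each control number to its number of workflows."""
    return Counter(control_numbers)


# Hypothetical sample for illustration only, not the real 602 workflows.
sample = [85318, 85318, 85311, 82321, 82321, 82321]
counts = count_workflows_per_record(sample)

# Records that appear in more than one workflow (updated/harvested repeatedly).
repeated = {recid: n for recid, n in counts.items() if n > 1}

print(len(sample), "workflows,", len(counts), "distinct records")
print("harvested more than once:", sorted(repeated))
```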

Most of the updated/reharvested records are identical to the stored record. A programmatic check (record == workflow_record) flags 7 of them as different: 85318, 85311, 82321, 85574, 85055, 84691, 85649.

However, checking these 7 records manually, one by one, only 3 have minor differences, such as differing file version ids: 85318, 82321, 85055.

ErnestaP commented 1 month ago

Code snippet:

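from sqlalchemy import extract
from invenio_pidstore.models import PersistentIdentifier
from invenio_records.api import Record
# Assumed import locations for the workflow model and status enum used below.
from invenio_workflows import ObjectStatus
from invenio_workflows.models import WorkflowObjectModel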

# All workflows created in 2024 that ended up in the ERROR state.
workflows_created_in_2024_with_error = WorkflowObjectModel.query.filter(
    extract('year', WorkflowObjectModel.created) == 2024,
    WorkflowObjectModel.status == ObjectStatus.ERROR
).all()

e4='MultipleResultsFound: Multiple rows were found for one_or_none()'
filtered_records_workflows = [e for e in workflows_created_in_2024_with_error if e4 in e.extra_data.get('_error_msg', '')]

# Keys ignored when comparing the two versions of a record.
ignored_keys = ("acquisition_source", "submission_number")

for i in filtered_records_workflows:
    # Compare shallow copies so that neither the workflow data nor the stored record is mutated.
    r1 = dict(i.data)
    pid = PersistentIdentifier.get("recid", i.data["control_number"])
    r2 = dict(Record.get_record(pid.object_uuid))
    for key in ignored_keys:
        r1.pop(key, None)
        r2.pop(key, None)

    # Print the control number of every record whose workflow version differs.
    if r1 != r2:
        print(i.data["control_number"])
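The check above only reports that two versions differ, not where. A small sketch of a helper that lists the differing top-level keys could speed up the manual review. The function name `differing_keys` and the sample records (including their field names) are hypothetical, not taken from the actual data.

```python
def differing_keys(a, b, ignored=("acquisition_source", "submission_number")):
    """Return the sorted top-level keys whose values differ between two dicts."""
    keys = (set(a) | set(b)) - set(ignored)
    # Sentinel distinguishes "key missing" from "key present with value None".
    missing = object()
    return sorted(k for k in keys if a.get(k, missing) != b.get(k, missing))


# Hypothetical records for illustration only.
workflow_record = {
    "control_number": 85318,
    "_files": [{"version_id": "a1"}],
    "submission_number": "x",
}
stored_record = {
    "control_number": 85318,
    "_files": [{"version_id": "b2"}],
    "submission_number": "y",
}

print(differing_keys(workflow_record, stored_record))
```

In the loop above, this could be called as `differing_keys(i.data, Record.get_record(pid.object_uuid))` for each flagged record, assuming both objects behave like dicts.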