KBNLresearch / omSipCreator

Create ingest-ready SIPs from batches of optical media images
Apache License 2.0

Warning + wrong output order if same carrierTypes within PPN not contiguous in batch manifest #29

Closed bitsgalore closed 7 years ago

bitsgalore commented 7 years ago

The following batch contains 2 audio CDs that don't appear contiguously in the batch manifest:

jobID,PPN,volumeNo,carrierType,title,volumeID,success,containsAudio,containsData
d03ae636-147a-11e7-a687-00237d497a29,155658050,1,cd-audio,(Bijna) alles over bestandsformaten,,True,True,False
82c34f9a-1481-11e7-9f3c-00237d497a29,155658050,1,cd-rom,(Bijna) alles over bestandsformaten,Handbook,True,False,True
e03ae676-147a-11e7-a687-00237d497a29,155658050,2,cd-audio,(Bijna) alles over bestandsformaten,,True,True,False

Result for verify:

WARNING - PPN 155658050 (cd-audio): expected '1' as lower value for 'volumeNumber', found '2'

Result for write, structMap:

<mets:structMap>
<mets:div TYPE="physical" LABEL="volumes">
  <mets:div TYPE="cd-audio" ORDER="1" ADMID="DISC_001">
    <mets:div TYPE="audio track" ORDER="1">
      <mets:fptr FILEID="FILE_001"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="2">
      <mets:fptr FILEID="FILE_002"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="3">
      <mets:fptr FILEID="FILE_003"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="4">
      <mets:fptr FILEID="FILE_004"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="5">
      <mets:fptr FILEID="FILE_005"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="6">
      <mets:fptr FILEID="FILE_006"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="7">
      <mets:fptr FILEID="FILE_007"/>
    </mets:div>
  </mets:div>
  <mets:div TYPE="cd-rom" ORDER="1" ADMID="DISC_002">
    <mets:div TYPE="disk image" ORDER="1">
      <mets:fptr FILEID="FILE_008"/>
    </mets:div>
  </mets:div>
  <mets:div TYPE="cd-audio" ORDER="2" ADMID="DISC_003">
    <mets:div TYPE="audio track" ORDER="1">
      <mets:fptr FILEID="FILE_009"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="2">
      <mets:fptr FILEID="FILE_010"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="3">
      <mets:fptr FILEID="FILE_011"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="4">
      <mets:fptr FILEID="FILE_012"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="5">
      <mets:fptr FILEID="FILE_013"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="6">
      <mets:fptr FILEID="FILE_014"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="7">
      <mets:fptr FILEID="FILE_015"/>
    </mets:div>
    <mets:div TYPE="audio track" ORDER="8">
      <mets:fptr FILEID="FILE_016"/>
    </mets:div>
  </mets:div>
</mets:div>
</mets:structMap>

So again audio CDs don't appear contiguously. Looks like something goes wrong before groupby (check for missing sort?)
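
A minimal sketch of why a missing sort would produce this result, assuming the carriers are grouped on the carrierType column with itertools.groupby. The rows below are a shortened, hypothetical stand-in for the manifest above (jobIDs truncated, only the first four columns kept); this is not the actual omSipCreator code.

```python
from itertools import groupby
from operator import itemgetter

# Simplified stand-in for the manifest rows of one PPN:
# (jobID, PPN, volumeNo, carrierType). jobIDs are shortened for illustration.
rows = [
    ("d03ae636", "155658050", "1", "cd-audio"),
    ("82c34f9a", "155658050", "1", "cd-rom"),
    ("e03ae676", "155658050", "2", "cd-audio"),
]

# groupby starts a new group each time the key changes, so the two
# cd-audio rows land in separate groups because a cd-rom row sits between them.
groups = [
    (carrier_type, [row[2] for row in group])
    for carrier_type, group in groupby(rows, key=itemgetter(3))
]

print(groups)
# [('cd-audio', ['1']), ('cd-rom', ['1']), ('cd-audio', ['2'])]
```

The last group holds only volumeNo 2, which would be consistent with the verify warning about '2' being found as the lower value for cd-audio.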

ADDITION:

Looks like what's needed is a sort on carriers prior to groupby, just as is also done with rowsBatchManifest.

BUT:

  carriers.sort(key=itemgetter(3))

Results in:

AttributeError: 'itertools._grouper' object has no attribute 'sort'

So why does this work for rowsBatchManifest? Because:

type(rowsBatchManifest) = <class 'list'>, but type(carriers) = <class 'itertools._grouper'>. So we need to figure out how to sort a grouper object.
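
One possible way around this, sketched under the assumption that carriers comes from an outer groupby on the PPN column: the built-in sorted() accepts any iterable, including a grouper object, and returns a new list. The rows and the names carriers_sorted, by_type and result are hypothetical; this is not necessarily how the fix was implemented.

```python
from itertools import groupby
from operator import itemgetter

# Simplified stand-in for rowsBatchManifest (already sorted by PPN):
# (jobID, PPN, volumeNo, carrierType). jobIDs are shortened for illustration.
rows = [
    ("d03ae636", "155658050", "1", "cd-audio"),
    ("82c34f9a", "155658050", "1", "cd-rom"),
    ("e03ae676", "155658050", "2", "cd-audio"),
]

result = {}

for ppn, carriers in groupby(rows, key=itemgetter(1)):
    # carriers is an itertools._grouper: a one-shot iterator without a sort()
    # method. sorted() consumes it and returns a plain list; the sort is
    # stable, so rows of the same carrierType keep their manifest order.
    carriers_sorted = sorted(carriers, key=itemgetter(3))

    # With the rows sorted, each carrierType now forms a single group.
    by_type = {
        carrier_type: [row[2] for row in group]
        for carrier_type, group in groupby(carriers_sorted, key=itemgetter(3))
    }
    result[ppn] = by_type

print(result)
# {'155658050': {'cd-audio': ['1', '2'], 'cd-rom': ['1']}}
```

Calling list(carriers) first and then using .sort() on that list would work equally well; sorted() just does both steps in one call.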

bitsgalore commented 7 years ago

Fixed: https://github.com/KBNLresearch/omSipCreator/commit/8350b962c8fc31fdb419653eb8963bb0897fa435