egabancho closed this 6 years ago
@ludmilamarian after discussions, we need your input. CDS example record for the 595 tag:
<datafield tag="595" ind1=" " ind2=" ">
  <subfield code="a">Press</subfield>
  <subfield code="s">Press Videos</subfield>
</datafield>
<datafield tag="595" ind1=" " ind2=" ">
  <subfield code="a">Press</subfield>
  <subfield code="s">Animations - Science</subfield>
</datafield>
<datafield tag="595" ind1=" " ind2=" ">
  <subfield code="a">Press</subfield>
  <subfield code="s">B-Roll Footage</subfield>
</datafield>
This will be translated to something like:
{
  "internal_notes": "Press, Press Videos, Press, Animations - Science, Press, B-Roll Footage"
}
(or the same without duplicating Press).
But we were thinking that maybe it makes more sense to have the information more structured, something like:
{
  "internal_keywords": [
    {"name": "Press", "value": "Press Videos"},
    {"name": "Press", "value": "Animations - Science"},
    {"name": "Press", "value": "B-Roll Footage"}
  ]
}
It really depends on future needs and on how we want to search for this information. What do you think?
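
To make the trade-off concrete, here is a minimal sketch that builds both candidate representations from the three 595 fields above. It uses plain dicts rather than the real dojson rules; `FIELDS_595`, `as_flat_note`, `as_keywords` and `values_for` are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for the three 595 datafields above:
# subfield "a" is the category, subfield "s" is the value.
FIELDS_595 = [
    {"a": "Press", "s": "Press Videos"},
    {"a": "Press", "s": "Animations - Science"},
    {"a": "Press", "s": "B-Roll Footage"},
]


def as_flat_note(fields):
    """Option 1: a single comma-separated string (duplicates kept)."""
    parts = []
    for field in fields:
        parts.extend([field["a"], field["s"]])
    return {"internal_notes": ", ".join(parts)}


def as_keywords(fields):
    """Option 2: one {"name", "value"} object per 595 field."""
    return {
        "internal_keywords": [
            {"name": field["a"], "value": field["s"]} for field in fields
        ]
    }


def values_for(record, name):
    """Return every value stored under `name` in the structured form."""
    return [
        kw["value"]
        for kw in record["internal_keywords"]
        if kw["name"] == name
    ]


print(as_flat_note(FIELDS_595))
print(values_for(as_keywords(FIELDS_595), "Press"))
```

With the structured form, a lookup such as "all Press values" is a simple filter, whereas the flat string would have to be split and re-paired first.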
I just ran this script:
from cds_dojson.marc21.utils import load
from cds_dojson.marc21.models.videos.video import model
from cds.modules.records.resolver import record_resolver
from cds.modules.deposit.api import CDSDeposit
from invenio_db import db
from invenio_indexer.api import RecordIndexer

indexer = RecordIndexer()

# Read all MARCXML records while the file is still open.
with open('./595.xml') as f:
    records = list(load(f))

for xml_record in records:
    # Re-run the dojson translation to get fresh 595-derived values.
    record_595 = model.do(xml_record)
    pid, record = record_resolver.resolve(record_595['recid'])
    deposit = CDSDeposit.get_record(record.depid.object_uuid)

    if 'internal_note' in record_595:
        record['internal_note'] = record_595['internal_note']
        deposit['internal_note'] = record_595['internal_note']
    else:
        # No note in the source any more: drop the stale one if present.
        try:
            del record['internal_note']
            del deposit['internal_note']
        except KeyError:
            print(record['recid'])

    if 'internal_categories' in record_595:
        record['internal_categories'] = record_595['internal_categories']
        deposit['internal_categories'] = record_595['internal_categories']

    press = record_595.get('internal_categories', {}).get('Press', [])
    if press:
        record['Press'] = press
        deposit['Press'] = press

    # Persist both objects, then reindex them.
    deposit.commit()
    record.commit()
    db.session.commit()
    indexer.index(record)
    indexer.index(deposit)
Which in the end gives this list of URLs for the press office:
Check the content of 595 in https://cds.cern.ch/record/1541893/export/hm and compare it with internal_note in https://videos.cern.ch/api/record/1541893

I think the problem came from https://github.com/CERNDocumentServer/cds-dojson/blob/6732dd22baa491d4e8a553dee30293aec77415fc/cds_dojson/marc21/fields/videos/video.py#L143 because it's not iterating over all the values but only taking one (the last one).
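
The suspected behaviour can be illustrated with a small sketch. This is not the actual cds-dojson rule, only the general pattern: when a tag repeats, taking a single value keeps just the last one, while iterating keeps all of them. The names `REPEATED_595`, `last_only` and `group_all` are hypothetical, and the grouped shape (category mapped to a list of values) is an assumption based on how `internal_categories` is read in the script above.

```python
from collections import defaultdict

# Hypothetical parsed form of a record with three repeated 595 fields.
REPEATED_595 = [
    {"a": "Press", "s": "Press Videos"},
    {"a": "Press", "s": "Animations - Science"},
    {"a": "Press", "s": "B-Roll Footage"},
]


def last_only(fields):
    """Pattern described in the report: only one value (the last) is kept."""
    return fields[-1]["s"]


def group_all(fields):
    """Iterate over every repeated field and group values by category."""
    grouped = defaultdict(list)
    for field in fields:
        grouped[field["a"]].append(field["s"])
    return dict(grouped)


print(last_only(REPEATED_595))
print(group_all(REPEATED_595))
```

The first function loses "Press Videos" and "Animations - Science"; the second keeps all three values under "Press".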