internal_note field not completely migrated

egabancho commented 6 years ago

Check the content of the 595 field in https://cds.cern.ch/record/1541893/export/hm and compare it with internal_note in https://videos.cern.ch/api/record/1541893.

I think the problem comes from https://github.com/CERNDocumentServer/cds-dojson/blob/6732dd22baa491d4e8a553dee30293aec77415fc/cds_dojson/marc21/fields/videos/video.py#L143, because it doesn't iterate over all the values and only keeps one of them (the last one).
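To make the suspected bug concrete, here is a minimal illustration of the two behaviours. It assumes each repeated 595 field arrives as a dict of subfield code to value. The functions note_last_only and note_all_values are hypothetical and are not the actual cds-dojson rule. The first overwrites its result on every iteration, so only the last field survives. The second collects every occurrence.

```python
# Hypothetical illustration of the bug above; not the real cds-dojson rule.


def note_last_only(fields):
    """Buggy pattern: each iteration overwrites the previous value."""
    note = None
    for field in fields:
        note = ", ".join(v for v in field.values() if v)
    return note


def note_all_values(fields):
    """Fixed pattern: collect the values of every occurrence of the field."""
    parts = []
    for field in fields:
        parts.extend(v for v in field.values() if v)
    return ", ".join(parts)


# Assumed shape of three repeated 595 fields ($a and $s subfields).
fields_595 = [
    {"a": "Press", "s": "Press Videos"},
    {"a": "Press", "s": "Animations - Science"},
    {"a": "Press", "s": "B-Roll Footage"},
]

print(note_last_only(fields_595))   # Press, B-Roll Footage
print(note_all_values(fields_595))
```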

ntarocco commented 6 years ago

@ludmilamarian, after some discussion we need your input. Here is an example of the 595 tag in a CDS record:

<datafield tag="595" ind1=" " ind2=" ">
    <subfield code="a">Press</subfield>
    <subfield code="s">Press Videos</subfield>
</datafield>
<datafield tag="595" ind1=" " ind2=" ">
    <subfield code="a">Press</subfield>
    <subfield code="s">Animations - Science</subfield>
</datafield>
<datafield tag="595" ind1=" " ind2=" ">
    <subfield code="a">Press</subfield>
    <subfield code="s">B-Roll Footage</subfield>
</datafield>

This will be translated to something like:

{
    "internal_notes": "Press, Press Videos, Press, Animations - Science, Press, B-Roll Footage"
}

(or without repeating "Press"). But we were thinking that it may make more sense to store the information in a more structured way, something like:

{
    "internal_keywords": [
        {"name": "Press", "value": "Press Videos"},
        {"name": "Press", "value": "Animations - Science"},
        {"name": "Press", "value": "B-Roll Footage"}
    ]
}

It really depends on future needs, i.e. how we will want to search for this information. What do you think?
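For comparison, both options can be derived from the same MARCXML with the standard library. This is a sketch, not the cds-dojson implementation. It assumes the three datafields are wrapped in a record element so they parse as one document. It also assumes $a maps to "name" and $s maps to "value", as in the proposal. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The example 595 datafields, wrapped in <record> (an assumption) so that
# they form a single well-formed XML document.
MARCXML = """<record>
  <datafield tag="595" ind1=" " ind2=" ">
    <subfield code="a">Press</subfield>
    <subfield code="s">Press Videos</subfield>
  </datafield>
  <datafield tag="595" ind1=" " ind2=" ">
    <subfield code="a">Press</subfield>
    <subfield code="s">Animations - Science</subfield>
  </datafield>
  <datafield tag="595" ind1=" " ind2=" ">
    <subfield code="a">Press</subfield>
    <subfield code="s">B-Roll Footage</subfield>
  </datafield>
</record>"""


def fields_595(xml_text):
    """Yield one {subfield_code: text} dict per 595 datafield."""
    root = ET.fromstring(xml_text)
    for datafield in root.iter("datafield"):
        if datafield.get("tag") == "595":
            yield {sf.get("code"): sf.text for sf in datafield.iter("subfield")}


def to_internal_notes(fields):
    """Flat option: every non-empty subfield value joined into one string."""
    return ", ".join(v for field in fields for v in field.values() if v)


def to_internal_keywords(fields):
    """Structured option: $a becomes "name" and $s becomes "value"."""
    return [{"name": f.get("a"), "value": f.get("s")} for f in fields]


fields = list(fields_595(MARCXML))
print(to_internal_notes(fields))
print(to_internal_keywords(fields))
```

The flat string is simpler to migrate, but the structured list keeps the pairing between $a and $s. That pairing is what a search like Press:"B-Roll Footage" needs.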

egabancho commented 6 years ago

I just ran this script:

from cds_dojson.marc21.utils import load
from cds_dojson.marc21.models.videos.video import model
from cds.modules.records.resolver import record_resolver
from cds.modules.deposit.api import CDSDeposit
from invenio_db import db
from invenio_indexer.api import RecordIndexer

indexer = RecordIndexer()

# Load every MARCXML record exported with a 595 field.
with open('./595.xml') as f:
    records = list(load(f))

for xml_record in records:
    # Re-run the video model on the original MARCXML.
    record_595 = model.do(xml_record)
    pid, record = record_resolver.resolve(record_595['recid'])
    deposit = CDSDeposit.get_record(record.depid.object_uuid)

    if 'internal_note' in record_595:
        record['internal_note'] = record_595['internal_note']
        deposit['internal_note'] = record_595['internal_note']
    else:
        # No internal_note any more: drop stale values from both the record
        # and the deposit, and log records that had no internal_note to remove.
        if record.pop('internal_note', None) is None:
            print(record['recid'])
        deposit.pop('internal_note', None)

    if 'internal_categories' in record_595:
        categories = record_595['internal_categories']
        record['internal_categories'] = categories
        deposit['internal_categories'] = categories
        # Copy the "Press" categories to a top-level Press field so that
        # they can be searched with Press:"...".
        press = categories.get('Press', [])
        if press:
            record['Press'] = press
            deposit['Press'] = press

    deposit.commit()
    record.commit()
    db.session.commit()
    indexer.index(record)
    indexer.index(deposit)
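The script assumes that record_595['internal_categories'] is a mapping from the 595 $a category to a list of $s values, since it calls .get('Press', []). Below is a hypothetical sketch of such a grouping, under that assumption. to_internal_categories is an illustrative name, not the actual cds-dojson code. Skipping duplicate values is also a choice made here, not something the thread specifies.

```python
from collections import defaultdict


def to_internal_categories(fields):
    """Hypothetical helper: group each 595 $s value under its $a category."""
    categories = defaultdict(list)
    for field in fields:
        category, value = field.get("a"), field.get("s")
        # Skip incomplete fields and avoid listing the same value twice.
        if category and value and value not in categories[category]:
            categories[category].append(value)
    return dict(categories)


# Assumed shape of the repeated 595 fields ($a and $s subfields).
fields_595 = [
    {"a": "Press", "s": "Press Videos"},
    {"a": "Press", "s": "Animations - Science"},
    {"a": "Press", "s": "B-Roll Footage"},
]

press = to_internal_categories(fields_595).get("Press", [])
print(press)  # ['Press Videos', 'Animations - Science', 'B-Roll Footage']
```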

In the end, this gives the following list of URLs for the press office:

"Press: " https://videos.cern.ch/search?q=Press:*
"Press: CERN60" https://videos.cern.ch/search?q=Press:"CERN60"
"Press: " https://videos.cern.ch/search?q=Press:""
"Press: Animations - Science" https://videos.cern.ch/search?q=Press:"Animations - Science"
"Press: run2" https://videos.cern.ch/search?q=Press:"run2"
"Press: Higgs" https://videos.cern.ch/search?q=Press:"Higgs"
"Press: CERN" https://videos.cern.ch/search?q=Press:"CERN"
"Press: ALICE" https://videos.cern.ch/search?q=Press:"ALICE"
"Press: Particle collisions" https://videos.cern.ch/search?q=Press:"Particle collisions"
"Press: General" https://videos.cern.ch/search?q=Press:"General"
"Press: Large Hadron Collider" https://videos.cern.ch/search?q=Press:"Large Hadron Collider"
"Press: LHCb" https://videos.cern.ch/search?q=Press:"LHCb"
"Press: Press Videos" https://videos.cern.ch/search?q=Press:"Press Videos"
"Press: AcceleratorsDetectors" https://videos.cern.ch/search?q=Press:"AcceleratorsDetectors"
"Press: VNR" https://videos.cern.ch/search?q=Press:"VNR"
"Press: LS1" https://videos.cern.ch/search?q=Press:"LS1"
"Press: ATLAS" https://videos.cern.ch/search?q=Press:"ATLAS"
"Press: 4K" https://videos.cern.ch/search?q=Press:"4K"
"Press: CMS" https://videos.cern.ch/search?q=Press:"CMS"
"Press: B-Roll Footage" https://videos.cern.ch/search?q=Press:"B-Roll Footage"
"Press: LHC experiments" https://videos.cern.ch/search?q=Press:"LHC experiments"
"Press: History" https://videos.cern.ch/search?q=Press:"History"
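The links above contain raw spaces and double quotes. Browsers usually encode these on the fly, but other tools or mail clients may break such links. The sketch below builds percent-encoded versions with the standard library. It assumes the videos.cern.ch search page decodes a standard percent-encoded q parameter. press_search_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "https://videos.cern.ch/search"


def press_search_url(value):
    """Build a Press:"<value>" search link with the query percent-encoded."""
    query = 'Press:"{}"'.format(value)
    # quote_via=quote encodes spaces as %20 instead of "+".
    return SEARCH_URL + "?" + urlencode({"q": query}, quote_via=quote)


for value in ["CERN60", "Animations - Science", "B-Roll Footage"]:
    print(value, press_search_url(value))
```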

CERNDocumentServer / cds-videos

internal_note field not completely migrated #1576