adsabs / ADSImportPipeline

Data ingest pipeline for ADS classic->ADS+
GNU General Public License v3.0

Clean up identifier field #217

Closed: golnazads closed this issue 5 years ago

golnazads commented 5 years ago

Related to issue #209 (adspy). From Alberto: make a small change to https://github.com/adsabs/ADSImportPipeline/blob/master/aip/classic/solr_adapter.py#L395 so that the identifiers shown in the examples below are captured in SOLR's identifier field.

golnazads commented 5 years ago

Examples:

    <metadata origin="ADS metadata" type="relations" primary="False" alternate_journal="False">
      <alternates/>
      <identifiers>
        <identifier bibcode="2018arXiv180710779B">arXiv:1807.10779</identifier>
      </identifiers>
      <links>
        <link url="http://arxiv.org/abs/1807.10779" access="open" type="preprint"/>
        <link url="http://adsabs.harvard.edu/abs/2018arXiv180710779B" type="ADSlink"/>
      </links>
    </metadata>
    <metadata origin="ADS metadata" type="relations" primary="False" alternate_journal="False">
      <alternates/>
      <identifiers>
        <identifier type="ascl">ascl:1802.007</identifier>
      </identifiers>
      <links>
        <link url="http://ascl.net/1802.007" access="open" type="electr"/>
        <link url="http://adsabs.harvard.edu/abs/2018ascl.soft02007G" type="ADSlink"/>
      </links>
    </metadata>
    <metadata origin="ADS metadata" type="relations" primary="False" alternate_journal="False">
      <alternates>
        <alternate type="eprint">2002quant.ph..6057F</alternate>
      </alternates>
      <identifiers>
        <identifier bibcode="2002quant.ph..6057F">arXiv:quant-ph/0206057</identifier>
      </identifiers>
      <links>
        <link url="http://arxiv.org/abs/quant-ph/0206057" access="open" type="preprint"/>
        <link url="https://doi.org/10.1016%2F0550-3213%2868%2990170-3" type="electr"/>
        <link url="http://adsabs.harvard.edu/abs/1968NuPhB...7...79F" type="ADSlink"/>
      </links>
    </metadata>
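To make the target data concrete, here is a minimal sketch that pulls the `<identifier>` values out of `type="relations"` blocks like the ones above. It assumes the blocks are parsed with Python's standard `xml.etree.ElementTree`; the pipeline's real reader may work differently, and the function name `extract_identifiers` and the `content` key are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three relations blocks from the examples (links omitted), wrapped in a
# dummy <records> root because ElementTree needs a single root element.
SAMPLE = """
<records>
  <metadata origin="ADS metadata" type="relations" primary="False" alternate_journal="False">
    <alternates/>
    <identifiers>
      <identifier bibcode="2018arXiv180710779B">arXiv:1807.10779</identifier>
    </identifiers>
  </metadata>
  <metadata origin="ADS metadata" type="relations" primary="False" alternate_journal="False">
    <alternates/>
    <identifiers>
      <identifier type="ascl">ascl:1802.007</identifier>
    </identifiers>
  </metadata>
  <metadata origin="ADS metadata" type="relations" primary="False" alternate_journal="False">
    <alternates>
      <alternate type="eprint">2002quant.ph..6057F</alternate>
    </alternates>
    <identifiers>
      <identifier bibcode="2002quant.ph..6057F">arXiv:quant-ph/0206057</identifier>
    </identifiers>
  </metadata>
</records>
"""


def extract_identifiers(xml_text):
    """Return one dict per <identifier>: its attributes plus its text under "content"."""
    root = ET.fromstring(xml_text)
    found = []
    for meta in root.iter("metadata"):
        # Only relations blocks carry the <identifiers> structure.
        if meta.get("type") != "relations":
            continue
        for ident in meta.iter("identifier"):
            value = (ident.text or "").strip()
            if value:
                record = dict(ident.attrib)  # e.g. {"type": "ascl"} or {"bibcode": ...}
                record["content"] = value
                found.append(record)
    return found


if __name__ == "__main__":
    for ident in extract_identifiers(SAMPLE):
        print(ident)
```

Keeping the attributes alongside the value means the `type="ascl"` and `bibcode="..."` variants both survive, in case downstream code wants to treat them differently.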

golnazads commented 5 years ago

aaccomazzi [9:17 AM]: @golnazads it looks like ASCL identifiers are not being properly populated in SOLR. Here is a recent ASCL record whose SOLR identifier field lacks its ascl:XXX value: 2018ascl.soft12002J

While adspy properly generates the <identifiers> structure, I think the problem is that ADSImportPipeline does not populate the needed data. Part of the code is already there (https://github.com/adsabs/ADSImportPipeline/blob/master/aip/classic/solr_adapter.py#L395); what is missing is the Python code that creates the data, similar to what we do for alternates here: https://github.com/adsabs/ADSImportPipeline/blob/master/aip/classic/enforce_schema.py#L384
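The missing step described above could look roughly like the following sketch. It does not reflect the actual code in enforce_schema.py or solr_adapter.py: the input shape (a list of parsed relations blocks, each with an `identifiers` list of dicts carrying a `content` key) and the helper names `collect_identifiers` and `merge_identifier_field` are hypothetical. It only shows the idea of flattening identifier values and appending them, without duplicates, to a multi-valued `identifier` field.

```python
def collect_identifiers(relations_blocks):
    """Flatten identifier strings from parsed relations blocks.

    Assumed (hypothetical) block shape:
        {"identifiers": [{"type": "ascl", "content": "ascl:1802.007"}, ...]}
    """
    values = []
    for block in relations_blocks:
        for ident in block.get("identifiers", []):
            value = ident.get("content", "").strip()
            if value:
                values.append(value)
    return values


def merge_identifier_field(solr_doc, new_values):
    """Append new_values to solr_doc["identifier"], skipping duplicates and keeping order."""
    merged = list(solr_doc.get("identifier", []))
    seen = set(merged)
    for value in new_values:
        if value not in seen:
            merged.append(value)
            seen.add(value)
    solr_doc["identifier"] = merged
    return solr_doc


if __name__ == "__main__":
    # Hypothetical document; the existing value is only for illustration.
    doc = {"identifier": ["2018ascl.soft02007G"]}
    blocks = [{"identifiers": [{"type": "ascl", "content": "ascl:1802.007"}]}]
    print(merge_identifier_field(doc, collect_identifiers(blocks)))
```

Deduplicating on merge keeps the field clean if the same identifier arrives from more than one relations block, as can happen when a record has alternates.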