cern-sis / issues-scoap3


Elsevier affiliations #262

Closed ErnestaP closed 6 months ago

ErnestaP commented 6 months ago

New workflows: when an author has no affiliation id assigned, the parser emits the affiliation as {"value": None}. In legacy scoap3, the group affiliation was used instead: https://github.com/SCOAP3/hepcrawl/blob/master/hepcrawl/extractors/s3_elsevier_parser.py#L215

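Example XML. The authors have no reference to the affiliation id (aff0010); the only cross-ref (refid="cr0010") points to the correspondence note. Legacy therefore assigned all affiliations listed below the authors in the author group to them: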

  <ce:author-group id="ag0010">
            <ce:author orcid="0000-0002-2957-5276" id="au0010"
                author-id="S0370269323004434-957dd146f6e243df4b6f309334983b99">
                <ce:given-name>Ioannis D.</ce:given-name>
                <ce:surname>Gialamas</ce:surname>
                <ce:cross-ref refid="cr0010" id="crf0450">
                    <ce:sup>⁎</ce:sup>
                </ce:cross-ref>
                <ce:e-address type="email" xlink:href="mailto:ioannis.gialamas@kbfi.ee" id="ea0010">
                    ioannis.gialamas@kbfi.ee</ce:e-address>
            </ce:author>
            <ce:author id="au0020" author-id="S0370269323004434-11ce32aef4ff6906bb946b4904582471">
                <ce:given-name>Hardi</ce:given-name>
                <ce:surname>Veermäe</ce:surname>
                <ce:e-address type="email" xlink:href="mailto:hardi.veermae@cern.ch" id="ea0020">
                    hardi.veermae@cern.ch</ce:e-address>
            </ce:author>
            <ce:affiliation id="aff0010"
                affiliation-id="S0370269323004434-e0efc38af768f3818fce829f251dffc5">
                <ce:textfn>Laboratory of High Energy and Computational Physics, National Institute
                    of Chemical Physics and Biophysics, Rävala pst. 10, 10143, Tallinn, Estonia</ce:textfn>
                <sa:affiliation>
                    <sa:organization>Laboratory of High Energy and Computational Physics</sa:organization>
                    <sa:organization>National Institute of Chemical Physics and Biophysics</sa:organization>
                    <sa:address-line>Rävala pst. 10</sa:address-line>
                    <sa:city>Tallinn</sa:city>
                    <sa:postal-code>10143</sa:postal-code>
                    <sa:country>Estonia</sa:country>
                </sa:affiliation>
                <ce:source-text id="srct0005">Laboratory of High Energy and Computational Physics,
                    National Institute of Chemical Physics and Biophysics, Rävala pst. 10, 10143,
                    Tallinn, Estonia</ce:source-text>
            </ce:affiliation>
            <ce:correspondence id="cr0010">
                <ce:label>⁎</ce:label>
                <ce:text>Corresponding author.</ce:text>
            </ce:correspondence>
        </ce:author-group>

parser output should look like this:

[
                    {
                        "surname": "Gialamas",
                        "given_names": "Ioannis D.",
                        "affiliations": [
                            {
                                "value": "Laboratory of High Energy and Computational Physics, National Institute of Chemical Physics and Biophysics, Rävala pst. 10, 10143, Tallinn, Estonia",
                                "organization": "Laboratory of High Energy and Computational Physics",
                                "country": "Estonia",
                            }
                        ],
                        "email": "ioannis.gialamas@kbfi.ee",
                    },
                    {
                        "surname": "Veermäe",
                        "given_names": "Hardi",
                        "affiliations": [
                            {
                                "value": "Laboratory of High Energy and Computational Physics, National Institute of Chemical Physics and Biophysics, Rävala pst. 10, 10143, Tallinn, Estonia",
                                "organization": "Laboratory of High Energy and Computational Physics",
                                "country": "Estonia",
                            }
                        ],
                        "email": "hardi.veermae@cern.ch",
                    },
                ]
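
The fallback described above could be sketched roughly as follows. This is a minimal illustration, not the actual workflow or hepcrawl code: the function names (`parse_author_group`, `_affiliation`, `_text`) are hypothetical, the namespace URIs are assumed, and the sample XML is a trimmed copy of the one in this issue (email `xlink:href` attributes and address details left out).

```python
import xml.etree.ElementTree as ET

# Assumption: these namespace URIs must match what the real article XML declares.
NS = {
    "ce": "http://www.elsevier.com/xml/common/dtd",
    "sa": "http://www.elsevier.com/xml/common/struct-aff/dtd",
}

# Trimmed version of the author group from this issue.
SAMPLE = """
<ce:author-group xmlns:ce="http://www.elsevier.com/xml/common/dtd"
                 xmlns:sa="http://www.elsevier.com/xml/common/struct-aff/dtd" id="ag0010">
  <ce:author id="au0010">
    <ce:given-name>Ioannis D.</ce:given-name>
    <ce:surname>Gialamas</ce:surname>
    <ce:cross-ref refid="cr0010" id="crf0450"><ce:sup>*</ce:sup></ce:cross-ref>
    <ce:e-address type="email" id="ea0010">ioannis.gialamas@kbfi.ee</ce:e-address>
  </ce:author>
  <ce:author id="au0020">
    <ce:given-name>Hardi</ce:given-name>
    <ce:surname>Veermäe</ce:surname>
    <ce:e-address type="email" id="ea0020">hardi.veermae@cern.ch</ce:e-address>
  </ce:author>
  <ce:affiliation id="aff0010">
    <ce:textfn>Laboratory of High Energy and Computational Physics, National Institute
        of Chemical Physics and Biophysics, Rävala pst. 10, 10143, Tallinn, Estonia</ce:textfn>
    <sa:affiliation>
      <sa:organization>Laboratory of High Energy and Computational Physics</sa:organization>
      <sa:organization>National Institute of Chemical Physics and Biophysics</sa:organization>
      <sa:country>Estonia</sa:country>
    </sa:affiliation>
  </ce:affiliation>
  <ce:correspondence id="cr0010"><ce:label>*</ce:label></ce:correspondence>
</ce:author-group>
"""


def _text(node, path):
    """Return whitespace-normalised text of the first match, or None."""
    found = node.find(path, NS)
    if found is None or found.text is None:
        return None
    return " ".join(found.text.split())


def _affiliation(aff):
    """Build one affiliation dict from a ce:affiliation element."""
    return {
        "value": _text(aff, "ce:textfn"),
        "organization": _text(aff, "sa:affiliation/sa:organization"),
        "country": _text(aff, "sa:affiliation/sa:country"),
    }


def parse_author_group(group):
    """Parse authors; fall back to the group's affiliations when none is referenced."""
    affs = {a.get("id"): _affiliation(a) for a in group.findall("ce:affiliation", NS)}
    authors = []
    for author in group.findall("ce:author", NS):
        refids = [r.get("refid") for r in author.findall("ce:cross-ref", NS)]
        # Keep only cross-refs that point at an affiliation (cr0010 does not).
        own = [affs[r] for r in refids if r in affs]
        authors.append({
            "surname": _text(author, "ce:surname"),
            "given_names": _text(author, "ce:given-name"),
            # Legacy-style fallback: no matching ref means all group affiliations.
            "affiliations": own or list(affs.values()),
            "email": _text(author, "ce:e-address"),
        })
    return authors


authors = parse_author_group(ET.fromstring(SAMPLE))
```

Matching cross-refs against the affiliation ids, rather than checking whether any cross-ref exists, matters here: the first author does have a cross-ref, but it targets the correspondence note, so both authors end up on the fallback path.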

example in the repo: https://repo.scoap3.org/records/79398