brunoamaral / gregory-ai

Artificial Intelligence and Machine Learning to help find scientific research and filter relevant content
https://gregory-ai.com/

Add author countries to the database #186

Closed: brunoamaral closed this issue 11 months ago

brunoamaral commented 2 years ago

The goal is to know each author's affiliation at the time the paper was published.

antoniolopes commented 2 years ago

Unfortunately, affiliation data is rarely found in public, free APIs (the CrossRef and DataCite APIs include it in some records, but only a minority), so subscription-based APIs (like Scopus) have to be used. Some institutions may already subscribe to Scopus for other information needs, so that subscription could be reused for this lookup.
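To check how often CrossRef actually carries affiliations, one could fetch a record from the public CrossRef REST API (`https://api.crossref.org/works/{DOI}`) and count how many authors have a non-empty `affiliation` list. This is a minimal sketch using only the standard library; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_crossref_work(doi: str) -> dict:
    """Fetch the 'message' part of a CrossRef /works/{doi} response."""
    # quote() keeps "/" by default, so the DOI prefix/suffix separator survives
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def affiliation_coverage(work: dict) -> float:
    """Fraction of authors whose CrossRef record lists at least one affiliation."""
    authors = work.get("author", [])
    if not authors:
        return 0.0
    with_affiliation = sum(1 for a in authors if a.get("affiliation"))
    return with_affiliation / len(authors)
```

For example, `affiliation_coverage(fetch_crossref_work("10.1177/135245859600100618"))` would report the share of authors with affiliation data for that record. Running it over a sample of DOIs would give a rough idea of how large "the minority" is for the journals being tracked.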

antoniolopes commented 2 years ago

Example of Scopus API usage for retrieving affiliation data:

import requests

DOI = "10.4304/jnw.3.2.38-47"
BASE_URL = "https://api.elsevier.com/content/search/scopus"

headers = {
    'X-ELS-APIKEY': '-----------------',       # your Scopus API key
    'X-ELS-INSTTOKEN': '-------------------',  # institutional token (subscriber access)
    'X-ELS-ResourceVersion': 'XOCS',
    'Accept': 'application/json'
}

# Let requests build and URL-encode the query string: ?query=DOI(...)&view=COMPLETE
params = {"query": "DOI(" + DOI + ")", "view": "COMPLETE"}

response = requests.get(BASE_URL, headers=headers, params=params)
response.raise_for_status()  # fail loudly on 401/403 (bad key or no subscription)

response_info = response.json()
for affiliation in response_info["search-results"]["entry"][0].get("affiliation", []):
    # Some fields can be missing or null, so fall back to an empty string
    name = affiliation.get("affilname") or ""
    city = affiliation.get("affiliation-city") or ""
    country = affiliation.get("affiliation-country") or ""
    print("Affiliation: " + name + "; " + city + "; " + country)

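Since the issue is ultimately about countries, a small helper could reduce a Scopus response like the one in the example to its distinct affiliation countries. This sketch assumes the `search-results` → `entry` → `affiliation` → `affiliation-country` structure used in that example; the function name is hypothetical, and it works on an already-parsed response, so it needs no API key.

```python
def scopus_countries(response_info: dict) -> list:
    """Collect the distinct affiliation countries from a parsed Scopus Search response.

    Assumed structure: search-results -> entry[] -> affiliation[] -> affiliation-country.
    """
    countries = set()
    for entry in response_info.get("search-results", {}).get("entry", []):
        for affiliation in entry.get("affiliation", []):
            country = affiliation.get("affiliation-country")
            if country:  # skip missing or null countries
                countries.add(country)
    return sorted(countries)
```

Returning a sorted list keeps the output stable, which is convenient if the result is stored or compared between runs.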
brunoamaral commented 2 years ago

I was looking at the information we get from CrossRef: it includes the affiliation, but not the country.

"author": [
    {
        "given": "H",
        "family": "Nyland",
        "sequence": "first",
        "affiliation": [
            {
                "name": "Departments of Neurology"
            }
        ]
    },
    {
        "given": "K-M",
        "family": "Myhr",
        "sequence": "additional",
        "affiliation": [
            {
                "name": "Departments of Neurology"
            }
        ]
    },

One option would be to store a JSON string with each author and their affiliation(s).
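A minimal sketch of that option, assuming the CrossRef `author` list is available as a Python list: it keeps only each author's name and affiliation names and serialises them to a JSON string that could live in a single text column. The function name and the compact output shape are hypothetical.

```python
import json


def authors_affiliations_json(crossref_authors: list) -> str:
    """Serialise CrossRef authors as [{"given", "family", "affiliations": [...]}, ...]."""
    compact = []
    for author in crossref_authors:
        compact.append({
            "given": author.get("given"),
            "family": author.get("family"),
            # CrossRef gives a list of {"name": ...} objects; keep just the names
            "affiliations": [a["name"] for a in author.get("affiliation", []) if a.get("name")],
        })
    # ensure_ascii=False keeps names such as "Lillås" readable in the stored string
    return json.dumps(compact, ensure_ascii=False)
```

The trade-off is that a JSON string is easy to store but hard to query: finding every paper from a given institution or country means parsing the string for each row.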

Full JSON:

{
    "indexed": {
        "date-parts": [
            [
                2022,
                3,
                31
            ]
        ],
        "date-time": "2022-03-31T02:59:44Z",
        "timestamp": 1648695584079
    },
    "reference-count": 15,
    "publisher": "SAGE Publications",
    "issue": "6",
    "license": [
        {
            "start": {
                "date-parts": [
                    [
                        1996,
                        6,
                        1
                    ]
                ],
                "date-time": "1996-06-01T00:00:00Z",
                "timestamp": 833587200000
            },
            "content-version": "tdm",
            "delay-in-days": 0,
            "URL": "http://journals.sagepub.com/page/policies/text-and-data-mining-license"
        }
    ],
    "content-domain": {
        "domain": [],
        "crossmark-restriction": false
    },
    "short-container-title": [
        "Mult Scler"
    ],
    "published-print": {
        "date-parts": [
            [
                1996,
                6
            ]
        ]
    },
    "abstract": "<jats:p> A multicentre, randomised, double-blind, placebo controlled study to evaluate the efficacy and safety of 4.5 and 9.0 MIU recombinant human interferon alfa-2a (Rof eron-A™) given thrice weekly in patients with relapsing-remittent multiple sclerosis is described. The patients are treated for 6 months followed by a 6 months drug-free period. The primary objective is to determine new disease activity analysed by monthly MRI with gadodiamide (GdDTPA-BMA, Omniscan™). The study is conducted at eight centers in Norway and is completed in January 1996. </jats:p>",
    "DOI": "10.1177/135245859600100618",
    "type": "journal-article",
    "created": {
        "date-parts": [
            [
                2017,
                1,
                17
            ]
        ],
        "date-time": "2017-01-17T04:43:17Z",
        "timestamp": 1484628197000
    },
    "page": "372-375",
    "source": "Crossref",
    "is-referenced-by-count": 5,
    "title": [
        "Treatment of relapsing-remittent multiple sclerosis with recombinant human interferon-alfa-2a: design of a randomised, placebo-controlled, double blind trial in Norway"
    ],
    "prefix": "10.1177",
    "volume": "1",
    "author": [
        {
            "given": "H",
            "family": "Nyland",
            "sequence": "first",
            "affiliation": [
                {
                    "name": "Departments of Neurology"
                }
            ]
        },
        {
            "given": "K-M",
            "family": "Myhr",
            "sequence": "additional",
            "affiliation": [
                {
                    "name": "Departments of Neurology"
                }
            ]
        },
        {
            "given": "F",
            "family": "Lillås",
            "sequence": "additional",
            "affiliation": [
                {
                    "name": "Department of Radiology, Aker Hospital, Oslo"
                }
            ]
        },
        {
            "given": "AI",
            "family": "Smievoll",
            "sequence": "additional",
            "affiliation": [
                {
                    "name": "Departments of Radiology"
                }
            ]
        },
        {
            "given": "T",
            "family": "Riise",
            "sequence": "additional",
            "affiliation": [
                {
                    "name": "Public Health and Primary Health Care, University of Bergen"
                }
            ]
        },
        {
            "given": "M",
            "family": "Nortvedt",
            "sequence": "additional",
            "affiliation": [
                {
                    "name": "Department of Nurse Education, Bergen College, Bergen"
                }
            ]
        },
        {
            "given": "R",
            "family": "Nilsen",
            "sequence": "additional",
            "affiliation": [
                {
                    "name": "Roche Norway, Oslo, Norway"
                }
            ]
        }
    ],
    "member": "179",
    "published-online": {
        "date-parts": [
            [
                2016,
                7,
                2
            ]
        ]
    },
    "reference": [
        {
            "key": "atypb1",
            "first-page": "S89",
            "volume": "14",
            "author": "Spielberger RT",
            "year": "1994",
            "journal-title": "Leuk Lymphoma"
        },
        {
            "key": "atypb2",
            "doi-asserted-by": "publisher",
            "DOI": "10.1055/s-2007-1007265"
        },
        {
            "key": "atypb3",
            "doi-asserted-by": "publisher",
            "DOI": "10.1016/0165-5728(95)00101-7"
        },
        {
            "key": "atypb4",
            "doi-asserted-by": "publisher",
            "DOI": "10.1212/WNL.34.10.1273"
        },
        {
            "key": "atypb5",
            "doi-asserted-by": "publisher",
            "DOI": "10.1001/archneur.1986.00520120023011"
        },
        {
            "key": "atypb6",
            "doi-asserted-by": "publisher",
            "DOI": "10.1136/jnnp.52.5.566"
        },
        {
            "key": "atypb7",
            "doi-asserted-by": "publisher",
            "DOI": "10.1212/WNL.40.3_Part_1.479"
        },
        {
            "key": "atypb8",
            "doi-asserted-by": "publisher",
            "DOI": "10.1111/j.1600-0404.1993.tb04136.x"
        },
        {
            "key": "atypb9",
            "doi-asserted-by": "publisher",
            "DOI": "10.1212/WNL.44.3_Part_1.406"
        },
        {
            "key": "atypb10",
            "doi-asserted-by": "publisher",
            "DOI": "10.1136/jnnp.54.8.683"
        },
        {
            "key": "atypb11",
            "doi-asserted-by": "publisher",
            "DOI": "10.1212/WNL.33.11.1444"
        },
        {
            "key": "atypb12",
            "doi-asserted-by": "publisher",
            "DOI": "10.1212/WNL.34.10.1368"
        },
        {
            "key": "atypb13",
            "volume-title": "Health Status Questionnaire 2.0: Scoring comparisons and reference data.",
            "author": "Radosevich DM",
            "year": "1994"
        },
        {
            "key": "atypb15",
            "doi-asserted-by": "publisher",
            "DOI": "10.1016/0022-510X(94)90045-0"
        },
        {
            "key": "atypb16",
            "doi-asserted-by": "publisher",
            "DOI": "10.1002/ana.410130302"
        }
    ],
    "container-title": [
        "Multiple Sclerosis Journal"
    ],
    "original-title": [],
    "language": "en",
    "link": [
        {
            "URL": "http://journals.sagepub.com/doi/pdf/10.1177/135245859600100618",
            "content-type": "application/pdf",
            "content-version": "vor",
            "intended-application": "text-mining"
        },
        {
            "URL": "http://journals.sagepub.com/doi/pdf/10.1177/135245859600100618",
            "content-type": "unspecified",
            "content-version": "vor",
            "intended-application": "similarity-checking"
        }
    ],
    "deposited": {
        "date-parts": [
            [
                2021,
                3,
                25
            ]
        ],
        "date-time": "2021-03-25T09:59:41Z",
        "timestamp": 1616666381000
    },
    "score": 1,
    "resource": {
        "primary": {
            "URL": "http://journals.sagepub.com/doi/10.1177/135245859600100618"
        }
    },
    "subtitle": [],
    "short-title": [],
    "issued": {
        "date-parts": [
            [
                1996,
                6
            ]
        ]
    },
    "references-count": 15,
    "journal-issue": {
        "issue": "6",
        "published-print": {
            "date-parts": [
                [
                    1996,
                    6
                ]
            ]
        }
    },
    "alternative-id": [
        "10.1177/135245859600100618"
    ],
    "URL": "http://dx.doi.org/10.1177/135245859600100618",
    "relation": {},
    "ISSN": [
        "1352-4585",
        "1477-0970"
    ],
    "issn-type": [
        {
            "value": "1352-4585",
            "type": "print"
        },
        {
            "value": "1477-0970",
            "type": "electronic"
        }
    ],
    "subject": [
        "Neurology (clinical)",
        "Neurology"
    ],
    "published": {
        "date-parts": [
            [
                1996,
                6
            ]
        ]
    }
}
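In this record only one affiliation string ("Roche Norway, Oslo, Norway") ends with a country name, which suggests a cheap heuristic: check whether the last comma-separated part of an affiliation is a known country. This is a hypothetical heuristic, not something CrossRef provides, and the small `KNOWN_COUNTRIES` set is a placeholder for a real country list. Most strings here, such as "Departments of Neurology" or "Department of Radiology, Aker Hospital, Oslo" (a city, not a country), would yield nothing.

```python
from typing import Optional

# Placeholder list for illustration only; a real version would need a full
# country list (an assumption, not part of the CrossRef data).
KNOWN_COUNTRIES = {"Norway", "Portugal", "Sweden", "Denmark", "United Kingdom"}


def guess_country(affiliation_name: str) -> Optional[str]:
    """Return the last comma-separated part of the affiliation if it is a known country."""
    last_part = affiliation_name.rsplit(",", 1)[-1].strip()
    return last_part if last_part in KNOWN_COUNTRIES else None


def countries_from_crossref(work: dict) -> set:
    """Collect the countries that can be guessed from a CrossRef work's affiliations."""
    found = set()
    for author in work.get("author", []):
        for affiliation in author.get("affiliation", []):
            country = guess_country(affiliation.get("name", ""))
            if country:
                found.add(country)
    return found
```

Applied to the record above, this would find only `{"Norway"}`, and only from the last author, which illustrates why affiliation strings alone are a weak source of country data.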
antoniolopes commented 2 years ago

Again, you need subscription-based APIs to get that kind of information. Also, authors can have multiple affiliations, so the data model should use a many-to-many relationship between authors and institutions (i.e., the affiliations).
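A minimal sketch of that many-to-many model, using SQLite from the Python standard library purely for illustration (the project's actual database and ORM are not discussed in this thread). All table and column names are hypothetical. The junction table also records the article, so the same author can have different affiliations on different papers, which matches the goal of knowing the affiliation at the time of publication.

```python
import sqlite3

SCHEMA = """
CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    given TEXT,
    family TEXT NOT NULL
);
CREATE TABLE institutions (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    country TEXT              -- may stay NULL when no source provides it
);
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    doi TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (author, institution, article) affiliation
CREATE TABLE author_affiliations (
    author_id INTEGER NOT NULL REFERENCES authors(id),
    institution_id INTEGER NOT NULL REFERENCES institutions(id),
    article_id INTEGER NOT NULL REFERENCES articles(id),
    PRIMARY KEY (author_id, institution_id, article_id)
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_affiliation(conn: sqlite3.Connection, author_id: int,
                    institution_id: int, article_id: int) -> None:
    # The composite primary key makes repeated inserts of the same link harmless
    conn.execute(
        "INSERT OR IGNORE INTO author_affiliations VALUES (?, ?, ?)",
        (author_id, institution_id, article_id),
    )


def institutions_of(conn: sqlite3.Connection, author_id: int, article_id: int) -> list:
    """Names of the institutions an author was affiliated with on a given article."""
    rows = conn.execute(
        """SELECT i.name FROM institutions i
           JOIN author_affiliations aa ON aa.institution_id = i.id
           WHERE aa.author_id = ? AND aa.article_id = ?
           ORDER BY i.name""",
        (author_id, article_id),
    )
    return [name for (name,) in rows]
```

Keeping `country` on the institution rather than on the author means it only has to be filled in once per institution, whichever source eventually provides it.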

brunoamaral commented 11 months ago

Closing this issue: for now we get country information from ORCID. The other APIs require a subscription, and we don't have the means to pay for one.