ElsevierDev / elsapy

A Python module for use with Elsevier's APIs: Scopus, ScienceDirect, others.
http://api.elsevier.com
BSD 3-Clause "New" or "Revised" License

Extracting reference list #22

Open noenovels opened 6 years ago

noenovels commented 6 years ago

Hello, I am trying to extract the reference lists of known papers using their Scopus IDs. I tested a script a few weeks ago and it seemed to work fine. If I run it now, I get this error:

Traceback (most recent call last):
  File "scp_id_to_reference_list.py", line 60, in <module>
    ref_list = scp_doc.data['item']['bibrecord']['tail']['bibliography']['reference']
KeyError: 'item'

The code looks like this:

```python
#!/usr/bin/env python3

from elsapy.elsclient import ElsClient
from elsapy.elsprofile import ElsAuthor, ElsAffil
from elsapy.elsdoc import FullDoc, AbsDoc
from elsapy.elssearch import ElsSearch
import json

# Load configuration
con_file = open("config.json")
config = json.load(con_file)
con_file.close()

scp_id_list = ### list of ids

client = ElsClient(config['apikey'])

ref_list = list()

for scp_id_use in scp_id_list:
    scp_doc = AbsDoc(scp_id = scp_id_use)
    if scp_doc.read(client):
        print ("scp_doc.title: ", scp_doc.title)
        scp_doc.write()
    else:
        print ("Read document failed.")

    # get the reference list of this article
    ref_list = scp_doc.data['item']['bibrecord']['tail']['bibliography']['reference']

    ref_list_json = list()

    # save the reference list as a json file
    for i in range(0, len(ref_list)):
        ref_info = ref_list[i]['ref-info']
        ref_json = json.dumps(ref_info)
        ref_list_json.append(json.loads(ref_json))

    ref_list[scp_id_use] = ref_list_json
```

Any idea on what might be wrong?

Thanks, Noemi

m-direnzo commented 6 years ago

Hi Noemi,

The KeyError suggests the API response is not what the script expects, e.g. an HTTP error status rather than the abstract data. This can happen for a number of reasons, including 404s and exceeded quotas. I say this because the API responses are structured such that the 'item' key should always be present in a record, even if the value is null.

I suggest directly examining the JSON response for the ID that causes trouble. You can also put your ref_list assignment in a try block so that you can handle the exception (say, 'pass' and move on in the loop) rather than have the program fail.
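A minimal sketch of that suggestion, assuming the response has already been loaded into a plain dict such as `scp_doc.data`. The helper names `get_reference_list` and `preview` are hypothetical and not part of elsapy; the key path is the one from the script in the question.

```python
import json

# Key path copied from the script in the question.
REF_PATH = ("item", "bibrecord", "tail", "bibliography", "reference")


def get_reference_list(data):
    """Return the reference entries found in `data`, or [] if the path is missing."""
    node = data
    try:
        for key in REF_PATH:
            node = node[key]
    except (KeyError, TypeError):
        # KeyError: a key is absent.
        # TypeError: an intermediate value is None or otherwise not a dict.
        return []
    if node is None:
        return []
    # Defensive assumption (not confirmed by the thread): a lone reference
    # might arrive as a single dict rather than a list.
    return node if isinstance(node, list) else [node]


def preview(data, limit=300):
    """Return the start of the pretty-printed JSON, for eyeballing a bad response."""
    return json.dumps(data, indent=2)[:limit]
```

Inside the loop this could be used as `refs = get_reference_list(scp_doc.data)`, printing `preview(scp_doc.data)` and moving on to the next ID whenever `refs` is empty.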

Hope that helps!

-Matt

noenovels commented 6 years ago

Thank you Matt. That helped and I changed the code, but I'm still not extracting any data and I'm not sure what is wrong. Here's the code:

from elsapy.elsclient import ElsClient
from elsapy.elsprofile import ElsAuthor, ElsAffil
from elsapy.elsdoc import FullDoc, AbsDoc
from elsapy.elssearch import ElsSearch
import json
import io
import numpy as np

con_file = open("config.json")
config = json.load(con_file)
con_file.close()

scp_id_list = np.loadtxt("scopus_id.txt", comments="#", delimiter=",", unpack=False, dtype=str)

client = ElsClient(config['apikey'])

ref_list = list()

for scp_id_use in scp_id_list:
    scp_doc = AbsDoc(scp_id = scp_id_use)
    if scp_doc.read(client):
        print ("scp_doc.title: ", scp_doc.title)
        scp_doc.write()   
    else:
        print ("Read document failed.")

try:
    ref_list = scp_doc.data['item']['bibrecord']['tail']['bibliography']['reference']
except:
    pass
    print(scp_id_use, ': Record not found.')

ref_list_json = list()

try:
    for i in range(0, len(ref_list)):
        ref_info = ref_list[i]['ref-info']
        ref_json = json.dumps(ref_info)
        ref_list_json.append(json.loads(ref_json))
except: 
    pass 
    print(scp_id_use, ": Record not found 2")

with open('ref_list.json', 'w') as outfile:
    json.dump(ref_list_json, outfile)

Thank you, Noemi
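As posted, the two try blocks and the final `json.dump` sit outside the `for` loop, so they run only once, after the last ID has been read. A minimal sketch of keeping the extraction inside the loop and collecting the results per ID in a dict follows. The `fetch` argument is a hypothetical stand-in for the `AbsDoc` read step so that the sketch runs without an API key, and `collect_references` and `save_references` are illustrative names, not elsapy functions.

```python
import json


def collect_references(scp_ids, fetch):
    """Map each Scopus ID to the list of its 'ref-info' entries.

    `fetch(scp_id)` is a hypothetical callable that returns the response
    dict for one ID, or None when the read failed.
    """
    refs_by_id = {}
    for scp_id in scp_ids:
        data = fetch(scp_id)
        try:
            ref_list = data["item"]["bibrecord"]["tail"]["bibliography"]["reference"]
        except (KeyError, TypeError):
            # Missing key, or the read failed and data is None.
            print(scp_id, ": no reference list in response")
            continue
        refs_by_id[str(scp_id)] = [
            ref["ref-info"]
            for ref in (ref_list or [])
            if isinstance(ref, dict) and "ref-info" in ref
        ]
    return refs_by_id


def save_references(refs_by_id, path="ref_list.json"):
    """Write all collected reference lists to a single JSON file."""
    with open(path, "w") as outfile:
        json.dump(refs_by_id, outfile, indent=2)
```

With elsapy, `fetch` could wrap the calls already used in the script: build `AbsDoc(scp_id=scp_id)`, call `read(client)`, and return `scp_doc.data` on success or `None` otherwise.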

kno10 commented 4 years ago

I also fail to get references, for example for this DOI: https://doi.org/10.1016/j.jvcir.2018.08.005

Yes, this is a retracted article. I would like to include it in my analysis: https://www.vitavonni.de/blog/201906/2019061501-chinese-citation-factory.html

This particular DOI (as well as a few more) is of interest because it appears to be part of a large, ongoing manipulation of the editorial process at Elsevier, IEEE, and Springer. I can easily get the citation data for Springer using the excellent APIs of semanticscholar.org and crossref.org, but not for these Elsevier articles.

Elsevier is doing a pretty poor job as a publisher. Certain things must not happen at a publisher that takes its responsibilities seriously. Consider this article: https://www.sciencedirect.com/science/article/pii/S1047320319300999

Authors: Yiyang Yao, Peizhen Liu, Xiaowei Sun, Luming Zhang

According to pubpeer.com, this paper has been published three times before: https://pubpeer.com/publications/80D482B95206C78411522115D1718C

Or consider this article: https://www.sciencedirect.com/science/article/pii/S1047320319300744

Authors: Yanxiang Chen, Huadong Tan, Luming Zhang, Jie Zhou, Qiang Lu

"This paper has been recommended for acceptance by Luming Zhang."

Hello Elsevier: since when are authors allowed to recommend their own paper for acceptance?!? This should raise a "red flag", and you really need to investigate your editorial process at this journal, in particular for the special issue on TIUSM.

But without Elsevier making references publicly accessible via an API, we cannot mine for the citation patterns that are characteristic of such manipulations, because one key "value" of bad papers is boosting the h-index.

Skogur commented 1 year ago

important stuff here. bummer there hasn't been more on this.