CrossRef / rest-api-doc

Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/

Questions regarding citations #209

Open eltermann opened 7 years ago

eltermann commented 7 years ago

First of all, thanks for the great job you maintainers are doing.

Second, a little context: I'm building a citations database for research purposes and decided to use Crossref as the main data source.

That said, the present issue is actually a compilation of questions.

Question 1

Some references include a DOI while others don't. I need to answer the following question:

What's the impact of solely relying on Crossref's references' DOIs to build the citation relationships?

Approximately what percentage of references have a DOI? Is each reference DOI the exact value sent by the publisher, or does Crossref perform any name matching to populate this field?

Question 2

I4OC page [1] states that:

As of March 2017, the fraction of publications with open references has grown from 1% to more than 40% out of the nearly 35 million articles with references deposited with Crossref (to date).

I found that the "has-references" filter on works probably corresponds to the 35 million mentioned (judging by the response to [2]). I also noticed that some works published by members with has-public-references:true don't have references with DOIs. What are the differences between the ~40% with open references and the other ~60%?

Question 3

As far as I understand, for a given member with has-public-references:true, individual prefixes may or may not publish open references. Is it correct to assume these are the prefixes with public-references:true in calls like [3]?
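If that assumption holds, the open-reference prefixes can be read straight out of a member record. The sketch below is a minimal illustration, assuming the "message" object of a /members/{id} response lists prefixes under a "prefix" array whose entries carry "value" and "public-references" fields. That layout is an assumption, and the function name and the sample prefixes are hypothetical.

```python
def open_reference_prefixes(member_message):
    """Return the prefixes of a member record flagged with public-references: true.

    Assumes member_message is the "message" object of a /members/{id} response
    and that each prefix entry has "value" and "public-references" keys.
    """
    return [
        prefix["value"]
        for prefix in member_message.get("prefix", [])
        if prefix.get("public-references")
    ]


# Hypothetical member record, for illustration only.
sample_member = {
    "id": 0,
    "prefix": [
        {"value": "10.0001", "public-references": True},
        {"value": "10.0002", "public-references": False},
    ],
}
open_prefixes = open_reference_prefixes(sample_member)
```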

Question 4

My plan is to obtain all the records with useful data regarding citations. I'm planning to filter works (probably by member or prefix) and use cursor for paging. Are there any pitfalls/alternatives you'd point out? Would it be possible to use the cited-by service for this purpose?

--

Felipe

[1] https://i4oc.org

[2] http://api.crossref.org/works?filter=has-references:true&rows=0

[3] http://api.crossref.org/members/311

kmeddings commented 7 years ago

Hi -

Q1. I don't know the percentages ( @ckoscher ? ) but over the next few weeks you will see the number of references with DOIs increase quite significantly, as we are running through the backfile and inserting them into the JSON. The presence of the DOI should, I would think, make for more accurate analysis?

Q2. As you note, not all DOIs have references. The 60% are publishers who have opted not to make their references public, so you will not be able to access them through the API. Then there are publishers who don't deposit references at all, who account for much of the additional ~50m DOIs.

Q3. Yes.

Q4. Asking @kjw to comment on the cursor paging part of the question. Cited-by is at present only available to publishers who deposit references with Crossref (it's a reciprocal arrangement).
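For the cursor paging part of Q4, a minimal sketch might look like the following. It assumes the /works route accepts cursor=* to start a deep-paging session and returns the next cursor as "next-cursor" in the "message" object. The helper names fetch_page and iter_works are hypothetical, and the page-fetching function is passed in so the loop can be tried without touching the network.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.crossref.org/works"


def fetch_page(params):
    """Fetch one page of /works results and return its "message" object."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.load(response)["message"]


def iter_works(filter_expr, rows=1000, fetch=fetch_page):
    """Yield work records matching filter_expr, following cursors page by page."""
    cursor = "*"  # "*" asks the API to open a new cursor
    while True:
        message = fetch({"filter": filter_expr, "rows": rows, "cursor": cursor})
        items = message.get("items", [])
        if not items:
            break  # an empty page means the result set is exhausted
        yield from items
        cursor = message.get("next-cursor")
        if cursor is None:
            break
```

Something like `for work in iter_works("member:311,has-references:true"): ...` would then walk one member's works that have deposited references. Long harvests should also allow for retries and rate limits, which this sketch leaves out.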

mjmehta15 commented 7 years ago

@kjw & @kmeddings: I have a question along similar lines to @eltermann's.

As of June 2017, the fraction of publications with open references has grown from 1% to more than 45% out of the nearly 35 million articles with references deposited with Crossref (to date).

I did some validation and analysis on the data to understand Crossref coverage but one thing I came across was surprising.

So I wanted to know why this information is missing. Did something go wrong with the API while I was extracting the information? Is this information not available from Crossref at all? Or is Crossref in the process of updating it, so that it will become visible to users after a certain period?

jenniferlin15 commented 7 years ago

Thanks for sharing your analysis, @mjmehta15. The references which publishers have made publicly available have been indexed and thus should be contained in the REST API. So that we can look into this further, could you please provide examples of records you've found which ought to have references, but were found to be missing them?

ckoscher commented 7 years ago

Question 1: For a given reference, some publishers deposit the DOI along with the metadata for the reference. Some do not; for these, Crossref attempts matching to fill in the DOI.

The metadata says whether Crossref put the DOI there:

"reference": [

{
    "key": "BIB1",
    "author": "Blaustein",
    "volume": "11",
    "first-page": "438",
    "year": "1989",
    "journal-title": "Trends Neurosci",
    "DOI": "10.1016/0166-2236(88)90195-6",
    "doi-asserted-by": "crossref"
},

Of 1.017 billion references, 692,755,119 have a DOI (roughly 68%).
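To see how the references of a single work split between those with and without a DOI, and who asserted each DOI, the "reference" array can be tallied. This is a minimal sketch: the function name and the "no DOI" and "unspecified" labels are hypothetical, and the second sample reference is invented for illustration.

```python
from collections import Counter


def tally_reference_dois(work):
    """Count a work's references by the value of "doi-asserted-by".

    References without a "DOI" key are counted under "no DOI"; references with
    a DOI but no "doi-asserted-by" key are counted under "unspecified".
    """
    counts = Counter()
    for ref in work.get("reference", []):
        if "DOI" not in ref:
            counts["no DOI"] += 1
        else:
            counts[ref.get("doi-asserted-by", "unspecified")] += 1
    return counts


sample_work = {
    "reference": [
        {
            "key": "BIB1",
            "author": "Blaustein",
            "DOI": "10.1016/0166-2236(88)90195-6",
            "doi-asserted-by": "crossref",
        },
        # Hypothetical reference with no DOI.
        {"key": "BIB2", "author": "Example", "year": "1990"},
    ]
}
tally = tally_reference_dois(sample_work)
```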

mjmehta15 commented 7 years ago

@jenniferlin15: Thanks for the quick reply. Here are a few examples that should clarify the mismatch between the "reference" list and "reference-count".

Example 1: "reference-count" and "reference" present but actual reference list missing

Example 2: "reference-count" and "reference" present but actual reference list has fewer entries

Example 3: "reference-count" present but "reference" key is missing

Total records for Examples 1 & 2: ~13,798,787. Total records for Example 3: ~14,932,109.

So can you please explain why there is such a discrepancy?
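The three cases above can be told apart mechanically from a work record. A minimal sketch follows, where the function name and the returned labels are hypothetical and only the "reference-count" and "reference" fields come from the records discussed here.

```python
def classify_reference_gap(work):
    """Label how a work's "reference" list relates to its "reference-count"."""
    declared = work.get("reference-count", 0)
    if "reference" not in work:
        # Example 3: a count is reported but the "reference" key is absent.
        return "key missing" if declared else "consistent"
    listed = len(work["reference"])
    if listed == declared:
        return "consistent"
    if listed == 0:
        return "list empty"  # Example 1
    if listed < declared:
        return "list shorter"  # Example 2
    return "list longer"


labels = [
    classify_reference_gap({"reference-count": 12, "reference": []}),
    classify_reference_gap({"reference-count": 3, "reference": [{}, {}]}),
    classify_reference_gap({"reference-count": 5}),
    classify_reference_gap({"reference-count": 1, "reference": [{}]}),
]
```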

Bubblbu commented 6 years ago

Would love to hear an update regarding the reference count vs reference list question.

Bubblbu commented 6 years ago

Didn't want to unassign anybody here... happened automatically when I posted the comment.

jenniferlin15 commented 6 years ago

Example 1: "reference-count" and "reference" present but actual reference list missing

DOI: 10.1155/2007/90401 - "reference-count":12, reference: [ ]

References deposited, but not showing up in REST API result. (http://doi.crossref.org/search/doi?pid=jlin@crossref.org&format=unixsd&doi=10.1155/2007/90401) This is a bug, and we'll need to investigate further.

Example 2:

DOI http://api.crossref.org/works/10.1155/2008/217373

All references in XML are found in REST API (33). No issue found here.

DOI 10.1155/2007/87136

All references in XML are found in REST API (17). No issue found here.

Example 3:

DOI http://api.crossref.org/works/10.12737/25170

These references are private and the publisher, Infra-M Academic Publishing House, has not chosen to distribute them. No issue found here.

DOI 10.1016/j.neurol.2017.03.026

These references are private and the publisher, Elsevier, has not chosen to distribute them. No issue found here.

jenniferlin15 commented 6 years ago

Example 1 issue may be related to this ticket: #379