eltermann opened this issue 7 years ago (status: Open).
Hi -
Q1. I don't know the exact percentages (@ckoscher?), but over the next few weeks you will see the number of references with DOIs increase quite significantly, as we are running through the backfile and inserting them into the JSON. I would expect the presence of the DOI to make for more accurate analysis.
Q2. As you note, not all DOIs have references. The 60% are works from publishers who have opted not to make their references public, so you will not be able to access those references through the API. Then there are publishers who don't deposit references at all; they account for much of the additional ~50m DOIs.
Q3. Yes.
Q4. Asking @kjw to comment on the cursor paging part of the question. Cited-by is currently only available to publishers who deposit references with Crossref (it's a reciprocal arrangement).
@kjw & @kmeddings: I have a question along similar lines to @eltermann's.
As of June 2017, the fraction of publications with open references has grown from 1% to more than 45% out of the nearly 35 million articles with references deposited with Crossref (to date).
I did some validation and analysis on the data to understand Crossref's coverage, but I came across one thing that was surprising.
So I wanted to know why this information is missing. Did something go wrong with the API while I was extracting the information? Or is this information not available from Crossref at all, or is Crossref in the process of updating it, so that it will become visible to users after a certain period?
Thanks for sharing your analysis, @mjmehta15. The references which publishers have made publicly available have been indexed and thus should be contained in the REST API. So that we can look into this further, could you please provide examples of records you've found which ought to have references, but were found to be missing them?
Question 1: For a given reference, some publishers deposit the DOI along with the metadata for the reference. Others do not; for these, Crossref attempts matching to fill in the DOI.
The metadata says whether Crossref put the DOI there, via the "doi-asserted-by" field:
```json
{
  "reference": [
    {
      "key": "BIB1",
      "author": "Blaustein",
      "volume": "11",
      "first-page": "438",
      "year": "1989",
      "journal-title": "Trends Neurosci",
      "DOI": "10.1016/0166-2236(88)90195-6",
      "doi-asserted-by": "crossref"
    }
  ]
}
```
692,755,119 out of 1.017 billion references have a DOI.
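To tally this per work, here is a minimal Python sketch. It assumes the reference list has already been pulled out of a /works/{doi} response as a list of dicts shaped like the JSON example above; the helper name `doi_coverage` and the sample entries are hypothetical. My understanding is that "doi-asserted-by" is either "crossref" (matched by Crossref) or "publisher" (deposited with the DOI), but check that against live data. The last lines work out the global share from the figures quoted above, which comes to about 68%.

```python
from collections import Counter


def doi_coverage(references):
    """Tally DOI coverage for one work's "reference" list.

    Assumes each entry is a dict that may carry "DOI" and
    "doi-asserted-by" keys, as in the REST API example above.
    """
    asserted_by = Counter()
    with_doi = 0
    for ref in references:
        if "DOI" in ref:
            with_doi += 1
            # Assumed values: "crossref" = DOI filled in by Crossref matching,
            # "publisher" = DOI supplied with the deposit.
            asserted_by[ref.get("doi-asserted-by", "unknown")] += 1
    total = len(references)
    return {
        "total": total,
        "with_doi": with_doi,
        "share": with_doi / total if total else 0.0,
        "asserted_by": dict(asserted_by),
    }


# Hypothetical two-entry list: one matched DOI, one reference without a DOI.
sample_refs = [
    {"key": "BIB1", "DOI": "10.1016/0166-2236(88)90195-6",
     "doi-asserted-by": "crossref"},
    {"key": "BIB2", "unstructured": "A reference without a DOI"},
]
print(doi_coverage(sample_refs))
# {'total': 2, 'with_doi': 1, 'share': 0.5, 'asserted_by': {'crossref': 1}}

# Global figure quoted above; the denominator is the rounded 1.017 billion.
global_share = 692_755_119 / 1_017_000_000
print(f"{global_share:.1%}")  # 68.1%
```

Because the denominator is rounded, the global percentage is only approximate.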
@jenniferlin15: Thanks for the quick reply. Here are a few examples that should clarify the issue of the "reference" list not matching "reference-count". Example 1: "reference-count" and "reference" present, but the actual reference list is missing.
Example 2: "reference-count" and "reference" present, but the actual reference list has fewer entries than "reference-count".
Example 3: "reference-count" present, but the "reference" key is missing.
Total records for Examples 1 & 2: ~13,798,787; total records for Example 3: ~14,932,109.
Can you please explain why there is such a discrepancy?
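The three cases above can be told apart mechanically. Here is a minimal Python sketch, assuming each record is the "message" object of a /works/{doi} response; `classify_reference_gap` and its return labels are hypothetical names chosen to mirror Examples 1-3, not anything the API provides.

```python
def classify_reference_gap(work):
    """Sort one work record into the mismatch cases described above.

    `work` is assumed to be the "message" dict of a /works/{doi} response.
    """
    count = work.get("reference-count", 0)
    if "reference" not in work:
        # Example 3: a count is reported but there is no "reference" key.
        return "example-3" if count else "no-references"
    listed = len(work["reference"])
    if count > 0 and listed == 0:
        return "example-1"  # key present, list empty
    if listed < count:
        return "example-2"  # list shorter than reference-count
    return "ok"


print(classify_reference_gap({"reference-count": 12, "reference": []}))
```

A missing "reference" key does not by itself prove an API fault; the maintainer's replies in this thread show that such records can simply have private references.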
Would love to hear an update regarding the reference count vs reference list question.
Didn't want to unassign anybody here... happened automatically when I posted the comment.
Example 1: "reference-count" and "reference" present, but the actual reference list is missing.
DOI 10.1155/2007/90401: "reference-count": 12, "reference": []. The references were deposited but are not showing up in the REST API result (http://doi.crossref.org/search/doi?pid=jlin@crossref.org&format=unixsd&doi=10.1155/2007/90401). This is a bug, and we'll need to investigate further.
Example 2: DOI http://api.crossref.org/works/10.1155/2008/217373: all 33 references in the XML are found in the REST API. No issue found here.
DOI 10.1155/2007/87136: all 17 references in the XML are found in the REST API. No issue found here.
Example 3: DOI http://api.crossref.org/works/10.12737/25170: these references are private, and the publisher, Infra-M Academic Publishing House, has not chosen to distribute them. No issue found here.
DOI 10.1016/j.neurol.2017.03.026: these references are private, and the publisher, Elsevier, has not chosen to distribute them. No issue found here.
The Example 1 issue may be related to ticket #379.
First of all, thanks for the great job you maintainers are doing.
Second, a little context: I'm building a citations database for research purposes and decided to use Crossref as main data source.
That said, the present issue is actually a compilation of questions.
Question 1
Some references include a DOI while others don't. I need to answer the following questions:
Approximately what percentage of references have a DOI? Is each reference DOI the exact value sent by the publisher, or does Crossref perform any name matching to populate this field?
Question 2
The I4OC page [1] states that:
I found that the "work.has-references" filter probably corresponds to the 35 million mentioned (based on the response to [2]). I also noticed that some works published by members with has-public-references:true don't have references with DOIs. What are the differences between the ~40% with open references and the other ~60%?
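The count behind [2] can be reproduced with a small Python sketch. It assumes the `filter=name:value` syntax and the "total-results" summary field used by that call; `count_url`, `total_results`, and `fetch_total` are hypothetical helper names, and only `fetch_total` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.crossref.org/works"


def count_url(filters):
    """Build a /works URL that returns only the summary (rows=0)."""
    filter_str = ",".join(f"{name}:{value}" for name, value in filters.items())
    return f"{API}?{urlencode({'filter': filter_str, 'rows': 0})}"


def total_results(response):
    """Read the hit count from a parsed /works response."""
    return response["message"]["total-results"]


def fetch_total(filters):
    """Network call: how many works match the given filters."""
    with urlopen(count_url(filters)) as resp:
        return total_results(json.load(resp))


print(count_url({"has-references": "true"}))
```

The same helper can be pointed at `{"has-references": "true", "member": "311"}` to see how one member's works contribute to the total.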
Question 3
As far as I understand, for a given member with has-public-references:true, individual prefixes may or may not publish open references. Is it correct to assume these are the prefixes with public-references:true in calls like [3]?
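One way to check this per member is to read the prefix list from the member record in [3]. The Python sketch below assumes the member "message" has a "prefix" list whose entries carry "value" and "public-references" keys; that shape should be confirmed against a live response. `public_prefixes` and the sample prefixes are hypothetical.

```python
def public_prefixes(member_response):
    """Return prefixes flagged public-references:true in a /members/{id} response.

    Assumes message["prefix"] is a list of dicts with "value" and
    "public-references" keys.
    """
    prefixes = member_response["message"].get("prefix", [])
    return [p["value"] for p in prefixes if p.get("public-references") is True]


# Hypothetical member record with one open and one closed prefix.
sample_member = {
    "message": {
        "prefix": [
            {"value": "10.0001", "public-references": True},
            {"value": "10.0002", "public-references": False},
        ]
    }
}
print(public_prefixes(sample_member))  # ['10.0001']
```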
Question 4
My plan is to obtain all the records with useful data regarding citations. I'm planning to filter works (probably by member or prefix) and use cursor for paging. Are there any pitfalls/alternatives you'd point out? Would it be possible to use the cited-by service for this purpose?
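The filter-plus-cursor plan can be sketched as a Python generator. It assumes the deep-paging pattern of sending `cursor=*` on the first request, then passing back each response's "next-cursor" until a page returns no items; `iter_works` and `fetch_json` are hypothetical names, and `fetch` is injectable so the loop can be tested without the network. `rows=1000` is, to my understanding, the per-page maximum.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Default fetcher: GET the URL and parse the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_works(base_url, filters, rows=1000, fetch=fetch_json):
    """Yield every item matching `filters`, following next-cursor tokens."""
    filter_str = ",".join(f"{name}:{value}" for name, value in filters.items())
    cursor = "*"  # the first request starts a new cursor
    while True:
        query = urlencode({"filter": filter_str, "rows": rows, "cursor": cursor})
        message = fetch(f"{base_url}?{query}")["message"]
        items = message.get("items", [])
        if not items:
            break  # an empty page means the result set is exhausted
        yield from items
        cursor = message.get("next-cursor")
        if not cursor:
            break


# Usage (network): every work from member 311 that has references.
# for work in iter_works("https://api.crossref.org/works",
#                        {"member": "311", "has-references": "true"}):
#     ...
```

Stopping on an empty page rather than on a count keeps the loop correct even if the total changes while paging.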
--
Felipe
[1] https://i4oc.org
[2] http://api.crossref.org/works?filter=has-references:true&rows=0
[3] http://api.crossref.org/members/311