CrossRef / rest-api-doc

Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/

Question - search on metadata to retrieve DOI #254

Closed prcollingwood closed 6 years ago

prcollingwood commented 7 years ago

I am trying to retrieve a DOI by searching on various metadata such as title, author, year, ISSN etc. I am having a lot of trouble narrowing it down to one result; I keep getting a list of works. I think it's because query.title is not exact: instead, OR is placed between each word in the title, leading to an enormous works list. Is there a way to search for an exact title? I also have an example that doesn't make sense. I search for a DOI, which works and gives me the correct metadata: https://api.crossref.org/works/10.1016/B978-0-12-810523-8.00019-7

But when I reverse engineer this and search on the title, year, author etc, the item is not found. I get a list of works, but the one that I'm looking for is not in the list: https://api.crossref.org/journals/1556-5068/works?query.title=Design+of+a+National+River+Health+Assessment+Program+for+China&query.author=Gippel+Chris&filter=from-created-date:2017,until-created-date:2017

Can you give me some guidance/explanation of what I'm doing wrong?

Thanks, Petrina

kjw commented 7 years ago

I see the problem. You are searching on articles that appear in a journal with ISSN 1556-5068 (in the path: /journals/1556-5068/works). The problem is that our metadata doesn't have this DOI as being part of that journal. In fact, the DOI is listed as a book-chapter, so I'm pretty sure that in reality it is not part of a journal. This URL removes the journal-specific part of the query and does return results as you would expect:

https://api.crossref.org/works?query.title=Design+of+a+National+River+Health+Assessment+Program+for+China&query.author=Gippel+Chris&filter=from-created-date:2017,until-created-date:2017

On multiple results - that's just the way the API works. If you are making a very specific query, the top result is the most accurate match we can find. Problems may arise if you attempt to search for something that is not in our database. The API will still try to find something that matches. In this case, you may have a problem of identifying false positives. How to solve that really depends on what happens with the data after the query.
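As a rough illustration of the difference between the two URLs, here is a minimal Python sketch that builds either form and reads the top-ranked item. The function names (`build_works_url`, `fetch_items`) are made up for this example; the query parameters are the ones used in the URLs in this thread, plus `rows` to limit the number of results. It assumes the usual response shape, with results under `message.items`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org"


def build_works_url(title, author=None, year=None, issn=None, rows=5):
    """Build a works query URL (hypothetical helper, not part of any library).

    If issn is given, the search is restricted to that journal via
    /journals/{issn}/works; otherwise it searches all works.
    """
    path = f"/journals/{issn}/works" if issn else "/works"
    params = {"query.title": title}
    if author:
        params["query.author"] = author
    if year:
        # Same filter as in the thread: records created within one year.
        params["filter"] = f"from-created-date:{year},until-created-date:{year}"
    params["rows"] = rows
    # Keep ':' and ',' readable so the filter looks like the URLs above.
    return f"{API_ROOT}{path}?{urlencode(params, safe=':,')}"


def fetch_items(url):
    """Fetch a query URL and return the list of matching work records."""
    with urlopen(url) as response:
        return json.load(response)["message"]["items"]


if __name__ == "__main__":
    url = build_works_url(
        "Design of a National River Health Assessment Program for China",
        author="Gippel Chris",
        year=2017,
        rows=1,
    )
    items = fetch_items(url)
    # The first item is the best-ranked match, not a guaranteed exact match.
    print(items[0]["DOI"] if items else "no results")
```

Passing `issn="1556-5068"` reproduces the journal-scoped URL from the original question, which is why the book chapter was not found there.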

Hope this helps, please follow up if desired (I'll close this issue only so I know that it has been answered in some way.)

prcollingwood commented 7 years ago

Sorry, I forgot to modify that part of the url when I was experimenting - I was tired. Do you have any suggestions about how to figure out if it's a false positive? I want to retrieve DOIs and use them to search oaDOI for open access information.

prcollingwood commented 7 years ago

Say I only use the API for journal articles with an ISSN. If I first check that the ISSN exists in the database, and then search for the article only within journals whose ISSN is in the database, would that be a fairly accurate way of getting correct matches?

jenniferlin15 commented 7 years ago

reopening issue to address new questions added, @kjw

jenniferlin15 commented 6 years ago

Yes, you've nailed it. Homing in on the subset of records with an ISSN will do it.
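For anyone who wants an extra guard against false positives, one possible approach (an assumption of this sketch, not something the API does for you) is to compare the top result's title with the title you searched for, and optionally confirm the ISSN, before accepting its DOI. The helper names and the 0.9 threshold below are illustrative only; the `title`, `ISSN` and `DOI` fields are those found in a work record.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so titles compare fairly."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def title_similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def accept_match(item, expected_title, expected_issn=None, threshold=0.9):
    """Return the item's DOI if it plausibly matches, else None.

    item is one work record (a dict) from the API response.
    threshold is an arbitrary cut-off chosen for illustration; tune it
    against your own data.
    """
    titles = item.get("title") or []
    if not titles:
        return None
    if title_similarity(titles[0], expected_title) < threshold:
        return None
    if expected_issn is not None:
        # Compare ISSNs without hyphens and case-insensitively (for 'X').
        wanted = expected_issn.replace("-", "").upper()
        found = {i.replace("-", "").upper() for i in item.get("ISSN", [])}
        if wanted not in found:
            return None
    return item.get("DOI")
```

A DOI accepted this way can then be passed on to oaDOI; anything rejected is better treated as "not found" than as a match.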