OBrink closed this issue 3 years ago
The difference between your approach and the API is that the API adds some other parameters to the query so that users can download all the documents related to the given query.
Both approaches match a total of 290890 documents. You can check this by requesting both URLs and looking at the total-results attribute.
API: https://api.crossref.org/works?query=Albert+Einstein+Elektrodynamik+bewegter+K%C3%B6rper&cursor=%2A&rows=100
Your approach: https://api.crossref.org/works?query=Albert+Einstein+Elektrodynamik+bewegter+K%C3%B6rper
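If you want to compare the counts from a script instead of the browser, here is a minimal sketch using only the standard library. The helper names `fetch` and `total_results` are made up for illustration, and the sample payload is hand-written to mimic the response shape (a top-level `message` object holding `total-results`), so treat it as an assumption rather than real API output.

```python
import json
from urllib.request import urlopen


def fetch(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode its JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def total_results(payload: dict) -> int:
    """Read the match count from a decoded /works response."""
    return payload["message"]["total-results"]


# Offline illustration: a hand-made payload in the assumed response shape.
sample = {"status": "ok", "message": {"total-results": 290890, "items": []}}
print(total_results(sample))
```

With network access, `total_results(fetch(url))` can be called once per URL, and the two numbers should agree if both requests match the same set of documents.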
As you can see, the only difference between the URLs is the pair of extra parameters, rows=100 and cursor=*.
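To make the difference explicit, here is a small sketch that rebuilds both URLs with `urllib.parse` and lists the parameters that appear only in the cursor-based request. The helper `extra_params` is a hypothetical name used for this example only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://api.crossref.org/works"
KEYWORD = "Albert Einstein Elektrodynamik bewegter Körper"

# Plain query, as typed into the browser.
plain_url = f"{BASE}?{urlencode({'query': KEYWORD})}"
# Same query with the two extra paging parameters.
cursor_url = f"{BASE}?{urlencode({'query': KEYWORD, 'cursor': '*', 'rows': 100})}"


def extra_params(url_a: str, url_b: str) -> dict:
    """Return the query parameters present in url_a but missing from url_b."""
    params_a = parse_qs(urlsplit(url_a).query)
    params_b = parse_qs(urlsplit(url_b).query)
    return {key: value for key, value in params_a.items() if key not in params_b}


print(plain_url)
print(cursor_url)
print(extra_params(cursor_url, plain_url))
```

The query string itself is identical in both URLs, so any difference in ordering has to come from the paging parameters.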
I've opened a question in the Crossref API repository: https://github.com/CrossRef/rest-api-doc/issues/557
Thank you for the quick reply! For now, I will keep working without the cursor parameter in my requests.
Hey @OBrink,
As rightly pointed out by @fabiobatalha, the API applies cursor=* in the URL, which changes the order of the results.
You can get the desired result by applying .sort("relevance") as follows:
from crossref.restful import Works

keyword = 'Albert Einstein Elektrodynamik bewegter Körper'
works = Works()
# Sort by relevance so the best match comes first
result = works.query(keyword).sort("relevance")
for entry in result:
    print(entry)
    break  # only the first (most relevant) entry is needed
>> {'indexed': {'date-parts': [[2020, 5, 25]], 'date-time': '2020-05-25T14:23:45Z', 'timestamp': 1590416625775}, 'publisher-location': 'Wiesbaden', 'reference-count': 0, 'publisher': 'Vieweg+Teubner Verlag', 'isbn-type': [{'value': '9783663193722', 'type': 'print'}, {'value': '9783663195108', 'type': 'electronic'}], 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'published-print': {'date-parts': [[1923]]}, 'DOI': '10.1007/978-3-663-19510-8_3', 'type': 'book-chapter', 'created': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:43Z', 'timestamp': 1386295723000}, 'page': '26-50', 'source': 'Crossref', 'is-referenced-by-count': 5, 'title': ['Zur Elektrodynamik bewegter Körper'], 'prefix': '10.1007', 'author': [{'given': 'A.', 'family': 'Einstein', 'sequence': 'first', 'affiliation': []}], 'member': '297', 'container-title': ['Das Relativitätsprinzip'], 'link': [{'URL': 'http://link.springer.com/content/pdf/10.1007/978-3-663-19510-8_3', 'content-type': 'unspecified', 'content-version': 'vor', 'intended-application': 'similarity-checking'}], 'deposited': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:45Z', 'timestamp': 1386295725000}, 'score': 53.646687, 'issued': {'date-parts': [[1923]]}, 'ISBN': ['9783663193722', '9783663195108'], 'references-count': 0, 'URL': 'http://dx.doi.org/10.1007/978-3-663-19510-8_3'}
I hope that serves your purpose.
Thanks, Ankush
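For reference, here is a rough sketch of what the equivalent raw request URL might look like. It assumes that .sort("relevance") simply adds sort=relevance to the query string next to the cursor and rows parameters; I have not verified this against the library source, and `works_url` is a hypothetical helper, not part of the library.

```python
from urllib.parse import urlencode

BASE = "https://api.crossref.org/works"


def works_url(keyword, sort=None, cursor=None, rows=None):
    """Build a /works URL; optional parameters are added only when given."""
    params = {"query": keyword}
    if sort is not None:
        params["sort"] = sort      # assumed to mirror .sort("relevance")
    if cursor is not None:
        params["cursor"] = cursor  # "*" starts a cursor-based listing
    if rows is not None:
        params["rows"] = rows
    return f"{BASE}?{urlencode(params)}"


print(works_url("Albert Einstein Elektrodynamik bewegter Körper",
                sort="relevance", cursor="*", rows=100))
```

Opening the printed URL in a browser is a quick way to check whether the first item matches what the library returns.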
@Ankush-Chander Thank you very much! That helps me get exactly what I need.
When trying to retrieve information via simple queries, I consistently get output that I do not expect. Specifically, the publications that the keywords refer to are not returned in the result of the query. I do, however, get the right publication data via a manual HTTP GET request.
Example code:
I get this kind of output, which has nothing to do with my input keyword, for other keywords as well. I have tried changing the order of the result [result.order('desc')], but that does not seem to change anything.
When I then do the same request via HTTP GET and the normal API URL, I get the expected output as the first result:
The output that I retrieved with the tool in this repository has nothing to do with my query keyword. Do you have an idea how I can fix this? I would be very grateful for any kind of help.