Open FabianEUR opened 6 months ago
Thanks for using litstudy!
I cannot see the error. The query looks fine. Does the query work if you use it on the Scopus website?
The query works on Scopus and I can find publications which can be exported. I've tried variations without quotation marks, with and without brackets, with only one keyword, with and without a wildcard, etc., but I always get the same error.
Here is the error:
Scopus400Error Traceback (most recent call last)
Cell In[2], line 7
4 import logging
5 logging.getLogger().setLevel(logging.CRITICAL)
----> 7 docs_scopus, docs_not_found = litstudy.refine_scopus(docs_scopus)
8 print(len(docs_scopus), "papers found on Scopus")
9 print(len(docs_not_found), "papers NOT found on Scopus")
File ~\AppData\Roaming\Python\Python311\site-packages\litstudy\sources\scopus.py:248, in refine_scopus(docs, search_title)
244 return ScopusDocument.from_eid(record.eid)
246 return None
--> 248 return docs._refine_docs(callback)
File ~\AppData\Roaming\Python\Python311\site-packages\litstudy\types.py:53, in DocumentSet._refine_docs(self, callback)
50 old_docs = []
52 for i, doc in enumerate(progress_bar(self.docs)):
---> 53 new_doc = callback(doc)
55 if new_doc is not None:
56 new_indices.append(i)
File ~\AppData\Roaming\Python\Python311\site-packages\litstudy\sources\scopus.py:236, in refine_scopus.<locals>.callback(doc)
234 if len(title) > 10 and search_title:
235 query = f"TITLE({title})"
--> 236 response = ScopusSearch(query, view="STANDARD", download=False)
237 nresults = response.get_results_size()
239 if nresults > 0 and nresults < 10:
File ~\AppData\Roaming\Python\Python311\site-packages\pybliometrics\scopus\scopus_search.py:206, in ScopusSearch.__init__(self, query, refresh, view, verbose, download, integrity_fields, integrity_action, subscriber, **kwds)
204 self._query = query
205 self._view = view
--> 206 Search.__init__(self, query=query, api='ScopusSearch', count=count,
207 cursor=subscriber, download=download,
208 verbose=verbose, **kwds)
File ~\AppData\Roaming\Python\Python311\site-packages\pybliometrics\scopus\superclasses\search.py:62, in Search.__init__(self, query, api, count, cursor, download, verbose, **kwds)
59 self._cache_file_path = get_folder(api, self._view)/stem
61 # Init
---> 62 Base.__init__(self, params=params, url=URLS[api], download=download,
63 api=api, verbose=verbose)
File ~\AppData\Roaming\Python\Python311\site-packages\pybliometrics\scopus\superclasses\base.py:66, in Base.__init__(self, params, url, api, download, verbose, *args, **kwds)
64 self._json = loads(fname.read_text())
65 else:
---> 66 resp = get_content(url, api, params, *args, **kwds)
67 header = resp.headers
69 if ab_ref_retrieval:
File ~\AppData\Roaming\Python\Python311\site-packages\pybliometrics\scopus\utils\get_content.py:116, in get_content(url, api, params, **kwds)
114 except:
115 reason = ""
--> 116 raise errors[resp.status_code](reason)
117 except KeyError:
118 resp.raise_for_status()
Scopus400Error: Error translating query
--
I'm using jupyter notebook and have the same error via uni VPN and on campus.
This seems to be a bug. litstudy tries to find each paper on Scopus by its title using the query "TITLE({title})", but for certain titles this produces syntax that Scopus rejects. This will need further investigation.
However, I don't really understand the line litstudy.refine_scopus(docs_scopus). You have loaded documents from Scopus into docs_scopus and then want to refine them again using Scopus? Or do you load the original documents from a file?
Ahh, maybe that explains why the refining always only works until a certain publication before the error appears.
I exported the .csv from scopus and loaded the file into docs_scopus and then refined them. Is this only meant to be done for non-scopus datasets?
> Ahh, maybe that explains why the refining always only works until a certain publication before the error appears.
Indeed. If you could figure out which publication it fails on, you can remove that one from the dataset as a temporary solution.
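One way to locate the offending publication is to process the entries one at a time and record which ones raise. The sketch below is generic Python, not litstudy API: `find_failing_items` and `fake_refine` are hypothetical names, and `fake_refine` only imitates the Scopus error so the example runs offline. In practice the action would wrap whatever per-document refine call is available.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def find_failing_items(
    items: Iterable[T], action: Callable[[T], object]
) -> List[Tuple[int, T, Exception]]:
    """Run `action` on every item and collect (index, item, exception) for failures."""
    failures = []
    for index, item in enumerate(items):
        try:
            action(item)
        except Exception as exc:  # keep going so that all failing items are reported
            failures.append((index, item, exc))
    return failures


def fake_refine(title: str) -> str:
    # Hypothetical stand-in for a real lookup: fails on titles with parentheses.
    if "(" in title or ")" in title:
        raise ValueError("Error translating query")
    return title


titles = ["A plain title", "Primes between n and (n+1)", "Another plain title"]
failures = find_failing_items(titles, fake_refine)
for index, item, exc in failures:
    print(index, repr(item), exc)
```

The reported indices can then be used to drop the failing entries from the dataset before refining again.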
> I exported the .csv from scopus and loaded the file into docs_scopus and then refined them. Is this only meant to be done for non-scopus datasets?
That is fine. If you load it from a CSV file, it indeed makes sense to refine it afterwards. The function refine_scopus should work on any dataset from any source. It fails here because of a bug :-(
If you would like to look into this issue, we are happy to accept pull requests!
I think what needs to happen is that the title is stripped of punctuation before it is passed to Scopus. For example, if the title is something like:
Research on the number of prime numbers between n² and (n+1)²
The query sent to Scopus will be:
TITLE(Research on the number of prime numbers between n² and (n+1)²)
but all those non-alphabetic characters result in a query that is not accepted by Scopus.
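A minimal sketch of such stripping, assuming that it is enough to keep letters and plain decimal digits and turn everything else into a space. `sanitize_title` and `build_title_query` are hypothetical helpers, not part of litstudy, and whether Scopus accepts every resulting query has not been verified here.

```python
def sanitize_title(title: str) -> str:
    """Replace everything except letters and decimal digits with spaces."""
    # str.isdecimal() is False for characters such as the superscript two,
    # so those are dropped together with brackets and operators.
    chars = [ch if (ch.isalpha() or ch.isdecimal()) else " " for ch in title]
    # Collapse the runs of spaces left behind by removed characters.
    return " ".join("".join(chars).split())


def build_title_query(title: str) -> str:
    """Build a TITLE(...) query from the sanitized title."""
    return f"TITLE({sanitize_title(title)})"


example = "Research on the number of prime numbers between n² and (n+1)²"
print(build_title_query(example))
# prints: TITLE(Research on the number of prime numbers between n and n 1)
```

The sanitized query is looser than the original title, so it may match more records. The existing check on the number of results in refine_scopus would still be needed.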
Additionally, in the case where a Document already has a ScopusID, we can just query Scopus directly for the publication without having to search based on the title (I think the CSV file already provides the ScopusID).
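The decision could look roughly like the sketch below. The traceback above shows that refine_scopus already calls ScopusDocument.from_eid(record.eid) once a title search finds a match, so a document that carries an identifier could skip the search. `choose_lookup` is a hypothetical helper, and it assumes the identifier is available as an EID string. How a Document actually exposes it, and whether a ScopusID needs converting to an EID first, is not shown in this thread.

```python
from typing import Optional, Tuple


def choose_lookup(title: str, eid: Optional[str] = None) -> Tuple[str, str]:
    """Return ("eid", value) when an identifier is known, else ("title", query)."""
    if eid and eid.strip():
        # Known identifier: no title search needed, so the title never reaches Scopus.
        return ("eid", eid.strip())
    # Fallback: search by title, as refine_scopus does today.
    return ("title", f"TITLE({title})")


print(choose_lookup("Some title", "2-s2.0-0000000000"))  # placeholder EID
print(choose_lookup("Some title"))
```

With this split, only documents without an identifier would still depend on the title query.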
Hi,
Is it possible to refine/process publications from Scopus limited to source titles containing a specified keyword? For example, my query (and variations thereof) gives me the above error after refining roughly 2000 publications:
( TITLE-ABS-KEY ( "recommend sys" OR "recommend servi" ) AND SRCTITLE ( "comput*" OR "acm" ) )
I had a look at the API reference and the existing issues, but I had some trouble finding an answer to my question.
Thank you