Closed jonnybazookatone closed 7 years ago
This would be a great improvement to the client!
Are there any other search terms (i.e., ones that aren't currently accessible from the client) that also have this kind of data structure returned? Accessing the results from SolrQuery rather than Article makes sense from a backend perspective, but if there are other fields that have a similar data structure then it might seem unusual to suggest "use the Article to access all attributes, except if you want X, Y, Z -- then access them from the SolrQuery and match up by Article.id".
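For illustration, here is a minimal sketch of what "match up by Article.id" would involve. It assumes the parsed response carries a `highlighting` mapping keyed by document id next to the list of docs; `match_highlights` is a hypothetical helper (not part of the client), and the second doc id is made up to show the no-highlight case.

```python
def match_highlights(docs, highlighting):
    """Pair each doc with its highlight snippets, keyed by the doc's id."""
    matched = []
    for doc in docs:
        # Docs without any highlighted snippet get an empty dict
        snippets = highlighting.get(doc["id"], {})
        matched.append((doc, snippets))
    return matched


# Trimmed, partly made-up payload in the assumed response shape
payload = {
    "highlighting": {"1732456": {"abstract": ["neutron <em>stars.</em>"]}},
    "response": {"docs": [{"id": "1732456"}, {"id": "42"}]},
}

pairs = match_highlights(payload["response"]["docs"], payload["highlighting"])
for doc, snippets in pairs:
    print(doc["id"], snippets)
```

Every caller who wants highlights would need a join like this, which is the awkwardness described above.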
An alternative scenario might be to have SearchQuery attach any highlights to the Articles as they are created and have a getter/setter for Article.highlights.
    class SolrResponse(APIResponse):
        """
        Base class for storing a solr response
        """
        ...

        @property
        def articles(self):
            """
            articles getter
            """
            if self._articles is None:
                self._articles = []
                for doc in self.docs:
                    # ensure all fields in the "fl" are in the doc to address
                    # issue #38
                    for k in set(self.fl).difference(doc.keys()):
                        doc[k] = None
                    article = Article(**doc)
                    article._highlights = self.response['highlights'].get(article.id, None)
                    self._articles.append(article)
            return self._articles
I don't have a strong opinion on the best way it should be handled -- I was only trying to see if there were similar ways to implement it.
@jonnybazookatone just to clarify: is the current server API capable of sending back the highlighted information? (e.g., would it be able to send back the information so that it is currently accessible by the client even through .response.response.json())
There are hacks occurring at #dotastro which would benefit from this if it were currently accessible from the server side.
Yes, it's currently available from the API. For example:
curl -H 'Authorization: Bearer:TOKEN' 'https://api.adsabs.harvard.edu/v1/search/query?q=star&fl=id&hl=true&hl.fl=title,abstract' | python -m json.tool
{
    "highlighting": {
        "1732456": {
            "abstract": [
                " in the early universe or in the ultra-dense core of neutron <em>stars.</em> The thermal radiation from the quarks"
            ]
        },
        ...
    "response": {
        "docs": [
            {
                "id": "1732456"
            },
            ...
        "numFound": 943061,
        "start": 0
    },
    "responseHeader": {
        "QTime": 252,
        "params": {
            "fl": "id",
            "hl": "true",
            "hl.fl": "title,abstract",
            "q": "star",
            "wt": "json"
        },
        "status": 0
    }
}
To clarify a little: you need to pass hl=true to turn on highlights. Then you can pass hl.fl, which lists the fields to highlight, the most useful being hl.fl=title,abstract,body. The response will then contain the snippet that resulted in this document being returned, with the matching word wrapped as <em>word</em>.
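As a rough sketch of the same request from Python: the snippet below builds the query URL from the parameters used in the curl example and pulls the highlighted words out of a returned snippet. `build_highlight_url` and `highlighted_terms` are hypothetical helpers, not part of the client, and no request is actually sent.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_highlight_url(query, fields, highlight_fields):
    """Build a search URL with highlighting turned on (hl=true, hl.fl=...)."""
    params = {
        "q": query,
        "fl": ",".join(fields),
        "hl": "true",
        "hl.fl": ",".join(highlight_fields),
    }
    # Keep commas unescaped so the URL reads like the curl example
    return BASE_URL + "?" + urlencode(params, safe=",")


def highlighted_terms(snippet):
    """Return the pieces of text wrapped in <em>...</em> in a snippet."""
    return re.findall(r"<em>(.*?)</em>", snippet)


url = build_highlight_url("star", ["id"], ["title", "abstract"])
terms = highlighted_terms("core of neutron <em>stars.</em> The thermal radiation")
print(url)
print(terms)
```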
If you have further questions, open a ticket in the ADS issues, otherwise this issue's gonna get too long :stuck_out_tongue:.
I ended up using the initial approach due to time constraints, but I'm happy if it's replaced by your other suggestion. I'll close this ticket for now.
@aaccomazzi you may want to look at #90.
Looks good Jonny. FYI, there are more fields where highlights are supported. Some other useful ones are ack, aff and author (which lets you find where in the author list the author you were looking for appears).
Good to know, I'll add those also.
The search engine has the capability of returning highlighted pieces of text for searches, for example:
when requested, Solr will return the relevant highlighted text that resulted in the document:
This form is highlights: {"id": ["highlights requested, abstract, title, etc."]}. A few users have requested access to this.

Proposed API
The highlights are query dependent, so my first thought is to keep them connected to the SolrQuery class and not within the Article, as otherwise the Article class would carry state related to its parent query, which it has no concept of. So you could foresee something as simple as:
and then you would access it via the API as:
Alternative options are welcome, such as a highlights class that is filled and attached to the SearchQuery class, or something else smarter that retains the above prerequisites.

Issues with Article class
Just as an FYI. It would be weird to have something like:
as this article class could have many highlights depending on what the query was, so you'd have to keep track of both the query and the article.
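To make the point concrete, here is a minimal sketch of the query-owned approach. It is not the snippet originally posted in this issue; `HighlightingQuery` and the stand-in `Article` are hypothetical, and the second query and its snippet are invented for illustration. The same article gets different highlights depending on which query returned it, so the lookup lives on the query.

```python
class Article:
    """Stand-in for the client's Article; holds only query-independent data."""

    def __init__(self, id):
        self.id = id


class HighlightingQuery:
    """Hypothetical query object that owns the highlights for its results."""

    def __init__(self, q, docs, highlighting):
        self.q = q
        self.articles = [Article(**doc) for doc in docs]
        self._highlights = highlighting  # mapping: article id -> {field: [snippets]}

    def highlights(self, article):
        """Return this query's highlights for the given article ({} if none)."""
        return self._highlights.get(article.id, {})


docs = [{"id": "1732456"}]
star = HighlightingQuery(
    "star", docs, {"1732456": {"abstract": ["neutron <em>stars.</em>"]}}
)
quark = HighlightingQuery(
    "quark", docs, {"1732456": {"abstract": ["from the <em>quarks</em>"]}}
)

# Same article id, different highlights per query
print(star.highlights(star.articles[0]))
print(quark.highlights(quark.articles[0]))
```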