Open ba001 opened 7 years ago
@queryluke ah, wait, i think the second of those issues is not actually an issue. i think what's happening is that if a poem or work contains all the words (execution, is, the, chariot, of, genius), then all the lines with all or some of those words will get returned, which is how it should work. so it's only the first issue concerning the exact string in quotes that's a problem
I think the problem is spaces in the query not being url encoded. When I use the solr admin console to run the query, this is what the query string looks like:
q=text_contents:"execution%20is%20the%20chariot%20of%20genius"
But I don't see the conversion anywhere in the scripts. There are lots of places you can do the encoding, using JavaScript's built-in encodeURIComponent
The form is found: https://github.com/blakearchive/erdman/blob/master/client/src/components/search-form.js You could urlencode the query before it's sent to the main controller.
When you submit, the contents of the form get sent to the main controller: https://github.com/blakearchive/erdman/blob/master/client/src/erdman.controller.js#L47 You could urlencode the query here.
Line 52 runs the search, which happens in this file: https://github.com/blakearchive/erdman/blob/master/client/src/services.js#L19 Another option here.
This file passes the query to the python files: https://github.com/blakearchive/erdman/blob/master/server/erdman/service.py#L25 Finally, here are the python scripts. I'd say you could do it here, but it looks like python requires an additional package, urllib, to do any url encoding. I'm pretty sure we have that package installed, so you could try it.
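For reference, a minimal sketch of what the encoding would look like on the Python side. In Python 3 the relevant functions live in urllib.parse, which ships with the standard library, so nothing extra should need installing. The helper name encode_query is made up for illustration and is not part of the erdman code.

```python
from urllib.parse import quote


def encode_query(raw_query: str) -> str:
    """Percent-encode a search string for use inside a URL query component.

    Hypothetical helper, not taken from the erdman repository.
    """
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    return quote(raw_query, safe="")


phrase = '"execution is the chariot of genius"'
print(encode_query(phrase))
# prints: %22execution%20is%20the%20chariot%20of%20genius%22
```

This gives the same result as encodeURIComponent for this input. Whether the encoding should happen here at all depends on whether the Solr client already encodes its parameters.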
@queryluke hmm, when i tried using encodeURIComponent() and searching "keep it" (with quotes) i got this error on the site:
No results found for %22keep%20it%22
tried putting it in search-form.js:
onSubmit(){
    console.log('submitting');
    this.query = encodeURIComponent(this.query);
    this.onSearch({query: this.query});
}
hmm, that probably means you'll have to tinker with the python scripts.
I know a lot about how php and ruby query solr, but I'm not familiar with python. For example, in php, if you curl something like http://localhost:8983/solr/core/select?q=title:%22aquatic%20plant%22&wt=json
it works just fine.
But it looks like this python library (pysolr) sends a literal string? I don't know, seems odd. Maybe ask nathan about it?
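One possible explanation for the "No results found for %22keep%20it%22" message is double encoding: if a later step encodes the parameters itself, a string that was already encoded in the browser gets treated as literal text. The sketch below only simulates that with the standard library's urlencode and parse_qs. It is an assumption that something similar happens between the form, the Python service and pysolr; this has not been checked against the erdman code.

```python
from urllib.parse import parse_qs, quote, urlencode

raw = '"keep it"'
# What encodeURIComponent would have produced in the browser.
pre_encoded = quote(raw, safe="")

# Simulate a client that encodes query parameters on its own.
encoded_once = urlencode({"q": raw})
encoded_twice = urlencode({"q": pre_encoded})

# Decode once, the way a server would on receipt.
seen_once = parse_qs(encoded_once)["q"][0]
seen_twice = parse_qs(encoded_twice)["q"][0]

print(encoded_once)   # q=%22keep+it%22
print(encoded_twice)  # q=%2522keep%2520it%2522
print(seen_once)      # "keep it"
print(seen_twice)     # %22keep%20it%22
```

If this is what is going on, the raw string should be passed through untouched and encoded exactly once, by whichever layer builds the final request.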
Thanks, Luke. @nathan-rice do you have any insight into this one?
@nathan-rice just wanted to ask you about this one again. at the moment, we can't do exact string searches on erdman
I pushed a fix for this a while ago.
@nathan-rice ok, just deployed. looks good except for one thing. search "well well". notice the results for page 15. one line there is broken into two. the resulting line in the document begins "Well well..." so the result should just show that line for page 15.
The problem here is that solr is returning multiple "highlight" snippets for that search (each only containing a single "well") - not sure why, seems like a bug to me. I could fix this case by just merging all highlight snippets for any given poem, but that would almost certainly break a ton of other cases.
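A narrower alternative to merging every snippet for a poem might be to map each snippet back to the line it came from and report each line only once. The sketch below assumes the highlights use Solr's default <em> tags and that the poem's lines are available as a list of strings; the function name and the sample data are invented for illustration, and it has not been tried against the real index.

```python
import re

# Solr's default highlight markers; adjust if hl.simple.pre/post are customised.
EM_TAG = re.compile(r"</?em>")


def lines_with_hits(lines, snippets):
    """Return the indices of lines that contain at least one snippet.

    Hypothetical helper: each line is reported once, however many
    snippets Solr returned for it.
    """
    plain = [EM_TAG.sub("", s).strip() for s in snippets]
    return [
        i
        for i, line in enumerate(lines)
        if any(p and p in line for p in plain)
    ]


# Made-up sample data, not actual text from the archive.
sample_lines = ["Well well said the first", "and nothing more"]
sample_snippets = ["<em>Well</em>", "<em>well</em> said the first"]
print(lines_with_hits(sample_lines, sample_snippets))  # [0]
```

The obvious weakness is that a very short snippet can match unrelated lines as a substring, so this would need testing against the cases a blanket merge is expected to break.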
@queryluke if you have time, could you have a look at these holdover issues?
if you search on erdman "execution is the chariot of genius", with double quotes, you get no results. if you search execution is the chariot of genius, you get a correct result, but then a bunch of other results from the same page that contain some of the words.
i think we'd like to be able to search exact strings using double quotes, and--@joeafletch correct me if i'm wrong--a hit should only get returned if all the words (unquoted) from a query match
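As a rough illustration of the two behaviours described here, a query builder could turn quoted input into a Solr phrase query and unquoted input into an AND of the individual terms. This is only a sketch: build_solr_query is a hypothetical helper, the field name text_contents is taken from the admin-console query quoted in this thread, and escaping of other Lucene special characters is left out.

```python
def build_solr_query(user_input: str, field: str = "text_contents") -> str:
    """Build a Solr query string from raw search-box input.

    Hypothetical helper, not part of the erdman code.
    """
    text = user_input.strip()
    if not text.strip('"').strip():
        raise ValueError("empty search")

    # Input wrapped in double quotes: search for the exact phrase.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        phrase = text[1:-1].replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{phrase}"'

    # Unquoted input: every term must be present.
    terms = text.split()
    return f"{field}:(" + " AND ".join(terms) + ")"


print(build_solr_query('"keep it"'))
# text_contents:"keep it"
print(build_solr_query("execution is the chariot"))
# text_contents:(execution AND is AND the AND chariot)
```

Whether the AND applies per line or per poem depends on what a document is in the index, so this alone would not change which lines get highlighted within a matching work.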