kltm opened 10 years ago
Ugh. It now looks like the problem is the "go" bits. For example, take:
http://amigo2.berkeleybop.org/amigo/gene_product/RGD:1308769
Any search starting with "go:" removes all annotations. Looking at:
http://amigo.geneontology.org/amigo/gene_product/FB:FBgn0000535
All attempts at filtering with the "go:" string fail; nothing is filtered.
Direct response with debug at:
Quoting the string prevents this. You can also see in the results that it is tokenizing on the colon.
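To make the colon behaviour concrete, here is a toy model of a query splitter. It is an assumption for illustration only, not Solr's actual query parser: bare tokens are broken at every colon, while a double-quoted span is kept as a single unit, which mirrors the behaviour described in this comment. `split_query` is a hypothetical name.

```python
import re

# Toy model only, NOT Solr's real query parser: a double-quoted span is kept
# as one unit; every other whitespace-delimited token is split at its colons.
_QUOTED_OR_BARE = re.compile(r'"([^"]*)"|(\S+)')


def split_query(query: str) -> list[str]:
    """Return the units a colon-splitting parser would end up searching for."""
    units: list[str] = []
    for quoted, bare in _QUOTED_OR_BARE.findall(query):
        if bare:
            # Bare token: break at every colon and drop empty pieces.
            units.extend(piece for piece in bare.split(":") if piece)
        elif quoted:
            # Quoted phrase: kept intact, colon included.
            units.append(quoted)
    return units
```

Under this model, `split_query("go:0007072 kinase")` gives `["go", "0007072", "kinase"]`, while `split_query('"go:0007072" kinase')` gives `["go:0007072", "kinase"]`.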
This seems to be the same issue as #93. However, since the explanation is clearer here, I'm going to mark the earlier one as a dupe (although it should be read for more background).
The current takeaway is that this is a real issue, that there are a fair number of colon-related issues in Solr, and that it is probably not worth ripping up the plumbing right before we switch to Solr 4.x (which may have fixed this case, or may have slightly different issues; see kltm/bbop-js#16).
The current workaround for this is that in the case of ID search in free text (which was considered a marginal case initially, but not now), one can use quotes to force the correct behaviour.
Can we not have the API intercept these queries and auto-quote them?
So, to propose a possible fix: you might add something to the consumer search function (https://kltm.github.io/bbop-js/docs/files/golr/manager-js.html#bbop.golr.manager.set_comfy_query) so that any "token" containing a colon is automatically quoted at this stage and therefore not split further downstream. I'm not wild about this approach, mainly because 1) there seem to be actual problems with what our version of Solr does with colons, and 2) I believe the tokenizer we're using for searchables is eliminating them anyway. I'm not immediately sure how to work around these except by revisiting everything from the backend up. For example, take:
You can see that it /mostly/ removed the colon from the parsed query, meaning that there is certainly no match (this is likely due to the search tokenizer we're using on the Solr end for the "_searchable" fields). Trying a couple of ways to URL-encode the query ahead of time doesn't help and makes the parsed query even weirder; moreover, even if that worked, I don't believe anything would match anyway.
I think the easiest approach would be to switch to the better-fixed Solr 4.6 and get rid of a lot of these super annoying search issues in the process.
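The auto-quoting idea floated in this comment could look roughly like the sketch below. `auto_quote` is a hypothetical helper, not part of bbop-js, and it is written in Python purely for illustration; in practice something like it would presumably run on the user's string before it reaches `set_comfy_query`. It wraps any bare whitespace-delimited token containing a colon in double quotes and leaves already-quoted phrases alone.

```python
import re

# Matches either a complete double-quoted phrase or a bare non-space token.
_QUOTED_OR_BARE = re.compile(r'"[^"]*"|\S+')


def auto_quote(query: str) -> str:
    """Hypothetical pre-processing step: quote bare tokens that contain a colon."""
    out: list[str] = []
    for token in _QUOTED_OR_BARE.findall(query):
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            out.append(token)  # already a quoted phrase, leave it alone
        elif ":" in token:
            bare = token.replace('"', "")  # drop any stray quote characters
            out.append(f'"{bare}"')
        else:
            out.append(token)
    # Note: whitespace between tokens is normalized to single spaces.
    return " ".join(out)
```

For example, `auto_quote("GO:0007072 kinase")` returns `'"GO:0007072" kinase'`. The reservations above still apply: this would also quote deliberate field syntax (e.g. a made-up `type:gene`), and it cannot help if the index-side tokenizer has already discarded the colon.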
Also from http://jira.geneontology.org/browse/GO-1428
It seems odd that this search returns no results: http://amigo.geneontology.org/amigo/medial_search?q=S000000031. I would have expected it to return this entry: http://amigo.geneontology.org/amigo/gene_product/SGD:S000000031
This is the expected behavior given the tokenizing issue. Now that the work has been done for the new tokenizing with GOlr in the monarch stack, we just need to port it over to AmiGO by updating bbop-manager-golr.
We're running into this again, see https://github.com/geneontology/helpdesk/issues/99
This is really key: people really expect to be able to search with the non-prefixed part of the ID. Do we still need to change bbop-manager-golr? Isn't this just a matter of adding the unprefixed form as something Solr searches on?
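Adding the unprefixed form at index time, as suggested here, can be sketched as follows. `searchable_id_forms` and `build_index` are hypothetical names, and the in-memory dictionary only stands in for whatever Solr field would actually receive the extra value; the exact GOlr schema change is not specified in this thread.

```python
def searchable_id_forms(curie: str) -> list[str]:
    """Return the full ID plus its local part (the text after the first colon)."""
    forms = [curie]
    _prefix, sep, local = curie.partition(":")
    if sep and local:
        # e.g. "SGD:S000000031" also yields "S000000031"
        forms.append(local)
    return forms


def build_index(curies: list[str]) -> dict[str, set[str]]:
    """Toy case-insensitive lookup table standing in for a Solr field."""
    index: dict[str, set[str]] = {}
    for curie in curies:
        for form in searchable_id_forms(curie):
            index.setdefault(form.lower(), set()).add(curie)
    return index
```

With `build_index(["SGD:S000000031", "RGD:1308769"])`, a lookup for `"s000000031"` finds `SGD:S000000031` without any query-side rewriting, which is the appeal of doing this on the index side.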
Hi, any news on this? We would really like to do some analysis for a paper that we want to submit ASAP. @ValWood
@Antonialock This isn't our primary issue. This is a side issue.
This is the trackable issue for:
http://jira.geneontology.org/browse/GO-624
The current statement of the issue is:
This is considered an issue because it seems unlikely this gene would have the seen associations with GO:0007072.
Possibilities to consider are that something is wrong with the search such that the "go" bits in the synonyms are matching (although then why so few?), or that there is a hiccup in the ontology and these really are in the closure.
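The first possibility can be illustrated with a toy sketch. It assumes a simple analyzer (lowercase, split on non-alphanumerics) and OR semantics between query terms; neither is confirmed to be what GOlr actually uses, and the synonym text `go-like protein` is invented for the example.

```python
import re


def tokens(text: str) -> set[str]:
    """Toy analyzer (assumed, not GOlr's): lowercase, split on non-alphanumerics."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def matches_any(query: str, field_text: str) -> bool:
    """OR semantics: one shared token is enough for a hit."""
    return bool(tokens(query) & tokens(field_text))


def matches_all(query: str, field_text: str) -> bool:
    """AND semantics: every query token must appear in the field."""
    return tokens(query) <= tokens(field_text)
```

Here `GO:0007072` becomes `{"go", "0007072"}`, so under OR semantics the shared `go` token alone makes `go-like protein` match, while requiring all terms rules it out. If this were the cause, one might expect many more spurious hits, which is the "why so few" objection above.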