The filter query does not always work as expected.
I caught this when I ran the "cpath2-cli.sh -create-downloads"
command and saw that it skipped generating the BioPAX archive for
"Database of Interacting Proteins" (DIP) because the search for all
pathways, interactions, and complexes returned no hits.
Then I tried similar search queries in a web browser:
http://www.pathwaycommons.org/pc2/search?q=*&type=interaction&datasource=database%20of%20interacting%20proteins
(NO result)
but
http://www.pathwaycommons.org/pc2/search?q=*&type=interaction&datasource=Database%20Interacting%20Proteins
(i.e., if "of" is removed, it works OK)
http://www.pathwaycommons.org/pc2/search?q=*&type=interaction&datasource=dip
(OK, same result as above)
The bug is in the very specific implementation of the filters (in
SearchEngine), which builds a complex boolean query
(https://code.google.com/p/pathway-commons/source/browse/cpath-impl/src/main/java/cpath/service/SearchEngine.java#715)
from all the filter values. In our trivial example (one data source name, no
organisms) the query becomes: ("database" AND "of" AND "interacting" AND
"proteins"). But "of", like other stop words (common prepositions, articles,
etc.), was dropped by the indexer (which uses StandardAnalyzer), so that term
never matches and the whole AND query returns nothing.
Possible solution: use the same Lucene Analyzer (StandardAnalyzer) to
tokenize the filter values and build the partial boolean queries from those
tokens; the Analyzer would give three tokens: "database", "interacting",
"proteins" (skipping "of" and lowercasing the words).
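
To illustrate the mismatch and the proposed fix, here is a minimal sketch in plain Java. It does not use Lucene: `analyze` is only a simplified stand-in for what StandardAnalyzer does at index time (lowercase, split, drop stop words), and the stop-word list, the class name `FilterQuerySketch`, and all method names are hypothetical. In SearchEngine the tokens would presumably come from the same Analyzer instance that the indexer uses, rather than from this stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class FilterQuerySketch {
    // Hypothetical, deliberately tiny stop-word list (assumption); the real
    // list is whatever the indexing analyzer is configured with.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "in", "of", "on", "the", "to");

    /** Simplified stand-in for index-time analysis:
     *  lowercase, split on non-alphanumerics, drop stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** What the filter code effectively does today (as described above):
     *  lowercase and split on whitespace, keeping every word. */
    static List<String> naiveSplit(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Renders the terms as a readable AND query, e.g. ("a" AND "b"). */
    static String andQuery(List<String> terms) {
        return terms.stream()
                .map(t -> "\"" + t + "\"")
                .collect(Collectors.joining(" AND ", "(", ")"));
    }

    /** An AND query matches only if every required term was indexed. */
    static boolean matches(List<String> indexedTokens, List<String> requiredTerms) {
        return indexedTokens.containsAll(requiredTerms);
    }

    public static void main(String[] args) {
        String name = "Database of Interacting Proteins";
        List<String> indexed = analyze(name);   // what ends up in the index

        List<String> broken = naiveSplit(name); // still contains "of"
        List<String> fixed = analyze(name);     // same analysis as the index

        System.out.println(andQuery(broken) + " -> " + matches(indexed, broken));
        System.out.println(andQuery(fixed) + " -> " + matches(indexed, fixed));
    }
}
```

With these assumptions, the first query keeps the "of" clause and finds nothing, while the second one contains only "database", "interacting", and "proteins" and matches.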
Original issue reported on code.google.com by rod...@gmail.com on 6 Mar 2015 at 4:34