PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Filter does not quite work: no results found using "Database of Interacting Proteins" as 'datasource' filter value #203

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The filter query does not always work as expected.
I caught this when running the "cpath2-cli.sh -create-downloads"
command and saw that it skipped generating the BioPAX archive for
"Database of Interacting Proteins" (DIP), because the search for all
pathways, interactions, and complexes returned no hits.

Then I tried similar search queries in a web browser:

http://www.pathwaycommons.org/pc2/search?q=*&type=interaction&datasource=database%20of%20interacting%20proteins (NO result)

but

http://www.pathwaycommons.org/pc2/search?q=*&type=interaction&datasource=Database%20Interacting%20Proteins (i.e., if "of" is removed, it works OK)
http://www.pathwaycommons.org/pc2/search?q=*&type=interaction&datasource=dip
(OK, same result as above)

The bug is in the particular way the filters are implemented (in
SearchEngine), which builds a complex boolean query
(https://code.google.com/p/pathway-commons/source/browse/cpath-impl/src/main/java/cpath/service/SearchEngine.java#715)
from all the filter values. In our trivial example (one data source name,
no organisms) it becomes: ("database" AND "of" AND "interacting" AND
"proteins"). But "of", like other English stop words (articles, common
prepositions, etc.), was dropped by the indexer (which uses StandardAnalyzer),
so that clause never matches and the whole AND query returns nothing.
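The mismatch can be reproduced with a small stdlib-only Java model. This is a simplified sketch, not cpath2's SearchEngine code: the stop-word set is a hypothetical subset of Lucene's English stop words, tokenization is a plain split on non-word characters rather than StandardAnalyzer's rules, and a document "matches" when the index holds every required term (a stand-in for a boolean AND query).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class NaiveFilterDemo {
    // Hypothetical subset of the English stop words removed at index time;
    // not the full list used by Lucene.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "in", "to");

    // Simplified model of what the indexer stored: lowercase, split, drop stop words.
    static Set<String> indexedTerms(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(t -> !t.isEmpty() && !STOP_WORDS.contains(t))
                .collect(Collectors.toSet());
    }

    // Naive filter: every whitespace-separated word becomes a required (AND) term,
    // including stop words that were never indexed.
    static List<String> naiveFilterTerms(String filterValue) {
        return Arrays.stream(filterValue.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .collect(Collectors.toList());
    }

    // A document passes the filter only if the index holds every required term.
    static boolean matches(Set<String> docTerms, List<String> required) {
        return docTerms.containsAll(required);
    }

    public static void main(String[] args) {
        Set<String> doc = indexedTerms("Database of Interacting Proteins");
        List<String> query = naiveFilterTerms("database of interacting proteins");
        System.out.println("indexed:  " + doc);      // set order is unspecified
        System.out.println("required: " + query);
        System.out.println("match:    " + matches(doc, query)); // false: "of" was never indexed
        System.out.println("without 'of': "
                + matches(doc, naiveFilterTerms("Database Interacting Proteins"))); // true
    }
}
```

The failing clause is exactly the one for the stop word: every other required term is present in the indexed set.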

Possible solution: use the same Lucene Analyzer (StandardAnalyzer) to
tokenize the filter values and build the partial boolean queries from those
tokens; the Analyzer would yield three tokens: "database", "interacting",
"proteins" (skipping "of" and lowercasing the words).
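A sketch of the proposed fix, under the same simplifying assumptions (a hypothetical stop-word subset and a regex tokenizer standing in for StandardAnalyzer): the filter value goes through the same analysis step that produced the indexed terms, and only the surviving tokens become required clauses. The field name `datasource` and the query-string form are illustrative placeholders, not cpath2's actual schema.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class AnalyzedFilterDemo {
    // Hypothetical subset of Lucene's English stop words; the real StandardAnalyzer
    // also uses Unicode-aware tokenization, which this regex split ignores.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "in", "to");

    // One shared analysis step, used both for indexing and for building filters.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(t -> !t.isEmpty() && !STOP_WORDS.contains(t))
                .collect(Collectors.toList());
    }

    // Builds a Lucene-style query string with one required (+) clause per analyzed token.
    static String buildFilterQuery(String field, String filterValue) {
        return analyze(filterValue).stream()
                .map(token -> "+" + field + ":" + token)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(analyze("Database of Interacting Proteins"));
        List<String> required = analyze("database of interacting proteins");
        System.out.println(required);                      // [database, interacting, proteins]
        System.out.println(indexed.containsAll(required)); // true
        System.out.println(buildFilterQuery("datasource", "Database of Interacting Proteins"));
        // prints: +datasource:database +datasource:interacting +datasource:proteins
    }
}
```

In actual Lucene code the tokens would come from the analyzer itself: `analyzer.tokenStream(field, value)` read through a `CharTermAttribute` (call `reset()`, loop on `incrementToken()`, then `end()` and `close()`), with each token added to a `BooleanQuery` as a `TermQuery` with `BooleanClause.Occur.MUST`. Reusing the indexing analyzer is what keeps query terms and indexed terms consistent.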

Original issue reported on code.google.com by rod...@gmail.com on 6 Mar 2015 at 4:34

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 7dbc1338d3f9.

Original comment by rod...@gmail.com on 7 Mar 2015 at 12:35