PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Re-design full-text search index and methods to remove "entity/find" command ("find" will cover everything) #123

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Currently, we have two full-text search web service commands (and the
corresponding methods in the PaxtoolsDAO): "entity/find" (renamed
"find_entity" in the latest sources) and "find". The key difference between
the two is that "entity/find" post-processes the intermediate search result:
it replaces each matching Xref or EntityReference with its parent entities
(which must then pass all the filters) and removes all other utility class hits.
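
To make the extra "entity/find" steps concrete, here is a minimal sketch of that post-processing. It is not the actual PaxtoolsDAO code: `Hit`, `EntityFindSketch`, `toEntityHits`, the parent lookup map, and the filter predicate are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical stand-in for a search hit; not a Paxtools or cPath2 type. */
final class Hit {
    final String id;
    final boolean isEntity; // false means a utility class hit (Xref, EntityReference, ...)

    Hit(String id, boolean isEntity) {
        this.id = id;
        this.isEntity = isEntity;
    }
}

final class EntityFindSketch {
    /**
     * Maps raw hits to entity IDs. Entity hits are kept; utility class hits are
     * replaced by their parent entities (assumption: the map only has entries
     * for Xref and EntityReference hits, so other utility class hits vanish).
     * Every candidate entity must pass the filter.
     */
    static List<String> toEntityHits(List<Hit> rawHits,
                                     Map<String, List<String>> parentEntities,
                                     Predicate<String> passesFilters) {
        Set<String> result = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (Hit hit : rawHits) {
            List<String> candidates = hit.isEntity
                    ? List.of(hit.id)
                    : parentEntities.getOrDefault(hit.id, List.of());
            for (String entityId : candidates) {
                if (passesFilters.test(entityId)) {
                    result.add(entityId);
                }
            }
        }
        return new ArrayList<>(result);
    }
}
```

Note that in this sketch the number of returned entities can differ from the number of raw hits, since one utility class hit may expand to several parents or to none.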

I would like to request the following feature:
Get rid of "entity/find" and instead make sure an Entity always matches a
Lucene query if at least one of its specific UtilityClass child elements does
(e.g., consider the property paths xref/id, entityReference/name,
entityReference/comment, entityReference/xref/id, etc.).

ToDo:
1. In the Premerge stage, for each Entity, generate additional bp:comment
elements that include particular keywords from the properties of specific
child utility class elements (the list of classes and properties to consider
still has to be specified). These keywords thereby also make their way into
the entity's Lucene document in the full-text index.
2. In PaxtoolsDAO, remove the 'findEntity' method (it won't be required anymore).
3. Remove the WS controller corresponding to the "entity/find" command.
4. Update the docs.
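
Step 1 of the list above could look roughly like the following sketch. It uses a simplified, hypothetical `Entity` class rather than the Paxtools model, and the "keywords: " prefix, the class name `KeywordCommentSketch`, and the flat `childValues` list are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class KeywordCommentSketch {
    /** Hypothetical, simplified entity: own comments plus child utility class values. */
    static final class Entity {
        final List<String> comments = new ArrayList<>();
        // Values collected from child properties, e.g. xref/id, entityReference/name.
        final List<String> childValues = new ArrayList<>();
    }

    /** Builds one extra comment from the distinct, non-blank child values, or null. */
    static String keywordComment(Entity entity) {
        Set<String> keywords = new LinkedHashSet<>();
        for (String value : entity.childValues) {
            if (value != null && !value.trim().isEmpty()) {
                keywords.add(value.trim());
            }
        }
        return keywords.isEmpty() ? null : "keywords: " + String.join(" ", keywords);
    }

    /** Appends the generated comment once, so that it is indexed with the entity. */
    static void addKeywordComment(Entity entity) {
        String comment = keywordComment(entity);
        if (comment != null && !entity.comments.contains(comment)) {
            entity.comments.add(comment);
        }
    }
}
```

Because the keywords end up in the entity's own comment text, a plain full-text query on the entity can match them without any post-processing of utility class hits.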

Implementing this change also makes fixing issue #122 (pagination) less tricky.

Original issue reported on code.google.com by rod...@gmail.com on 2 Nov 2011 at 5:15

GoogleCodeExporter commented 9 years ago
Done:
- re-designed full-text index/search/filters using @FieldBridge annotations and
custom FieldBridge implementations, which allow for flexible and non-trivial
indexing, including adding child elements' keywords to the parent's index, etc.
This is almost entirely done in Paxtools (paxtools-core);
- removed find* commands and methods; replaced with "search";
- pagination should work properly now;
- in addition, the internal implementation of the filter by datasource/organism
has changed: it no longer first tries to match actually existing
Provenance/BioSource objects and instead simply passes all filter values from
the user's query to the underlying method "as is". This became possible because
filters now use the new 'organism' and 'datasource' index fields, created from
all datasource/organism names and identifiers.
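
The filter change in the last item can be illustrated with a small sketch. It does not use the Hibernate Search or Lucene APIs; `FilterFieldSketch` and its methods are hypothetical, and the case-insensitive comparison and "any value matches" semantics are assumptions, not behavior confirmed above.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FilterFieldSketch {
    /** Collects all names and identifiers into one set of lower-cased field values. */
    static Set<String> fieldValues(Collection<String> names, Collection<String> identifiers) {
        Set<String> values = new HashSet<>();
        for (String name : names) {
            values.add(name.toLowerCase(Locale.ROOT));
        }
        for (String id : identifiers) {
            values.add(id.toLowerCase(Locale.ROOT));
        }
        return values;
    }

    /**
     * Applies the user's filter values as given, without first resolving them to
     * existing objects. Assumption: an empty filter matches everything and a
     * non-empty filter matches if any one value is present in the field.
     */
    static boolean passes(Set<String> fieldValues, Collection<String> filterValues) {
        if (filterValues.isEmpty()) {
            return true;
        }
        for (String value : filterValues) {
            if (fieldValues.contains(value.trim().toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

With a field built from both names and identifiers, a user can filter by either form (for example a taxonomy ID or an organism name) and the lookup stays a simple field match.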

Original comment by rod...@gmail.com on 25 Jan 2012 at 5:46

GoogleCodeExporter commented 9 years ago

Original comment by rod...@gmail.com on 5 Jan 2013 at 12:18