CandyShop / gerrit

Automatically exported from code.google.com/p/gerrit
Apache License 2.0

Optimize commit message search predicate #1441

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently the "message:FOO" predicate has to obtain a RevWalk instance for the 
Git repository and use MessageRevFilter#include() to test the message.  That 
is incredibly slow when searching closed changes (and slow for open changes 
too, when there are *a lot* of changes to scan through).  For example:

 gerrit> query message:FOO 
 runTimeMilliseconds: 157

 gerrit> query message:FOO status:open
 runTimeMilliseconds: 87

 gerrit> query message:FOO status:merged
 runTimeMilliseconds: 7315

The "status:merged" search takes almost 50x longer than a search with no status 
predicate.

However, the ChangeData instance already has access to the commit message from 
the database, so the comparison here should be more in line with the speed of 
another predicate that accesses the database (for example, TopicPredicate):

 gerrit> query topic:FOO
 runTimeMilliseconds: 8

 gerrit> query topic:FOO status:open
 runTimeMilliseconds: 7

 gerrit> query topic:FOO status:merged
 runTimeMilliseconds: 121

A non-Git MessagePredicate that is able to search ChangeData#commitMessage() 
instead would be many times faster, wouldn't it?  Even if we need to support 
both, having the option (e.g., "message:*" searches Git, "subject:*" searches 
ChangeData#commitMessage()) would at least allow it to be faster in most cases.

Original issue reported on code.google.com by jhans...@myyearbook.com on 14 Jun 2012 at 12:16

GoogleCodeExporter commented 9 years ago
The commit message isn't stored in the database. Only the first line, a.k.a. 
the change subject, is stored there. The rest of the message is only 
available from Git and thus cannot be scanned quickly from the SQL DB.

Long term the right approach is to index all of the data using Lucene, or 
another full-text indexing system, and convert all query operators over to 
searching fields in that inverted index.

Original comment by sop@google.com on 14 Jun 2012 at 1:13
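
To make the inverted-index idea above concrete, here is a minimal sketch in plain Java. It is not Lucene and not Gerrit code: the class name MessageIndex, its methods, and the simple lowercase/alphanumeric tokenizer are all assumptions made for illustration. The point is only that a "message:" term lookup becomes a map lookup instead of a walk over Git history.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index (hypothetical): lowercase term -> ids of changes containing it. */
class MessageIndex {
  private final Map<String, Set<Integer>> postings = new HashMap<>();

  /** Tokenizes the full commit message and records the change id under each term. */
  void add(int changeId, String commitMessage) {
    for (String term : tokenize(commitMessage)) {
      postings.computeIfAbsent(term, k -> new TreeSet<>()).add(changeId);
    }
  }

  /** Returns ids of changes whose message contains every term of the query. */
  Set<Integer> search(String query) {
    Set<Integer> result = null;
    for (String term : tokenize(query)) {
      Set<Integer> ids = postings.getOrDefault(term, Collections.emptySet());
      if (result == null) {
        result = new TreeSet<>(ids); // copy so the index itself is never mutated
      } else {
        result.retainAll(ids);       // intersect with the next term's postings
      }
    }
    return result == null ? new TreeSet<>() : result;
  }

  /** Assumed tokenizer: lowercase, split on anything that is not a-z or 0-9. */
  private static List<String> tokenize(String text) {
    List<String> terms = new ArrayList<>();
    for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
      if (!t.isEmpty()) {
        terms.add(t);
      }
    }
    return terms;
  }
}
```

The cost moves to indexing time: each change is tokenized once when it is added, and a query touches only the posting lists for its terms. A real full-text engine would add analysis, persistence, and ranking on top of this.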

GoogleCodeExporter commented 9 years ago
Shawn: thanks for the info.  In that case, a "subject:*" predicate would 
probably be helpful if searching just the subject is sufficient for the query.  
In my scenario, the text we are looking for is generally in the subject line, 
so that would still be a win.

Original comment by jhans...@myyearbook.com on 20 Jun 2012 at 8:43
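
As a rough illustration of what a "subject:" predicate would compare against, the sketch below takes the subject to be the first line of the commit message, as described earlier in the thread, and matches a query against that line only. SubjectMatcher and its methods are hypothetical names, not Gerrit's API, and the case-insensitive substring match is an assumption rather than Gerrit's actual matching rule.

```java
import java.util.Locale;

/** Hypothetical helper; not part of Gerrit's API. */
class SubjectMatcher {
  /** Returns the first line of a commit message, i.e. the change subject. */
  static String subjectOf(String commitMessage) {
    int newline = commitMessage.indexOf('\n');
    String first = newline < 0 ? commitMessage : commitMessage.substring(0, newline);
    return first.trim(); // also drops a trailing '\r' from CRLF messages
  }

  /** Assumed semantics: case-insensitive substring match against the subject only. */
  static boolean matches(String subject, String query) {
    return subject.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
  }
}
```

Because the subject is already stored with the change, a check like this needs no Git access, at the price of missing text that appears only in the message body.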