Jayad / enterprise-log-search-and-archive

Automatically exported from code.google.com/p/enterprise-log-search-and-archive
0 stars 0 forks source link

Sphinx query string generation error when multiple classes in query #157

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Searching with multiple classes and fields from each class

What is the expected output? What do you see instead?
Sphinx dies during the query because of a syntax error. This is the error for 
the particular query I was running:

Got sphinx error DBD::mysql::st fetchrow_hashref failed: index perm_20,perm_21: 
syntax error, unexpected '-' near ' !((@s2 "DNS data is too long")))' at 
/usr/local/elsa/web/lib/SyncMysql.pm line 143.

What version of the product are you using? On what operating system?
Sphinx  Sphinx 2.1.1-id64-beta (rel21-r3701)
Date    2013-06-14 08:47:45 -0600 (Fri, 14 Jun 2013)
Author  mcholste
Rev 926

Please provide any additional information below.

The issue is resolved by this patch for API.pm

@@ -2966,7 +2966,7 @@
        }

        if (@class_match_strs){
-               $match_str .= ' (' . join('|', @class_match_strs) . ')';
+               $match_str .= ' (' . join(') (', @class_match_strs) . ')';
        }

        $self->log->trace('match str: ' . $match_str);

Original issue reported on code.google.com by and...@wolftrail.com on 17 Jun 2013 at 9:32

GoogleCodeExporter commented 9 years ago
Thanks for reporting the bug.  Can you show me the original query as put into 
the query bar in the web interface?  The patch you've supplied would change the 
matching logic from a boolean OR to an AND, and I need to be sure that's what 
should happen here.

Original comment by mchol...@gmail.com on 18 Jun 2013 at 2:30

GoogleCodeExporter commented 9 years ago
timeout:0 +host=127.0.0.1 program="smartdefense" program="waf_wf" class=FW_IPS 
class=WAF -WAF.method=OPTIONS -WAF.method=PROPFIND -WAF.attack=COOKIE_TAMPERED 
-"Lan7.128" -"IPS Bypass" -FW_IPS.srcip=10.20.5.114 -WAF.srcip=192.174.55.105 
-FW_IPS.dstip=192.174.58.21 -FW_IPS.attack=ipfragments 
-FW_IPS.attack=tcp_invalid_checksum_enable -FW_IPS.attack=tcp_segment_limit 
-FW_IPS.attack=tcp_block_retrans_err_enable 
-FW_IPS.attack=tcp_block_urg_bit_enable -FW_IPS.attack=streamingengine 
-FW_IPS.message_info="DNS data is too long" -WAF.message_info="url-references" 
-WAF.message_info="webtop-error" -WAF.message_info="403" 
-WAF.site="autodiscover.xml" -WAF.site="javascript" 
-WAF.site="afdcredesign.afdcstage.nrel.gov" -WAF.rule="Deny-Internal"

With the patch, it generates this MATCH:

MATCH(' ((@host 2130706433)) !("IPS 
Bypass"|"Lan7.128"|(10.20.5.114|169084274)|(192.174.58.21|3232643605)) ( !((@s0 
COOKIE_TAMPERED)|(@s1 "Deny\\-Internal")|(@s3 "url\\-references")|(@s3 
"webtop\\-error")|(@s3 403)|(@s4 OPTIONS)|(@s4 PROPFIND)|(@s5 
"afdcredesign.afdcstage.nrel.gov")|(@s5 "autodiscover.xml")|(@s5 
"javascript"))) ( !((@s1 ipfragments)|(@s1 streamingengine)|(@s1 
tcp_block_retrans_err_enable)|(@s1 tcp_block_urg_bit_enable)|(@s1 
tcp_invalid_checksum_enable)|(@s1 tcp_segment_limit)|(@s2 "DNS data is too 
long")))')

Without the patch it generates this MATCH, which sphinx throws an error on:

MATCH(' ((@host 2130706433)) !("IPS 
Bypass"|"Lan7.128"|(10.20.5.114|169084274)|(192.174.58.21|3232643605)) ( !((@s0 
COOKIE_TAMPERED)|(@s1 "Deny\\-Internal")|(@s3 "url\\-references")|(@s3 
"webtop\\-error")|(@s3 403)|(@s4 OPTIONS)|(@s4 PROPFIND)|(@s5 
"afdcredesign.afdcstage.nrel.gov")|(@s5 "autodiscover.xml")|(@s5 
"javascript"))| !((@s1 ipfragments)|(@s1 streamingengine)|(@s1 
tcp_block_retrans_err_enable)|(@s1 tcp_block_urg_bit_enable)|(@s1 
tcp_invalid_checksum_enable)|(@s1 tcp_segment_limit)|(@s2 "DNS data is too 
long")))')

Original comment by and...@wolftrail.com on 18 Jun 2013 at 7:18

GoogleCodeExporter commented 9 years ago
I noticed another bug, it is losing the WAF.srcip from the resulting MATCH for 
some reason, I'll look into that and see if I can fix it.

Original comment by and...@wolftrail.com on 18 Jun 2013 at 7:26

GoogleCodeExporter commented 9 years ago
Per comment #3 - If the srcip term comes after other field matches for the 
specified class then the IPs are not added to the MATCH, this is the correct 
behavior, the IPs should not be in the MATCH in this case. 

On the other hand, if like in my example, you have an a term like srcip before 
any other fields for the specified class, then the IP address is added to the 
MATCH incorrectly. 

The issue is line 1282 in Query.pm:

elsif ($term_hash->{op} !~ /[\<\>]/ and not exists 
$self->terms->{field_terms}->{$boolean}->{$class_id}){

The exists test fails if the parser hasn't hit any field_terms for the class 
already. It might make sense for the parser to move attribute terms to the end 
of a query string so that they are parsed correctly regardless of how the user 
types them in.

Original comment by and...@wolftrail.com on 18 Jun 2013 at 8:06

GoogleCodeExporter commented 9 years ago
I've committed the fix for the reported issue, but I'm still unclear on the 
second bug.  Can you provide the shortest possible example query with the IP 
that is incorrectly added to the match string, and how it is incorrect?

Original comment by mchol...@gmail.com on 21 Jun 2013 at 6:33

GoogleCodeExporter commented 9 years ago
I added the following to line 1295 of Query.pm to help troubleshoot this.
$self->log->trace('Adding IP to any_field_terms: ' . $term_hash->{value});

So here is a short example query:
class=A class=B -A.srcip=192.168.1.1 -A.type=alert -B.user=luser 
-B.dstip=10.0.0.2

In this example I want logs that are from class A or B, except for class A logs 
from 192.168.1.1 or class A logs with type of alert or class B logs with a user 
of luser or a destination of 10.0.0.2. Because of the bug I would instead get 
logs from class A or class B except for any logs with a the string 192.168.1.1 
anywhere in the message or class A logs with type of alert or class B logs with 
a user of luser or a destination of 10.0.0.2.

If I rewrite the query like this then it works correctly:

class=A class=B -A.type=alert -B.user=luser -B.dstip=10.0.0.2 
-A.srcip=192.168.1.1

However, this still leaves the problem of when you only search for an attribute 
field for a class, like the following query:

class=A class=B -A.type=alert -B.dstip=10.0.0.2 -A.srcip=192.168.1.1

Can you explain the purpose of adding 10.0.0.2 to any_field_terms in this 
instance? It seems to needlessly exclude potential results. For example a log 
entry of class A with a source IP of 10.0.0.2 would be excluded in the last 
example by the MATCH generated by the code, which is not what someone writing 
the query would expect.

I think if you just comment out line 1295 then everything would work correctly. 
I haven't been able to come up with a scenario where the search breaks with 
that line commented out.

Original comment by and...@wolftrail.com on 21 Jun 2013 at 8:26

GoogleCodeExporter commented 9 years ago
Ok, yes I'm seeing the issue now.  The problem is that integer fields aren't 
currently indexed like string fields are.  This is an optimization to keep 
indexes smaller because searching the message will definitely find the string, 
since an integer is just one word.  The attribute filter ensures that only the 
right matches come back.  However, you've shown an edge case which only negated 
values don't work correctly.  This used to simply come back with an error 
saying there were no positive values in the query, but now that class-only 
queries can be run normally (they use SQL instead of Sphinx), this has come to 
light.  Hopefully that explains why the field term is put into the 
any_field_terms matching clause.  I'm working on a fix for the ordering problem.

Original comment by mchol...@gmail.com on 24 Jun 2013 at 3:06

GoogleCodeExporter commented 9 years ago
Ah, that makes sense. The weird bugs always pop up when the underlying 
assumptions change. Thanks for the feedback, let me know if there's anything I 
can help with.

Original comment by and...@wolftrail.com on 24 Jun 2013 at 3:47

GoogleCodeExporter commented 9 years ago
This is fixed in rev 944

Original comment by mchol...@gmail.com on 1 Jul 2013 at 3:02