Jayad / enterprise-log-search-and-archive

Automatically exported from code.google.com/p/enterprise-log-search-and-archive
0 stars 0 forks source link

ELSA won't validate after hard shutdown, no new logs inserted (search works) #144

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm not sure what exactly happens, but we last power to all our systems in the 
office and since then our elsa instance hasn't been inserting any new logs it 
gets via syslog (syslog-ng has been receiving the logs correctly and passing 
them off to ELSA's scripts just fine).

I tried to re-validate the directory like I had last time with 
echo "" | perl elsa.pl -o

But what I get is:
Validating directory...
Use of uninitialized value in string eq at /usr/local/elsa/node/Indexer.pm line 
884.
Use of uninitialized value in string eq at /usr/local/elsa/node/Indexer.pm line 
908.
DBD::mysql::st execute failed: no enabled local indexes to search at 
/usr/local/elsa/node/Indexer.pm line 2712.

searchd is running, as is mysqld. syslog-ng was off when I ran the validation.

What should I do?
No new logs get added, but searches on previous ones still work fine.

What version of the product are you using? On what operating system?
SVN ver 878 on CentOS 6.4 x86_64

Original issue reported on code.google.com by i...@pingas.org on 22 May 2013 at 4:31

GoogleCodeExporter commented 9 years ago
It looks to me like when searchd started, it probably gave a bunch of warnings 
saying that files didn't exist or were corrupt, which is why you're getting the 
error about "no enabled local indexes to search."  What I don't understand is 
why searches on previous logs would work.  Can you check the node.log for 
errors immediately after running the validation?

Original comment by mchol...@gmail.com on 22 May 2013 at 5:26

GoogleCodeExporter commented 9 years ago
Issue 145 has been merged into this issue.

Original comment by mchol...@gmail.com on 22 May 2013 at 5:26

GoogleCodeExporter commented 9 years ago
When I restart the searchd service, it doesn't give any errors other than a 
single index having an invalid header. Attached is the output of 'service 
searchd restart'

When I look in /data/elsa/log/node.log I see no errors, just regular 
trace/info/debug information. The last 1000 lines of node.log are at 
http://ix.io/5KT

Original comment by i...@pingas.org on 22 May 2013 at 5:37

Attachments:

GoogleCodeExporter commented 9 years ago
Looks like the error isn't contained in those 1000 lines which just show the 
index rotation.  Can you echo "" > /data/elsa/log/node.log, run the echo "" | 
perl elsa.pl -o, then upload node.log?

Original comment by mchol...@gmail.com on 22 May 2013 at 6:22

GoogleCodeExporter commented 9 years ago
The new node.log is at http://ix.io/5KU

Original comment by i...@pingas.org on 22 May 2013 at 6:28

GoogleCodeExporter commented 9 years ago
Ok, that shows that the validation succeeded.  Did you get the error message 
about "no local indexes enabled" on the command line when you ran that?

Original comment by mchol...@gmail.com on 22 May 2013 at 6:37

GoogleCodeExporter commented 9 years ago
I did indeed, it was:
DBD::mysql::st execute failed: no enabled local indexes to search at 
/usr/local/elsa/node/Indexer.pm line 2712.

Is it possible that I somehow changed the contents of that file? Should I try 
pulling an update from subversion?

Original comment by i...@pingas.org on 22 May 2013 at 6:39

GoogleCodeExporter commented 9 years ago
As a precaution, I updated my installation to 889. The error still occurs when 
I validate the directory and no new logs are inserted still.

Original comment by i...@pingas.org on 22 May 2013 at 7:02

GoogleCodeExporter commented 9 years ago
Can you try this:
mysql -uroot syslog -e 'select id, records from v_indexes where 
type="temporary" order by end desc limit 1'
Note the id returned (if any).  Use the id in the next query by appending 
"temp_" to the front of it.  So if the id was 1, then the table would be 
temp_1.  Then run this query:
mysql -h0 -P9306 -e 'select *, count(*) as _count host_id from <table> gropu by 
host_id limit 9999'
What is the output?  That's the query that's failing.

Original comment by mchol...@gmail.com on 22 May 2013 at 9:36

GoogleCodeExporter commented 9 years ago
elsa ~ # mysql -uroot syslog -e 'select id, records from v_indexes where 
type="temporary" order by end desc limit 1'
+-----+---------+
| id  | records |
+-----+---------+
| 149 |      43 |
+-----+---------+
elsa ~ # mysql -h0 -P9306 -e 'select *, count(*) as _count host_id from 
temp_149 group by host_id limit 9999'
ERROR 1064 (42000) at line 1: sphinxql: syntax error, unexpected IDENT, 
expecting FROM or ',' near 'host_id from temp_149 group by host_id limit 9999'

Not sure what the proper syntax is. What should I be running as that second 
command instead?

Original comment by i...@pingas.org on 23 May 2013 at 4:41

GoogleCodeExporter commented 9 years ago
Can you show the output of this:
ls -l /data/sphinx/*.sph

Original comment by mchol...@gmail.com on 23 May 2013 at 8:36

GoogleCodeExporter commented 9 years ago
Sorry for the long delay, I've been on vacation.

Linked is the output of that ls command: http://ix.io/5Qw

Original comment by i...@pingas.org on 29 May 2013 at 2:05

GoogleCodeExporter commented 9 years ago
Sorry, my mistake, I gave you an invalid query.  The query should've had a 
comma in between _count and host_id like this:

mysql -h0 -P9306 -e 'select *, count(*) as _count, host_id from temp_149 group 
by host_id limit 9999'

If that works, I recommend restarting searchd as it must be some temporary 
error that caused your initial issue.  Is it still continuing?

Original comment by mchol...@gmail.com on 1 Jun 2013 at 4:21

GoogleCodeExporter commented 9 years ago
Still having issues, though things have changed. Here's output of each command 
at the moment:

mysql -uroot syslog -e 'select id, records from v_indexes where 
type="temporary" order by end desc limit 1'
+----+---------+
| id | records |
+----+---------+
| 77 |      18 |
+----+---------+

mysql -h0 -P9306 --vertical -e 'select *, count(*) as _count, host_id from 
temp_77 group by host_id limit 9999'
*************************** 1. row ***************************
        id: 1057776
    weight: 1
 timestamp: 1370533496
       day: 15862
      hour: 380703
    minute: 22842224
   host_id: 3232239106
program_id: 3315324353
  class_id: 4
   attr_i0: 4624
   attr_i1: 2130706433
   attr_i2: 0
   attr_i3: 0
   attr_i4: 0
   attr_i5: 0
   attr_s0: 0
   attr_s1: 2547889144
   attr_s2: 2547889144
   attr_s3: 0
   attr_s4: 0
   attr_s5: 0
  @groupby: 3232239106
    @count: 19

ls -l /data/sphinx/*.sph
here: http://ix.io/67c

Original comment by i...@pingas.org on 11 Jun 2013 at 3:13

GoogleCodeExporter commented 9 years ago
I have rebooted the machine and started/stopped all services several times. I 
even used myisamchk on the syslog_data database (as it was marked as crashed at 
one point).

Queries still work, and syslog is still getting data, but no new data is 
showing up in the web view.

Original comment by i...@pingas.org on 11 Jun 2013 at 3:15

GoogleCodeExporter commented 9 years ago
Oh, there is a new error when I run elsa.pl though:

echo "" > /data/elsa/log/node.log
echo "" | perl elsa.pl -o
Use of uninitialized value $start in concatenation (.) or string at 
/usr/local/elsa/node/Indexer.pm line 454.
Use of uninitialized value $end in concatenation (.) or string at 
/usr/local/elsa/node/Indexer.pm line 454.
Use of uninitialized value $min in concatenation (.) or string at 
/usr/local/elsa/node/Indexer.pm line 454.
Use of uninitialized value $max in concatenation (.) or string at 
/usr/local/elsa/node/Indexer.pm line 454.
DBD::mysql::st execute failed: Column 'min_id' cannot be null at 
/usr/local/elsa/node/Indexer.pm line 455.
DBD::mysql::st execute failed: Column 'min_id' cannot be null at 
/usr/local/elsa/node/Indexer.pm line 455.
DBI::db=HASH(0x337aa40)->disconnect invalidates 1 active statement handle 
(either destroy statement handles or call finish on them before disconnecting) 
at /usr/local/elsa/node/Indexer.pm line 104.

Here is node.log: http://ix.io/67e

syslog_data.window is a view I created, by the way, but shouldn't be affecting 
anything with ELSA... Right?

Original comment by i...@pingas.org on 11 Jun 2013 at 3:17

GoogleCodeExporter commented 9 years ago
That output is suggesting that there's a table in the syslog_data database that 
has invalid data in it.  Can you check for a table there that isnt' in the 
syslog.tables table?

Original comment by mchol...@gmail.com on 11 Jun 2013 at 4:54

GoogleCodeExporter commented 9 years ago
Err, I don't see anything wrong here:

sudo mysql -e 'SELECT * FROM syslog.tables;'
+----+-------------------------------+---------------------+--------------------
-+--------+---------+---------------+-----------------+
| id | table_name                    | start               | end                
 | min_id | max_id  | table_type_id | table_locked_by |
+----+-------------------------------+---------------------+--------------------
-+--------+---------+---------------+-----------------+
|  1 | syslog_data.syslogs_index_1   | 2013-04-19 15:21:34 | 2013-06-06 
11:45:38 |      1 | 1057794 |             1 |            NULL |
|  2 | syslog_data.syslogs_archive_1 | 2013-04-19 15:23:24 | 2013-06-06 
12:17:05 |      1 | 1056013 |             2 |            NULL |
+----+-------------------------------+---------------------+--------------------
-+--------+---------+---------------+-----------------+

Both tables exist and they are the only two tables.

Original comment by i...@pingas.org on 11 Jun 2013 at 7:22

GoogleCodeExporter commented 9 years ago
So if you go into the syslog_data database, and type SHOW TABLES, are there any 
not listed above?

Original comment by mchol...@gmail.com on 11 Jun 2013 at 7:31

GoogleCodeExporter commented 9 years ago
We fixed it in-house, and it was a bit of a rubber mallet solution, but I think 
it exposes a bug in some of your code.

It seems that your validation code in Indexer.pm attempt to look at ALL things 
in information_schema instead of a static list. So, if someone added a view 
(like I did) it will attempt to process that view and error out (which is what 
happened here).

Here's the part of Indexer.pm I changed around line 422

$query = "SELECT CONCAT(t1.table_schema, \".\", t1.table_name) AS 
real_table,\n" .
                "t2.table_name AS recorded_table\n" .
                "FROM INFORMATION_SCHEMA.TABLES t1\n" .
                "LEFT JOIN tables t2 ON (CONCAT(t1.table_schema, \".\", t1.table_name)=t2.table_name)\n" .
                "WHERE t1.table_schema=\"$Data_db_name\" HAVING ISNULL(recorded_table)\n" .
                "AND recorded_table != \"WINDOW\";";
        $sth = $self->db->prepare($query);
        $sth->execute();

My view is called WINDOW is not a table, though it gets checked. Perhaps it 
makes sense to have a static list of items to verify when elsa validates its 
directory to allow for in-house changes to code?

Original comment by i...@pingas.org on 11 Jun 2013 at 9:06

GoogleCodeExporter commented 9 years ago
And yes, SHOW TABLES also returns my view WINDOW (which is not a table... go 
figure).

Original comment by i...@pingas.org on 11 Jun 2013 at 9:07

GoogleCodeExporter commented 9 years ago
mysql -e 'use syslog_data; SHOW TABLES;'
+-----------------------+
| Tables_in_syslog_data |
+-----------------------+
| syslogs_archive_1     |
| syslogs_index_1       |
| window                |
+-----------------------+

Original comment by i...@pingas.org on 11 Jun 2013 at 9:08

GoogleCodeExporter commented 9 years ago
Looking into the proper syntax, it seems I want to run something like:

mysql -e "use syslog_data; SHOW FULL TABLES where table_type = 'BASE TABLE';"
+-----------------------+------------+
| Tables_in_syslog_data | Table_type |
+-----------------------+------------+
| syslogs_archive_1     | BASE TABLE |
| syslogs_index_1       | BASE TABLE |
+-----------------------+------------+

Original comment by i...@pingas.org on 11 Jun 2013 at 9:14

GoogleCodeExporter commented 9 years ago
Ah, that will do it!  I've updated the code to make sure that the type is BASE 
TABLE and that the name is prefixed with "syslogs_" so you should be all set.  
Fixed in rev 916.

Original comment by mchol...@gmail.com on 11 Jun 2013 at 10:30