GoogleCodeExporter closed 8 years ago
I too have occasionally seen this inconsistency, but I haven't been able to
reproduce it reliably to debug. Can you tell me a little about your setup: Do
you have multiple nodes, are they fairly busy? When you get these results, is
there anything in web.log indicating that a node couldn't be reached?
Original comment by mchol...@gmail.com
on 15 Jun 2012 at 1:30
Everything is on one box. It is fairly decent too--a 1U server class system
with 8GB RAM and 2TB 15k RPM drives in a RAID 5. The OS is Oracle Unbreakable
Linux (RHEL, really). The install is minimal. I'll check the web log when I
have a chance at work.
Original comment by lib...@gmail.com
on 16 Jun 2012 at 3:14
Oh, and only maybe 100 EPS right now. I have two destinations in
syslog-ng.conf, one for elsa and one for the file system. The file system is
ext4 and the database, etc. are on a dedicated LVM volume. I'm using the Chrome
browser.
Original comment by lib...@gmail.com
on 16 Jun 2012 at 3:17
Another thing I have found that may be related: if I perform a search and get
results, then go back to that same tab and hit 'Submit Query' again, I
sometimes get a different number of results. For example, I had submitted a
query that returned four results, then three for a few times, then four again.
Original comment by lib...@gmail.com
on 24 Jun 2012 at 3:05
Thanks for the additional info. I agree that this is related, and I think
there are two issues. The first is a time format problem when using GMT on the
backend for datetime math (which is why a report on "day" shows values for 19:00
hours). The second may be inconsistent results due to Sphinx swapping out
indexes during consolidation, or because it does not have enough RAM. To help
continue diagnosing this problem, can you tell me approximately how many logs
per second this instance is processing and how much RAM it has?
Original comment by mchol...@gmail.com
on 24 Jun 2012 at 6:05
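The time format problem described in the comment above can be illustrated with a small Python sketch. It is not ELSA's code. It only shows how truncating a timestamp to a "day" on UTC boundaries yields a bucket that, viewed in local time, starts at 19:00 on the previous day. The UTC-5 offset and the function names are assumptions; the thread does not state the reporter's time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the viewer's local zone is UTC-5 (not stated in the thread).
LOCAL = timezone(timedelta(hours=-5))


def day_bucket_utc(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its day on UTC boundaries."""
    utc = ts.astimezone(timezone.utc)
    return utc.replace(hour=0, minute=0, second=0, microsecond=0)


def day_bucket_local(ts: datetime, tz: timezone) -> datetime:
    """Truncate a timestamp to the start of its day in the given zone."""
    local = ts.astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


# A hypothetical log event at 10:30 local time.
event = datetime(2012, 6, 24, 10, 30, tzinfo=LOCAL)

utc_bucket = day_bucket_utc(event)
local_bucket = day_bucket_local(event, LOCAL)

# The UTC day bucket, displayed in local time, lands at 19:00 the day before.
print(utc_bucket.astimezone(LOCAL).strftime("%Y-%m-%d %H:%M"))
# The local day bucket starts at midnight local time, as a reader would expect.
print(local_bucket.strftime("%Y-%m-%d %H:%M"))
```

Doing the truncation in the zone the report is displayed in (the second function) avoids the shifted hour.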
I have 16GB RAM and average maybe around 40 EPS right now, which has peaked at
times to about 200 EPS.
Original comment by lib...@gmail.com
on 24 Jun 2012 at 6:55
OK, the relatively low event rate is probably the reason that you're
seeing inconsistencies. ELSA is designed for high-volume processing. The good
news is I'm almost done bug testing the new code, which will handle low-volume
log rates much more gracefully, so this should be fixed when I release it
shortly.
Original comment by mchol...@gmail.com
on 24 Jun 2012 at 7:03
I am anticipating 400-500 EPS when ELSA is rolled out to production, with
spikes up to 2000-3000. Is the new version a separate code base or is it simply
one which will scale from low EPS to high EPS gracefully?
Original comment by lib...@gmail.com
on 24 Jun 2012 at 7:11
The new version adds a standard feature to "downshift" into a realtime mode
whenever the events per second are below a certain threshold. I'm testing it
thoroughly because it will be the default mode. I have observed that it will
comfortably handle 1k events per second in realtime mode, so that should cover
your circumstances.
Original comment by mchol...@gmail.com
on 24 Jun 2012 at 7:26
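A minimal sketch of the "downshift" idea from the comment above, assuming a simple rate check over a fixed time window. This is not ELSA's implementation. The threshold value, the mode names, and the function name are all hypothetical; the thread says only that realtime mode applies below "a certain threshold" and handles about 1k events per second comfortably.

```python
# Hypothetical threshold; the actual value ELSA uses is not given in the thread.
REALTIME_THRESHOLD_EPS = 1000.0


def choose_mode(event_count: int, window_seconds: float,
                threshold_eps: float = REALTIME_THRESHOLD_EPS) -> str:
    """Pick an indexing mode from the event rate observed over a window.

    Returns "realtime" when the rate is below the threshold and "batch"
    otherwise. Both mode names are illustrative labels.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    eps = event_count / window_seconds
    return "realtime" if eps < threshold_eps else "batch"


# 2,400 events in 60 s is 40 EPS, the reporter's current average.
print(choose_mode(2400, 60))
# 180,000 events in 60 s is 3,000 EPS, the top of the anticipated spikes.
print(choose_mode(180000, 60))
```

Under these assumed numbers, the reporter's current load and the anticipated 400-500 EPS would both fall in the realtime branch, while the 2000-3000 EPS spikes would not.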
Fixed in rev 330. Unfortunately, it was a bug on the backend, so while the
code update is normal, to actually get the fix working, you need to blow away
the /usr/local/etc/sphinx.conf file on the logging nodes and then run "perl
elsa.pl -on" to regenerate the new sphinx.conf file. Then restart searchd.
New indexes will have the correct time values in the reports and charts (all
logs already have correct time values in the standard console listing).
Original comment by mchol...@gmail.com
on 2 Jul 2012 at 8:52
Original issue reported on code.google.com by
lib...@gmail.com
on 15 Jun 2012 at 12:50