Closed coonsmatthew closed 8 years ago
Also, stopping the cif-services and elasticsearch services and then restarting them appears to fix the issue for a few hours.
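For reference, the restart workaround described above looks something like this on a stock Ubuntu install (service names are taken from this thread; the `sudo service` form is an assumption, adjust for your init system):

```shell
# Workaround from the report above: stop CIF, bounce Elasticsearch, restart CIF
sudo service cif-services stop
sudo service elasticsearch restart
sudo service cif-services start
```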
I'm noticing this error in CIF-SMRT as well:
[2016-05-31T14:02:29,776Z][30347][ERROR]: [Timeout] ** [http://localhost:9200]-[599] Timed out while waiting for socket to become ready for reading, called from sub Search::Elasticsearch::Transport::try {...} at /usr/share/perl5/Try/Tiny.pm [observable-data]
However, when I check the elasticsearch service, it says it is running.
I'm going to try restarting the server and see if that helps at all.
elasticsearch log says:
[2016-05-31 12:57:38,855][DEBUG][action.bulk ] [Angela Del Toro] [cif.observables-2016.05.31][3] failed to execute bulk item (index) index {[cif.observables-2016.05.31][observables]}
org.elasticsearch.index.engine.IndexFailedEngineException: [cif.observables-2016.05.31][3] Index failed for [observables#number]
    at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:530)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:458)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:427)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:149)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:512)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:419)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:700)
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:714)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1522)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
    at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:594)
    at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:522)
    ... 8 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:35)
    at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:210)
    at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:235)
    at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
    at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:252)
    at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:292)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:663)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
    ... 11 more
Matt,
Are you putting so much data into this that you are hitting some natural memory limits, or do you believe there is a memory leak? We are going to need to collect more data. I believe the logs point to an OutOfMemoryError. The question is why?
Gabe,
We're manually submitting probably 1-2 observables a day ourselves. I just modified our CIF configuration so that it only pulls in the default feeds it ships with; any additional feeds I had added have been disabled.
We'll see whether I notice any issues over the evening.
Also, when I check the free memory on the server, I have plenty and I am not swapping at all, so I'm not sure why Elasticsearch thinks it is out of memory. I just checked the server's performance and it is only using 6 GB of RAM (out of 16), which is strange, because elasticsearch usually seems to use almost all of the available memory.
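One possible explanation for this (my reading, not something established in this thread): the JVM heap is capped by ES_HEAP_SIZE regardless of how much free RAM the OS reports, so Elasticsearch can hit OutOfMemoryError while `free` looks healthy. A quick sketch for checking heap usage as Elasticsearch itself sees it, using the `_nodes/stats` API that exists in the 1.x/2.x line:

```shell
# Query Elasticsearch's own JVM heap stats rather than trusting OS-level free memory
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | grep -E 'heap_used_percent|heap_max'
```

If `heap_used_percent` sits near 100 while the OS shows plenty of free RAM, the heap cap is the bottleneck.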
Thank You.
head /etc/default/elasticsearch
# Run Elasticsearch as this user ID and group ID
#ES_USER=elasticsearch
#ES_GROUP=elasticsearch
# Heap Size (defaults to 256m min, 1g max)
#ES_HEAP_SIZE=2g
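A common rule of thumb (my assumption, not a CIF default) is to give Elasticsearch roughly half the machine's RAM, staying at or below 31g so the JVM can keep using compressed object pointers. A minimal sketch for deriving a value to uncomment in /etc/default/elasticsearch:

```shell
# Sketch: compute a heap size as half of physical RAM, capped at 31g
# (the half-of-RAM and 31g figures are rules of thumb, not values from this thread)
total_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
heap_g=$(( total_kb / 1024 / 1024 / 2 ))
if [ "$heap_g" -gt 31 ]; then heap_g=31; fi
echo "ES_HEAP_SIZE=${heap_g}g"
```

Set the resulting value in the config file and restart elasticsearch for it to take effect.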
If/when you start to get into ES tuning, I found these resources helpful:
http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html
http://www.wilfred.me.uk/blog/2015/01/31/taming-a-wild-elasticsearch-cluster
additionally, my guess is, the additional feeds are what's chewing up the mem, along with tuning you might have to also boost the ram to 32g (just a hunch) and/or start thinking about breaking things up (ie: sep es instances into a 3-node env, etc). we have some helper doc in the wings we can try and guide you through, but i'd start with doing some tuning as Gabe suggested, then adding more memory for those additional feeds, then thinking through what a 3-node setup might look like...
Ok, thank you for both of your comments.
The ES errors seem to have resolved themselves after I turned off the additional feeds I was pulling in. I will look into increasing the memory on our server.
I checked the cif-smrt log and don't see any 500 errors for the last data pull. Additionally, I checked our ES config, and it does appear to be set to the defaults, so that is good.
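A sketch of the kind of log check described above, matching the 5xx/timeout errors pasted earlier in this thread (the log path is an assumption from a typical install; adjust to wherever cif-smrt logs on your system):

```shell
# Scan the cif-smrt log for 5xx response codes or ES timeout messages
grep -E '\[5[0-9]{2}\]|Timed out' /var/log/cif-smrt.log | tail -n 20
```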
I will watch for any additional errors going forward.
yea, out of the box you can do 16g, but with additional "heavy" feeds, you can very quickly get to 32g. past that, you're looking to split into 3-node (4-8G each?) clusters.. they can be smaller because of the way ES works, but you're also splitting out cif-router which also uses some memory (right now, our production is 3 es clusters and 2 cif-routers which are 8G each, if that gives you some idea).
Hello,
I'm having issues with what appears to be my elasticsearch database. After several hours of CIF running successfully, when I try to run
cif -p
I get:
$ cif -p
[2016-05-31T07:47:35,458Z][WARN]: unauthorized
[2016-05-31T07:47:35,459Z][FATAL]: unauthorized
So I tailed the cif-router.log and noticed these lines (I just pasted a few of the many):