dadoonet / fscrawler

Elasticsearch File System Crawler (FS Crawler)
https://fscrawler.readthedocs.io/
Apache License 2.0
1.34k stars 297 forks source link

Issue with Memory Management #941

Open Neel-Gagan opened 4 years ago

Neel-Gagan commented 4 years ago

ES version 6.8, Fscrawler version 2.6. ES is running in three nodes in cluster. when simultaneous fscrawler jobs run. crawler gives bulk_failure error. i have set -Xms15G -Xmx15G which is 50% of available ram in one node. what is the ideal setting for efficient memory management to get rid of bulk failure and socket hangup error.

dadoonet commented 4 years ago

Is the error coming from FSCrawler or elasticsearch ? What is the error please?

Neel-Gagan commented 4 years ago

Error is bulk insert failure in elasticsearch and socket hangup error. Crawler gets paused when such error comes.

dadoonet commented 4 years ago

Could you share the full elasticsearch logs please? And the FSCrawler job config?

Neel-Gagan commented 4 years ago

These are the error which are coming while elasticsearch is running.

9165] overhead, spent [449ms] collecting in the last [1.2s]
[2020-04-21T16:47:31,566][INFO ][o.e.m.j.JvmGcMonitorService] [node_cdams] [gc][
9167] overhead, spent [325ms] collecting in the last [1s]
[2020-04-21T16:47:43,519][INFO ][o.e.m.j.JvmGcMonitorService] [node_cdams] [gc][
9168] overhead, spent [4.1s] collecting in the last [11.9s]
[2020-04-21T16:47:44,613][INFO ][o.e.m.j.JvmGcMonitorService] [node_cdams] [gc][
9169] overhead, spent [529ms] collecting in the last [1s]
[2020-04-21T16:47:45,863][INFO ][o.e.m.j.JvmGcMonitorService] [node_cdams] [gc][
9170] overhead, spent [379ms] collecting in the last [1.2s]
[2020-04-21T16:47:49,238][INFO ][o.e.m.j.JvmGcMonitorService] [node_cdams] [gc][
9173] overhead, spent [388ms] collecting in the last [1s]

[2020-04-21T16:47:04,957][ERROR][o.e.a.b.TransportBulkAction] [node_cdams] faile
d to execute pipeline for a bulk request
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected
execution of org.elasticsearch.ingest.IngestService$4@4b3feb8a on EsThreadPoolEx
ecutor[name = node_cdams/write, queue capacity = 200, org.elasticsearch.common.u
til.concurrent.EsThreadPoolExecutor@415dcc4a[Running, pool size = 4, active thre
ads = 4, queued tasks = 231, completed tasks = 2731]]

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 00007FF714520EFA v8::internal::GCIdleTimeHandler::GCIdleTimeHandler+4810
 2: 00007FF7144FA296 node::MakeCallback+4518
 3: 00007FF7144FAC80 node_module_register+2160
 4: 00007FF7147909BE v8::internal::FatalProcessOutOfMemory+846
 5: 00007FF7147908EF v8::internal::FatalProcessOutOfMemory+639
 6: 00007FF714CCE954 v8::internal::Heap::MaxHeapGrowingFactor+11476
 7: 00007FF714CCC6E8 v8::internal::Heap::MaxHeapGrowingFactor+2664
 8: 00007FF71486F0CB v8::internal::Factory::AllocateRawWithImmortalMap+59
 9: 00007FF714871BCD v8::internal::Factory::NewRawTwoByteString+77
10: 00007FF714A85C58 v8::internal::Smi::SmiPrint+536
11: 00007FF714783ECB v8::internal::StringHasher::UpdateIndex+219
12: 00007FF71495EFD9 v8::internal::CodeStubAssembler::ConstexprBoolNot+38457
13: 00007FF71495EF4B v8::internal::CodeStubAssembler::ConstexprBoolNot+38315
14: 000003ED22C5C721
dadoonet commented 4 years ago

Could you share the 20 first lines of your elasticsearch node when it starts?

Neel-Gagan commented 4 years ago

i am sharing the entire es screen

Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

C:\ELK\elasticsearch-6.8.0\bin>elasticsearch
[2020-04-27T17:44:50,971][WARN ][o.e.c.l.LogConfigurator  ] [node_cdams] Some lo
gging configurations have %marker but don't have %node_name. We will automatical
ly add %node_name to the pattern to ease the migration for users who customize l
og4j2.properties but will stop this behavior in 7.0. You should manually replace
 `%node_name` with `[%node_name]%marker ` in these locations:
  C:\ELK\elasticsearch-6.8.0\config\log4j2.properties
[2020-04-27T17:44:55,830][INFO ][o.e.e.NodeEnvironment    ] [node_cdams] using [
1] data paths, mounts [[(C:)]], net usable_space [296.6gb], net total_space [799
.6gb], types [NTFS]
[2020-04-27T17:44:55,830][INFO ][o.e.e.NodeEnvironment    ] [node_cdams] heap si
ze [9.9gb], compressed ordinary object pointers [true]
[2020-04-27T17:45:00,268][INFO ][o.e.n.Node               ] [node_cdams] node na
me [node_cdams], node ID [1iktpixvRyqltWW--o8CjA]
[2020-04-27T17:45:00,268][INFO ][o.e.n.Node               ] [node_cdams] version
[6.8.0], pid[2344], build[default/zip/65b6179/2019-05-15T20:06:13.172855Z], OS[W
indows Server 2012 R2/6.3/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit
 Server VM/1.8.0_171/25.171-b11]
[2020-04-27T17:45:00,268][INFO ][o.e.n.Node               ] [node_cdams] JVM arg
uments [-Xms10G, -Xmx10G, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFra
ction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djav
a.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTrac
eInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -D
io.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dl
og4j2.disable.jmx=true, -Djava.io.tmpdir=C:\Users\ADMINI~1\AppData\Local\Temp\2\
elasticsearch, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:Error
File=logs/hs_err_pid%p.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+Pr
intTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log
, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -
Delasticsearch, -Des.path.home=C:\ELK\elasticsearch-6.8.0, -Des.path.conf=C:\ELK
\elasticsearch-6.8.0\config, -Des.distribution.flavor=default, -Des.distribution
.type=zip]
[2020-04-27T17:45:06,830][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [aggs-matrix-stats]
[2020-04-27T17:45:06,830][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [analysis-common]
[2020-04-27T17:45:06,830][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [ingest-common]
[2020-04-27T17:45:06,830][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [ingest-geoip]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [ingest-user-agent]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [lang-expression]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [lang-mustache]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [lang-painless]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [mapper-extras]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [parent-join]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [percolator]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [rank-eval]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [reindex]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [repository-url]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [transport-netty4]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [tribe]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-ccr]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-core]
[2020-04-27T17:45:06,846][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-deprecation]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-graph]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-ilm]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-logstash]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-ml]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-monitoring]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-rollup]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-security]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-sql]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-upgrade]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
module [x-pack-watcher]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
plugin [analysis-phonetic]
[2020-04-27T17:45:06,862][INFO ][o.e.p.PluginsService     ] [node_cdams] loaded
plugin [ingest-opennlp]
[2020-04-27T17:45:18,221][INFO ][d.s.e.i.o.OpenNlpService ] [node_cdams] Read mo
dels in [3.8s] for [persons, dates, locations]
[2020-04-27T17:45:22,065][INFO ][o.e.x.s.a.s.FileRolesStore] [node_cdams] parsed
 [0] roles from file [C:\ELK\elasticsearch-6.8.0\config\roles.yml]
[2020-04-27T17:45:23,549][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [node_cdams]
[controller/4040] [Main.cc@109] controller (64 bit): Version 6.8.0 (Build e6cf25
e2acc5ec) Copyright (c) 2019 Elasticsearch BV
[2020-04-27T17:45:24,971][DEBUG][o.e.a.ActionModule       ] [node_cdams] Using R
EST wrapper from plugin org.elasticsearch.xpack.security.Security
[2020-04-27T17:45:26,924][INFO ][o.e.d.DiscoveryModule    ] [node_cdams] using d
iscovery type [zen] and host providers [settings]
[2020-04-27T17:45:28,815][INFO ][o.e.n.Node               ] [node_cdams] initial
ized
[2020-04-27T17:45:28,830][INFO ][o.e.n.Node               ] [node_cdams] startin
g ...
[2020-04-27T17:45:29,502][INFO ][o.e.t.TransportService   ] [node_cdams] publish
_address {192.232.15.189:9300}, bound_addresses {192.232.15.189:9300}
[2020-04-27T17:45:29,971][INFO ][o.e.b.BootstrapChecks    ] [node_cdams] bound o
r publishing to a non-loopback address, enforcing bootstrap checks
[2020-04-27T17:45:37,925][INFO ][o.e.c.s.ClusterApplierService] [node_cdams] det
ected_master {node_master}{lEYqEjlZShSUhqtYlFakJA}{gmWLiDfIQ8af7fCZHsa1Qg}{192.X.X.X}{192.X.X.X:9300}{ml.machine_memory=137070587904, ml.max_open_jobs=20,
 xpack.installed=true, ml.enabled=true}, added {{node_master}{lEYqEjlZShSUhqtYlF
akJA}{gmWLiDfIQ8af7fCZHsa1Qg}{192.X.X.X}{192.X.X.X:9300}{ml.machine_memory
=137070587904, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{node
_92}{CBNmPgYxTeqZRgfVk8dFMg}{DT4OohCdQe6ufTE3N2LvtQ}{192.X.X.X}{192.X.X.X:9300}
{ml.machine_memory=34217000960, ml.max_open_jobs=20, xpack.installed=true
, ml.enabled=true},}, reason: apply cluster state (from master [master {node_mas
ter}{lEYqEjlZShSUhqtYlFakJA}{gmWLiDfIQ8af7fCZHsa1Qg}{192.X.X.X}{192.X.X.X:
9300}{ml.machine_memory=137070587904, ml.max_open_jobs=20, xpack.installed=true,
 ml.enabled=true} committed version [1163]])
[2020-04-27T17:45:38,019][INFO ][o.e.c.s.ClusterSettings  ] [node_cdams] updatin
g [xpack.monitoring.collection.enabled] from [false] to [true]
[2020-04-27T17:45:41,956][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [node_cda
ms] Failed to clear cache for realms [[]]
[2020-04-27T17:45:41,956][INFO ][o.e.x.s.a.TokenService   ] [node_cdams] refresh
 keys
[2020-04-27T17:45:42,675][INFO ][o.e.x.s.a.TokenService   ] [node_cdams] refresh
ed keys
[2020-04-27T17:45:42,800][INFO ][o.e.l.LicenseService     ] [node_cdams] license
 [5a512914-5bd8-467c-bcab-acfae1b30b89] mode [basic] - valid
[2020-04-27T17:45:42,847][INFO ][o.e.h.n.Netty4HttpServerTransport] [node_cdams]
 publish_address {192.232.15.189:9200}, bound_addresses {192.232.15.189:9200}
[2020-04-27T17:45:42,878][INFO ][o.e.n.Node               ] [node_cdams] started

[2020-04-27T17:47:19,772][WARN ][o.e.t.TransportService   ] [node_cdams] Receive
d response for a request that has timed out, sent [100893ms] ago, timed out [709
32ms] ago, action [internal:discovery/zen/fd/master_ping], node [{node_master}{l
EYqEjlZShSUhqtYlFakJA}{gmWLiDfIQ8af7fCZHsa1Qg}{192.X.X.X}{192.X.X.X:9300}{
ml.machine_memory=137070587904, ml.max_open_jobs=20, xpack.installed=true, ml.en
abled=true}], id [19]
[2020-04-27T17:56:48,515][INFO ][o.e.m.j.JvmGcMonitorService] [node_cdams] [gc][
672] overhead, spent [363ms] collecting in the last [1.1s]
[2020-04-28T14:21:20,512][INFO ][o.e.m.j.JvmGcMonitorService] [node_cdams] [gc][
73387] overhead, spent [314ms] collecting in the last [1s]
[2020-04-28T14:21:21,715][INFO ][o.e.m.j.JvmGcMonitorService] [node_cdams] [gc][
73388] overhead, spent [386ms] collecting in the last [1.2s]
Neel-Gagan commented 4 years ago

a gentle reminder regarding this request

Neel-Gagan commented 4 years ago

a gentle reminder regarding this request

Neel-Gagan commented 4 years ago

A gentle reminder regarding the raised issue .

Neel-Gagan commented 4 years ago

A gentle reminder regarding update on this issue.

dadoonet commented 4 years ago

I looked at the logs

[2020-04-27T17:44:55,830][INFO ][o.e.e.NodeEnvironment    ] [node_cdams] heap size [9.9gb], compressed ordinary object pointers [true]

You did not set the HEAP Size to 15gb, right?

Also:

[2020-04-27T17:45:00,268][INFO ][o.e.n.Node               ] [node_cdams] version [6.8.0], pid[2344], build[default/zip/65b6179/2019-05-15T20:06:13.172855Z], OS[Windows Server 2012 R2/6.3/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_171/25.171-b11]

Could you use 6.8.10 which I think is the latest 6.8 version? I'd also recommend upgrading the JVM. You can go up to JVM14 instead.

Neel-Gagan commented 4 years ago

i raised heap size to 50% of the available RAM size. will upgrading it more will help in sorting the issue ?