codelibs / elasticsearch-river-web

Web Crawler for Elasticsearch
Apache License 2.0

Incompatibility with river-imap #68

Open finalspy opened 10 years ago

finalspy commented 10 years ago

Since I installed both river-web (https://github.com/codelibs/elasticsearch-river-web) and river-imap (https://github.com/salyh/elasticsearch-river-imap), 3 of my 5 nodes fail to start.

Here's the only log output I got about this:

[2014-09-27 23:53:48,622][ERROR][bootstrap ] {1.3.2}: Initialization Failed ...
1) NoSuchMethodError[org.apache.commons.codec.binary.Base64.<init>(I[BZ)V]

It seems the two plugins use different versions of Apache Commons (maybe through transitive dependencies...).

What is very weird is that 2 nodes do start (but fail to synchronize because the other nodes are missing).

I'm using version 1.3.2 of ES and the latest versions of the two river plugins, installed following the readme.md instructions.

finalspy commented 10 years ago

I can confirm that if I uninstall river-web / quartz, all nodes start with river-imap.

But the opposite isn't true: with only river-web / quartz and no river-imap, the nodes don't start (it was working on Saturday with river-web and no river-imap, but it isn't anymore).

As I run ES 1.3.2, I tried both river-web 1.3.0 and a fresh build of river-web 1.3.1-SNAPSHOT (with the ES dependencies updated as well), and neither version got the nodes to start.

Did I miss something?

marevol commented 10 years ago

Please check Elasticsearch's log file.

finalspy commented 10 years ago

Here's the full log from one of the nodes that doesn't start:

[2014-09-29 15:40:13,618][INFO ][node ] [master] version[1.3.2], pid[3612], build[dee175d/2014-08-13T14:29:30Z]
[2014-09-29 15:40:13,618][INFO ][node ] [master] initializing ...
[2014-09-29 15:40:13,675][INFO ][plugins ] [master] loaded [QuartzPlugin, marvel, river-imap-0.3-7fbfd2d, river-rss, WebPlugin], sites [marvel, bigdesk]
[2014-09-29 15:40:14,642][INFO ][org.codelibs.elasticsearch.quartz.service.ScheduleService] [master] Creating Scheduler...
[2014-09-29 15:40:14,668][INFO ][org.quartz.impl.StdSchedulerFactory] Using default implementation for ThreadExecutor
[2014-09-29 15:40:14,670][INFO ][org.quartz.simpl.SimpleThreadPool] Job execution threads will use class loader of thread: main
[2014-09-29 15:40:14,679][INFO ][org.quartz.core.SchedulerSignalerImpl] Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
[2014-09-29 15:40:14,680][INFO ][org.quartz.core.QuartzScheduler] Quartz Scheduler v.2.2.1 created.
[2014-09-29 15:40:14,680][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore initialized.
[2014-09-29 15:40:14,681][INFO ][org.quartz.core.QuartzScheduler] Scheduler meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
  Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.

[2014-09-29 15:40:14,681][INFO ][org.quartz.impl.StdSchedulerFactory] Quartz scheduler 'DefaultQuartzScheduler' initialized from default resource file in Quartz package: 'quartz.properties'
[2014-09-29 15:40:14,681][INFO ][org.quartz.impl.StdSchedulerFactory] Quartz scheduler version: 2.2.1
[2014-09-29 15:40:15,592][INFO ][org.codelibs.elasticsearch.web.service.S2ContainerService] Creating S2Container...
[2014-09-29 15:40:15,609][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-framework is 2.4.46.
[2014-09-29 15:40:15,610][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-extension is 2.4.46.
[2014-09-29 15:40:15,610][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-tiger is 2.4.46.
[2014-09-29 15:40:16,237][ERROR][bootstrap ] {1.3.2}: Initialization Failed ...
1) NoSuchMethodError[org.apache.commons.codec.binary.Base64.<init>(I[BZ)V]

And here's the first part of the log from one of the nodes that does start (everything before the cluster synchronization issue caused by the non-responding nodes):

[2014-09-29 15:40:13,662][INFO ][node ] [other] version[1.3.2], pid[24170], build[dee175d/2014-08-13T14:29:30Z]
[2014-09-29 15:40:13,662][INFO ][node ] [other] initializing ...
[2014-09-29 15:40:13,718][INFO ][plugins ] [other] loaded [QuartzPlugin, marvel, river-rss, river-imap-0.3-7fbfd2d, WebPlugin], sites [marvel, bigdesk]
[2014-09-29 15:40:15,563][INFO ][org.codelibs.elasticsearch.web.service.S2ContainerService] Creating S2Container...
[2014-09-29 15:40:15,584][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-framework is 2.4.46.
[2014-09-29 15:40:15,584][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-extension is 2.4.46.
[2014-09-29 15:40:15,585][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-tiger is 2.4.46.
[2014-09-29 15:40:16,183][WARN ][org.seasar.framework.container.assembler.BindingTypeShouldDef] Skip setting property, because property(client) of org.codelibs.elasticsearch.web.config.RiverConfig not found
[2014-09-29 15:40:16,445][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Running on [ENV]product, [DEPLOY MODE]Cool Deploy
[2014-09-29 15:40:16,446][INFO ][org.codelibs.elasticsearch.quartz.service.ScheduleService] [other] Creating Scheduler...
[2014-09-29 15:40:16,466][INFO ][org.quartz.impl.StdSchedulerFactory] Using default implementation for ThreadExecutor
[2014-09-29 15:40:16,469][INFO ][org.quartz.simpl.SimpleThreadPool] Job execution threads will use class loader of thread: main
[2014-09-29 15:40:16,482][INFO ][org.quartz.core.SchedulerSignalerImpl] Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
[2014-09-29 15:40:16,483][INFO ][org.quartz.core.QuartzScheduler] Quartz Scheduler v.2.2.1 created.
[2014-09-29 15:40:16,484][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore initialized.
[2014-09-29 15:40:16,484][INFO ][org.quartz.core.QuartzScheduler] Scheduler meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
  Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.

[2014-09-29 15:40:16,484][INFO ][org.quartz.impl.StdSchedulerFactory] Quartz scheduler 'DefaultQuartzScheduler' initialized from default resource file in Quartz package: 'quartz.properties'
[2014-09-29 15:40:16,484][INFO ][org.quartz.impl.StdSchedulerFactory] Quartz scheduler version: 2.2.1
[2014-09-29 15:40:16,742][INFO ][node ] [other] initialized
[2014-09-29 15:40:16,743][INFO ][node ] [other] starting ...
[2014-09-29 15:40:16,743][INFO ][org.codelibs.elasticsearch.web.service.S2ContainerService] Starting S2Container...
[2014-09-29 15:40:16,743][INFO ][org.codelibs.elasticsearch.quartz.service.ScheduleService] [other] Starting Scheduler...
[2014-09-29 15:40:16,744][INFO ][org.quartz.core.QuartzScheduler] Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
[2014-09-29 15:40:16,750][DEBUG][action.admin.cluster.health] [other] no known master node, scheduling a retry
[2014-09-29 15:40:16,801][INFO ][transport ] [other] bound_address {inet[/0:0:0:0:0:0:0:0:XXXXX]}, publish_address {inet[/XXX.XXX.XXX.XXX:XXXXX]}
[2014-09-29 15:40:16,804][INFO ][discovery ] [other] elasticdata/ie62O3_uSu6gU7uechQ6dQ

It seems the S2 initialization logs appear first on the working nodes:

[2014-09-29 15:40:15,563][INFO ][org.codelibs.elasticsearch.web.service.S2ContainerService] Creating S2Container...
[2014-09-29 15:40:15,584][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-framework is 2.4.46.
[2014-09-29 15:40:15,584][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-extension is 2.4.46.
[2014-09-29 15:40:15,585][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-tiger is 2.4.46.
[2014-09-29 15:40:16,183][WARN ][org.seasar.framework.container.assembler.BindingTypeShouldDef] Skip setting property, because property(client) of org.codelibs.elasticsearch.web.config.RiverConfig not found
[2014-09-29 15:40:16,445][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Running on [ENV]product, [DEPLOY MODE]Cool Deploy

but on the nodes that don't start they appear at the end, just before the error:

[2014-09-29 15:40:15,592][INFO ][org.codelibs.elasticsearch.web.service.S2ContainerService] Creating S2Container...
[2014-09-29 15:40:15,609][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-framework is 2.4.46.
[2014-09-29 15:40:15,610][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-extension is 2.4.46.
[2014-09-29 15:40:15,610][INFO ][org.seasar.framework.container.factory.SingletonS2ContainerFactory] Version of s2-tiger is 2.4.46.

This may indicate that the initialization process runs in a different order from one node to another... Any idea what could cause this behavior?

(I don't know if this is normal, but when I list the plugins on different nodes, they are not sorted the same way. On the nodes that work, however, the plugins are all sorted the same way.)
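
One possible explanation for the order dependence, offered as an assumption rather than a confirmed diagnosis: if all plugin jars end up on a single class loader that searches them in order (as java.net.URLClassLoader does with its URL list), the first jar that contains a class wins. The Python sketch below only simulates that first-match rule. The jar names and the two orderings are hypothetical, chosen to mirror the plugin orders in the two log excerpts.

```python
BASE64 = "org.apache.commons.codec.binary.Base64"

def resolve(class_name, classpath):
    """Return the first classpath entry that provides class_name, or None."""
    for jar_name, classes in classpath:
        if class_name in classes:
            return jar_name
    return None

# Hypothetical jars: both bundle Base64, but only the newer one would have
# the three-argument constructor named in the error.
old_codec = ("river-rss/commons-codec-old.jar", {BASE64})
new_codec = ("river-imap/commons-codec-new.jar", {BASE64})

# Same jars, different search order (assumed to follow plugin load order).
order_imap_first = [new_codec, old_codec]
order_rss_first = [old_codec, new_codec]

print(resolve(BASE64, order_imap_first))
print(resolve(BASE64, order_rss_first))
```

Under this assumption, two nodes with identical plugins can still load different versions of the same class, which would match the symptom of some nodes starting and others failing.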

[master] and [other] are my nodes' names, and IP addresses have been replaced by XXX.XXX.XXX.XXX.

marevol commented 10 years ago

I think it's better to solve your problem (the commons-codec error) in a single-node environment first, not in a cluster.

salyh commented 10 years ago

Maybe river-rss is the problem: it uses an old version of commons-codec, 1.2.

The constructor in the error, org.apache.commons.codec.binary.Base64.<init>(I[BZ)V, is
public Base64(int lineLength, byte[] lineSeparator, boolean urlSafe)
which has only existed since version 1.4.
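
The mapping from the descriptor (I[BZ)V to that signature can be checked mechanically. The following is a minimal Python sketch of a JVM method descriptor decoder; the function names are illustrative and not part of any library.

```python
# Primitive type codes used in JVM method descriptors.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}

def _read_type(desc, pos):
    """Decode one type starting at desc[pos]; return (java_type, next_pos)."""
    dims = 0
    while desc[pos] == "[":  # each '[' adds one array dimension
        dims += 1
        pos += 1
    code = desc[pos]
    if code == "L":  # object type, written as Lfully/qualified/Name;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:
        name = PRIMITIVES[code]
        pos += 1
    return name + "[]" * dims, pos

def decode_descriptor(desc):
    """Return (parameter_types, return_type) for a descriptor like '(I[BZ)V'."""
    if not desc.startswith("("):
        raise ValueError("descriptor must start with '('")
    close = desc.index(")")
    params, pos = [], 1
    while pos < close:
        java_type, pos = _read_type(desc, pos)
        params.append(java_type)
    return_type, _ = _read_type(desc, close + 1)
    return params, return_type

print(decode_descriptor("(I[BZ)V"))  # (['int', 'byte[]', 'boolean'], 'void')
```

A constructor always has the return type void in its descriptor, so the decoded parameters (int, byte[], boolean) are what identify the three-argument Base64 constructor.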

marevol commented 10 years ago

I think the plugin dependency problem is something Elasticsearch itself needs to solve: https://github.com/elasticsearch/elasticsearch/issues/5261

As a workaround, removing the old commons-codec jar from the plugins/river-rss directory may work...
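
To see which jars are involved before removing anything, a small script can list every jar under the plugins directory that bundles the Base64 class. This is only a sketch: jars_providing is a hypothetical helper name, and the plugins path depends on your installation.

```python
import zipfile
from pathlib import Path

# Path of the class file inside a jar for the class named in the error.
BASE64_ENTRY = "org/apache/commons/codec/binary/Base64.class"

def jars_providing(plugins_dir, entry=BASE64_ENTRY):
    """Return sorted paths of jars under plugins_dir that contain entry."""
    hits = []
    for jar in Path(plugins_dir).rglob("*.jar"):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return sorted(hits)
```

Calling jars_providing with the path of your plugins directory returns one path per jar that ships its own copy of the class. More than one hit means several copies are competing.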