Closed timcreatewell closed 10 years ago
Did you install the quartz plugin?
$ $ES_HOME/bin/plugin --install org.codelibs/elasticsearch-quartz/1.0.1
If yes, could you check ES log file and provide the stacktrace?
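For reference, the quickest check is the `plugins` line the node prints at startup: a node that loaded its plugins lists their names, while `loaded [], sites []` means nothing was picked up. A minimal sketch of that check, using the startup log line from this thread in place of a real log file (the log path varies per install, so grep your own `elasticsearch.log` for `[plugins`):

```shell
# Startup log line from this thread, standing in for:
#   grep '\[plugins' /path/to/elasticsearch.log
LOG_LINE='[2014-03-25 23:50:35,999][INFO ][plugins ] [Lynx] loaded [], sites []'

# "loaded []" means the node found no plugins in $ES_HOME/plugins at startup.
case "$LOG_LINE" in
  *'loaded []'*) echo "no plugins were loaded" ;;
  *)             echo "plugins were loaded" ;;
esac
```

On a healthy node the line would instead list names, e.g. `loaded [river-web, quartz], sites [head, kopf]` (names taken from the `ls` output later in this thread).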
Hi, yes I did install quartz:
[tim@localhost plugins]$ ls
head kopf quartz river-web
The contents of my log are as follows:
[2014-03-25 23:50:35,994][INFO ][node ] [Lynx] version[1.0.0.RC2], pid[14524], build[a9d736e/2014-02-03T15:02:11Z]
[2014-03-25 23:50:35,994][INFO ][node ] [Lynx] initializing ...
[2014-03-25 23:50:35,999][INFO ][plugins ] [Lynx] loaded [], sites []
[2014-03-25 23:50:38,601][INFO ][node ] [Lynx] initialized
[2014-03-25 23:50:38,601][INFO ][node ] [Lynx] starting ...
[2014-03-25 23:50:38,671][INFO ][transport ] [Lynx] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.20.122:9300]}
[2014-03-25 23:50:41,714][INFO ][cluster.service ] [Lynx] new_master [Lynx][RooMfQuzSZ-zzYyhom5DZA][localhost.localdomain][inet[/192.168.20.122:9300]], reason: zen-disco-join (elected_as_master)
[2014-03-25 23:50:41,741][INFO ][discovery ] [Lynx] elasticsearch/RooMfQuzSZ-zzYyhom5DZA
[2014-03-25 23:50:41,840][INFO ][http ] [Lynx] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.20.122:9200]}
[2014-03-25 23:50:41,867][INFO ][gateway ] [Lynx] recovered [0] indices into cluster_state
[2014-03-25 23:50:41,867][INFO ][node ] [Lynx] started
[2014-03-25 23:54:45,904][INFO ][cluster.metadata ] [Lynx] [robot] creating index, cause [api], shards [5]/[1], mappings []
[2014-03-25 23:54:53,657][INFO ][cluster.metadata ] [Lynx] [compassion_uat] creating index, cause [api], shards [5]/[1], mappings []
[2014-03-25 23:55:07,465][INFO ][cluster.metadata ] [Lynx] [compassion_uat] create_mapping [compassion_web]
[2014-03-25 23:55:25,630][INFO ][cluster.metadata ] [Lynx] [_river] creating index, cause [auto(index api)], shards [1]/[1], mappings []
[2014-03-25 23:55:25,795][INFO ][cluster.metadata ] [Lynx] [_river] update_mapping [compassion_web] (dynamic)
[2014-03-25 23:55:26,820][WARN ][river ] [Lynx] failed to create river [web][compassion_web]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [web]
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
at org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: web
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:73)
... 9 more
[2014-03-25 23:55:26,832][INFO ][cluster.metadata ] [Lynx] [_river] update_mapping [compassion_web] (dynamic)
[2014-03-25 23:57:35,306][INFO ][cluster.metadata ] [Lynx] [_river] update_mapping [compassion_web] (dynamic)
If it helps, the installed java version is as follows:
[tim@localhost elasticsearch]$ java -version
java version "1.7.0_51"
OpenJDK Runtime Environment (rhel-2.4.4.1.el6_5-x86_64 u51-b02)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
Thanks for your help!
Thank you for the info. I do not think your ES node loaded the river-web plugin from the plugins directory. Could you check the files in the $ES_HOME/plugins/river-web directory, and also their permissions?
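For anyone else checking this, a sketch of the permission fix, simulated in a temp directory so it runs anywhere; on a real node the directory would be `$ES_HOME/plugins/river-web` and the jar name will differ:

```shell
# Simulated plugin directory (replace with "$ES_HOME/plugins/river-web" on a real node).
PLUGIN_DIR="$(mktemp -d)/river-web"
mkdir -p "$PLUGIN_DIR"
touch "$PLUGIN_DIR/elasticsearch-river-web.jar"   # placeholder jar name

# Directories need read+execute, files need read, for the user running ES:
chmod -R u+rwX,go+rX "$PLUGIN_DIR"

ls -l "$PLUGIN_DIR"   # every file should now be readable
```

The `u+rwX,go+rX` form gives directories 755 and files 644, which is all the JVM needs to load the jars.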
I've just checked and all the files are there; I did a chmod 755 across them and they all seem to work now. However, I am now receiving the following error:
[2014-03-26 01:19:04,221][ERROR][org.seasar.robot.helper.impl.LogHelperImpl] System Error.
org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query], all shards failed; shardFailures {[rhyrVnOJTpi6KBrzvo30Nw][compassion_uat][2]: SearchParseException[[compassion_uat][2]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [Failed to parse source [{"from":0,"size":1,"query":{"term":{"url":"https://compassionau.custhelp.com/ci/sitemap/"}},"sort":[{"lastModified":{"order":"desc"}}]}]]]; nested: SearchParseException[[compassion_uat][2]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [No mapping found for [lastModified] in order to sort on]]; }{[rhyrVnOJTpi6KBrzvo30Nw][compassion_uat][1]: SearchParseException[[compassion_uat][1]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [Failed to parse source [{"from":0,"size":1,"query":{"term":{"url":"https://compassionau.custhelp.com/ci/sitemap/"}},"sort":[{"lastModified":{"order":"desc"}}]}]]]; nested: SearchParseException[[compassion_uat][1]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [No mapping found for [lastModified] in order to sort on]]; }{[rhyrVnOJTpi6KBrzvo30Nw][compassion_uat][0]: SearchParseException[[compassion_uat][0]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [Failed to parse source [{"from":0,"size":1,"query":{"term":{"url":"https://compassionau.custhelp.com/ci/sitemap/"}},"sort":[{"lastModified":{"order":"desc"}}]}]]]; nested: SearchParseException[[compassion_uat][0]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [No mapping found for [lastModified] in order to sort on]]; }{[rhyrVnOJTpi6KBrzvo30Nw][compassion_uat][4]: SearchParseException[[compassion_uat][4]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [Failed to parse source [{"from":0,"size":1,"query":{"term":{"url":"https://compassionau.custhelp.com/ci/sitemap/"}},"sort":[{"lastModified":{"order":"desc"}}]}]]]; nested: SearchParseException[[compassion_uat][4]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [No mapping found for [lastModified] in order to sort on]]; }{[rhyrVnOJTpi6KBrzvo30Nw][compassion_uat][3]: SearchParseException[[compassion_uat][3]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [Failed to parse source [{"from":0,"size":1,"query":{"term":{"url":"https://compassionau.custhelp.com/ci/sitemap/"}},"sort":[{"lastModified":{"order":"desc"}}]}]]]; nested: SearchParseException[[compassion_uat][3]: query[url:https://compassionau.custhelp.com/ci/sitemap/],from[0],size[1]: Parse Failure [No mapping found for [lastModified] in order to sort on]]; }
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:272)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:224)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:205)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I am also receiving this on my qbox.io cluster (qbox.io support confirmed it this morning). Have you seen this error before?
I have not seen it... Could you check the mapping?
curl -XGET localhost:9200/compassion_uat/compassion_web/_mapping?pretty
If the mapping is not correct, I think it's better to recreate the compassion_uat index.
The request returns this:
{
"compassion_uat" : {
"mappings" : {
"compassion_web" : {
"dynamic_templates" : [ {
"url" : {
"mapping" : {
"type" : "string",
"store" : "yes",
"index" : "not_analyzed"
},
"match" : "url"
}
}, {
"method" : {
"mapping" : {
"type" : "string",
"store" : "yes",
"index" : "not_analyzed"
},
"match" : "method"
}
}, {
"charSet" : {
"mapping" : {
"type" : "string",
"store" : "yes",
"index" : "not_analyzed"
},
"match" : "charSet"
}
}, {
"mimeType" : {
"mapping" : {
"type" : "string",
"store" : "yes",
"index" : "not_analyzed"
},
"match" : "mimeType"
}
} ],
"properties" : { }
}
}
}
}
As far as I can tell it's correct?
"properties" : { }
The properties field is empty, so compassion_web has no mapping info. Could you re-register the river with "incremental": false? If that works, please re-register it with "incremental": true again.
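A sketch of that re-registration, using the river name and URL from this thread. The curl calls are commented out since they need a running node; the JSON sanity check runs anywhere:

```shell
# River definition with incremental crawling disabled (name/index/URL from this thread).
RIVER_META='{
  "type" : "web",
  "crawl" : {
    "index" : "compassion_uat",
    "url" : ["https://compassionau.custhelp.com/ci/sitemap/"],
    "incremental" : false
  },
  "schedule" : { "cron" : "*/2 * * * * ?" }
}'

# Validate the payload locally before sending it to the cluster:
echo "$RIVER_META" | python3 -m json.tool > /dev/null && echo "meta JSON is valid"

# On a running node, drop the old definition and register the new one:
# curl -XDELETE 'localhost:9200/_river/compassion_web/'
# curl -XPUT 'localhost:9200/_river/compassion_web/_meta' -d "$RIVER_META"
```

Validating the JSON first avoids the river silently failing on a malformed `_meta` document.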
Hi there,
I removed all the indexes and started over with the river set to "incremental": false - I can see documents being indexed which is great!
When I update the river by running:
curl -XPUT 'http://localhost:9200/_river/compassion_web/_meta' -d '
{
"type" : "web",
"crawl" : {
"index" : "compassion_uat",
"url" : ["https://compassionau.custhelp.com/ci/sitemap/"],
"includeFilter" : ["https://compassionau.custhelp.com/.*"],
"maxDepth" : 30,
"maxAccessCount" : 1000,
"numOfThread" : 10,
"interval" : 1000,
"incremental" : true,
"overwrite" : true,
"robotsTxt" : false,
"userAgent" : "bingbot",
"target" : [
{
"pattern" : {
"url" : "https://compassionau.custhelp.com/app/answers/detail/.*",
"mimeType" : "text/html"
},
"properties" : {
"title" : {
"text" : "h1#rn_Summary"
},
"body" : {
"text" : "div#rn_AnswerText",
"trimSpaces" : true
}
}
}
]
},
"schedule" : {
"cron" : "*/2 * * * * ?"
}
}
... everything still seems to be working?
Thank you for checking it. Incremental crawling needs the mapping to exist before the crawl runs... so it works now because the non-incremental crawl created the mapping. I'll fix this problem in the next release.
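Until that release, one possible workaround (an assumption on my part, inferred from the "No mapping found for [lastModified]" error earlier in this thread, not from the plugin docs) would be to put an explicit mapping for the sort field before registering an incremental river:

```shell
# Hypothetical explicit mapping: lastModified as a date so the incremental
# crawl's sort has a field to bind to. Field names are inferred from the
# error message in this thread; the authoritative list belongs to the plugin.
MAPPING='{
  "compassion_web" : {
    "properties" : {
      "url"          : { "type" : "string", "index" : "not_analyzed", "store" : "yes" },
      "lastModified" : { "type" : "date", "store" : "yes" }
    }
  }
}'

# Validate locally before sending:
echo "$MAPPING" | python3 -m json.tool > /dev/null && echo "mapping JSON is valid"

# On a running node:
# curl -XPUT 'localhost:9200/compassion_uat/compassion_web/_mapping' -d "$MAPPING"
```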
Thanks for the help - really appreciate it!
Hi marevol,
I did the same as you said: created a crawling river with incremental:false initially, then deleted it and re-created it with incremental:true, and the latter failed to index files. Please let me know if I made any mistake.
This is the log stacktrace:
[2014-03-26 19:43:12,763][INFO ][cluster.metadata ] [Intermec] [_river] update_mapping es_htmls
[2014-03-26 19:43:12,768][INFO ][org.codelibs.elasticsearch.web.river.WebRiver] Creating WebRiver: es_htmls
[2014-03-26 19:43:12,768][INFO ][org.codelibs.elasticsearch.web.river.WebRiver] Scheduling CrawlJob...
[2014-03-26 19:43:12,771][WARN ][org.seasar.framework.container.assembler.BindingTypeShouldDef] Skip setting property, because property(requestListener) of org.seasar.robot.client.FaultTolerantClient not found
[2014-03-26 19:43:12,774][INFO ][cluster.metadata ] [Intermec] [_river] update_mapping es_htmls
[2014-03-26 19:43:12,864][ERROR][org.seasar.robot.helper.impl.LogHelperImpl] System Error.
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Date
at org.codelibs.elasticsearch.web.robot.service.EsUrlQueueService.poll(EsUrlQueueService.java:107)
at org.seasar.robot.S2RobotThread.run(S2RobotThread.java:128)
at java.lang.Thread.run(Thread.java:722)
[2014-03-26 19:43:13,002][INFO ][org.codelibs.elasticsearch.web.river.WebRiver] web.es_htmlsJob is running.
[2014-03-26 19:43:14,000][INFO ][org.codelibs.elasticsearch.web.river.WebRiver] web.es_htmlsJob is running.
Could you re-create "robot" index?
curl -XDELETE 'localhost:9200/robot/'
curl -XPUT 'localhost:9200/robot/'
The mistake was on my end: I first created a river with incremental:false, then deleted the river and re-created it with incremental:true. What I concluded is that just updating the river to incremental:true is enough, rather than deleting and re-creating it.
I hope this fixes the issue; it worked for me.
Thanks, Srinivas
Filed #22 and #24. The problems in this issue will be fixed in the next release.
I got the following log when trying the above steps with incremental:false:
[2014-07-24 14:56:53,509][WARN ][org.seasar.framework.container.assembler.BindingTypeShouldDef] Skip setting property, because property(requestListener) of org.seasar.robot.client.FaultTolerantClient not found
[2014-07-24 14:56:53,525][INFO ][cluster.metadata ] [Black Tarantula] [_river] update_mapping compassion_web
[2014-07-24 14:59:33,250][INFO ][cluster.metadata ] [Black Tarantula] [[_river]] remove_mapping [[compassion_web]]
[2014-07-24 14:59:33,252][INFO ][org.codelibs.elasticsearch.web.river.WebRiver] Unscheduling CrawlJob...
[2014-07-24 14:59:33,260][INFO ][org.codelibs.elasticsearch.web.river.WebRiver] Deleted one time river: compassion_web
When I run "curl -XGET localhost:9200/compassion_uat/compassion_web/_mapping?pretty",
I still get an empty mapping.
I am also getting the same "org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [web]". I tried setting incremental: false when creating the crawl, but it didn't help.
FYI I am using ES 1.3.0, river web plugin 1.3.0 and quartz 1.0.1
When I do
curl -XGET localhost:9200/webindex/my_web/_mapping?pretty
I get -
{
"webindex" : {
"mappings" : {
"my_web" : {
"dynamic_templates" : [ {
"url" : {
"mapping" : {
"type" : "string",
"store" : "yes",
"index" : "not_analyzed"
},
"match" : "url"
}
}, {
"method" : {
"mapping" : {
"type" : "string",
"store" : "yes",
"index" : "not_analyzed"
},
"match" : "method"
}
}, {
"charSet" : {
"mapping" : {
"type" : "string",
"store" : "yes",
"index" : "not_analyzed"
},
"match" : "charSet"
}
}, {
"mimeType" : {
"mapping" : {
"type" : "string",
"store" : "yes",
"index" : "not_analyzed"
},
"match" : "mimeType"
}
} ],
"properties" : { }
}
}
}
}
Somehow the properties are not getting set even though I set incremental to false.
curl -XPUT 'localhost:9200/_river/my_web/_meta' -d '{
"type" : "web",
"crawl" : {
"index" : "webindex",
"url" : ["http://gta.wikia.com"],
"includeFilter" : ["http://gta.wikia.com/.*"],
"maxDepth" : 3,
"maxAccessCount" : 1000000,
"numOfThread" : 5,
"interval" : 1000,
"incremental" : false,
"target" : [
{
"pattern" : {
"url" : "http://gta.wikia.com/.*",
"mimeType" : "text/html"
},
"properties" : {
"title" : {
"text" : "title"
},
"body" : {
"text" : "body"
},
"bodyAsHtml" : {
"html" : "body"
}
}
}
]
},
"schedule" : {
"cron" : "0 0 6 * * ?"
}
}'
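A runnable way to spot the empty-properties symptom from a mapping response (the string below is a condensed version of the response quoted above, standing in for a live curl):

```shell
# Condensed mapping response from this thread; on a real node, capture it with:
#   RESPONSE=$(curl -s -XGET 'localhost:9200/webindex/my_web/_mapping')
RESPONSE='{"webindex":{"mappings":{"my_web":{"properties":{}}}}}'

# An empty properties object means no crawled document was ever indexed
# into the type, so there is nothing for an incremental crawl to sort on.
case "$RESPONSE" in
  *'"properties":{}'* | *'"properties" : { }'*)
    echo "mapping is empty: run a non-incremental crawl first" ;;
  *)
    echo "mapping has properties" ;;
esac
```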
Closing this issue because it mixes multiple problems. If you see NoClassSettingsException, I think the river-web installation failed.
@marevol actually the river-web plugin was installed successfully, yet I am still getting this exception -
[2014-08-04 10:47:21,080][INFO ][cluster.metadata ] [elasticsearch_0] [webindex] creating index, cause [api], shards [5]/[1], mappings []
[2014-08-04 10:48:25,455][INFO ][cluster.metadata ] [elasticsearch_0] [webindex] create_mapping [my_web]
[2014-08-04 10:59:18,146][INFO ][cluster.metadata ] [elasticsearch_0] [_river] creating index, cause [auto(index api)], shards [1]/[1], mappings []
[2014-08-04 10:59:18,251][INFO ][cluster.metadata ] [elasticsearch_0] [_river] update_mapping [my_web] (dynamic)
[2014-08-04 10:59:19,267][WARN ][river ] [elasticsearch_0] failed to create river [web][my_web]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [web]
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
at org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:95)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: web
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:73)
... 9 more
[2014-08-04 10:59:19,279][INFO ][cluster.metadata ] [elasticsearch_0] [_river] update_mapping [my_web] (dynamic)
Hi there,
I've just created a brand new Centos VM (v6), installed ElasticSearch v1.0.0RC2 and elasticsearch-river-web v1.1.0 as per the instructions.
I then went to set up my crawl by running the following:
After doing this I cannot see any documents appearing in the index, so I have looked at the _river index and can see the following error:
Have I missed a step?
Thanks, Tim.