Open Neel-Gagan opened 4 years ago
@Neel-Gagan Could you use 2.7-SNAPSHOT?
Could you run the job again on only this file (one folder containing only this file) with the --debug option so there's a chance we can see a full stacktrace.
There's a continue_on_error setting which should just skip the file, but maybe I'm not catching the right thing. What is happening when you get this error? Is FSCrawler stopping? The file.filename field should be a stored field.

1) Yes, the crawler is getting stopped after this error.
2) The mapping seems correct, as the error comes midway through the crawl, after a few files have already been crawled.
Did you try the continue_on_error setting?

1) continue_on_error: "true" is present in FSCrawler's _settings.json file.
2) I have many folders and I am crawling the root folder. After crawling a few of the folders I get this error: Please set stored: true on field [file.filename].
The crawling is being performed on existing indexes moved from another system. The index is up, and in the mapping store: true is set on filename. I can't figure out why this error comes midway through the crawl.
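For reference, a mapping fragment with store: true set on file.filename would look roughly like this (a sketch only; the exact mapping shipped with your FSCrawler version may differ in other fields):

```json
{
  "properties": {
    "file": {
      "properties": {
        "filename": {
          "type": "keyword",
          "store": true
        }
      }
    }
  }
}
```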
On running a fresh installation of FSCrawler 2.7 with a new job f_mi, I got the error below while crawling a database file of 1.8 GB. Here is the trace:
"file" : {
"extension" : "accdb",
"content_type" : "application/x-msaccess",
"created" : "2018-07-09T09:22:54.911+0000",
"last_modified" : "2018-08-01T08:47:11.035+0000",
"last_accessed" : "2020-04-09T12:49:17.016+0000",
"indexing_date" : "2020-06-09T08:02:00.993+0000",
"filesize" : 1274957824,
"filename" : "Database4.accdb",
"url" : "file://F:\\Test Data\\Database4.accdb"
},
"path" : {
"root" : "a9f7b81422814a76439be45c7e2281",
"virtual" : "/Test Data/Database4.accdb",
"real" : "F:\\Test Data\\Database4.accdb"
}
}
13:33:07,150 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling F:\Test Data: integer overflow
13:33:07,150 WARN [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.ArithmeticException: integer overflow
at java.lang.Math.multiplyExact(Unknown Source) ~[?:1.8.0_171]
at org.apache.lucene.util.UnicodeUtil.maxUTF8Length(UnicodeUtil.java:618) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
at org.apache.lucene.util.BytesRef.<init>(BytesRef.java:84) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
at org.elasticsearch.common.bytes.BytesArray.<init>(BytesArray.java:32) ~[elasticsearch-6.6.0.jar:6.6.0]
at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:357) ~[elasticsearch-6.6.0.jar:6.6.0]
at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.index(ElasticsearchClientV6.java:375) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.esIndex(FsParserAbstract.java:577) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.indexFile(FsParserAbstract.java:479) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:267) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:149) [fscrawler-core-2.7-SNAPSHOT.jar:?]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
13:33:07,154 INFO [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
13:33:07,201 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_mi]
13:33:07,201 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
13:33:07,201 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
13:33:07,201 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
13:33:07,201 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_mi] stopped
13:33:07,201 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_mi]
13:33:07,205 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
13:33:07,205 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
13:33:07,205 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
13:33:07,205 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_mi] stopped
Mapping is incorrect: please set stored: true on field
A gentle reminder regarding the raised issue.
Mapping is incorrect: please set stored: true on field
That's the other story we are tracking with #937. Let's not mix the problems.
This one is very interesting:
java.lang.ArithmeticException: integer overflow
at java.lang.Math.multiplyExact(Unknown Source) ~[?:1.8.0_171]
at org.apache.lucene.util.UnicodeUtil.maxUTF8Length(UnicodeUtil.java:618) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
at org.apache.lucene.util.BytesRef.<init>(BytesRef.java:84) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
at org.elasticsearch.common.bytes.BytesArray.<init>(BytesArray.java:32) ~[elasticsearch-6.6.0.jar:6.6.0]
at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:357) ~[elasticsearch-6.6.0.jar:6.6.0]
at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.index(ElasticsearchClientV6.java:375) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.esIndex(FsParserAbstract.java:577) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.indexFile(FsParserAbstract.java:479) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:267) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:149) [fscrawler-core-2.7-SNAPSHOT.jar:?]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
But it was with the elasticsearch 6.6.0 client. I'd like you to upgrade Elasticsearch to the latest 6.8.10 and use the latest SNAPSHOT build for FSCrawler - es6.
Just to see if that problem goes away by upgrading the ES client and Lucene. Otherwise, that will be something I need to report to the Lucene project and maybe @nknize.
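To illustrate the mechanism (a sketch inferred from the stacktrace, not FSCrawler code): Lucene's UnicodeUtil.maxUTF8Length multiplies the UTF-16 length of the source by 3 (the maximum UTF-8 bytes per char) using Math.multiplyExact, which throws instead of silently wrapping. So any JSON document whose source string exceeds Integer.MAX_VALUE / 3 characters (roughly 715 million) triggers exactly this exception:

```java
// Minimal reproduction of the overflow seen in the stacktrace.
// OverflowDemo is a hypothetical class name for illustration.
public class OverflowDemo {
    // Lucene's worst-case expansion factor for UTF-16 -> UTF-8
    static final int MAX_UTF8_BYTES_PER_CHAR = 3;

    // Mirrors what UnicodeUtil.maxUTF8Length does internally
    static int maxUtf8Length(int utf16Length) {
        return Math.multiplyExact(utf16Length, MAX_UTF8_BYTES_PER_CHAR);
    }

    public static void main(String[] args) {
        // The crawled file was ~1.27 GB; a source string of that many
        // characters times 3 exceeds Integer.MAX_VALUE (2147483647)
        int hugeLength = 1_274_957_824;
        try {
            maxUtf8Length(hugeLength);
        } catch (ArithmeticException e) {
            // multiplyExact refuses to wrap and throws instead
            System.out.println(e.getMessage()); // prints "integer overflow"
        }
    }
}
```

This also explains why the problem only appears on very large files: smaller documents stay well under the threshold and the multiplication succeeds.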
FSCrawler 2.6, Elasticsearch 6.8, Kibana 6.8
While crawling .mdb files of around 500 MB, I am getting the error: Error while crawling C:\Data: integer overflow
10:25:13,992 WARN [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.ArithmeticException: integer overflow
1) Is the integer overflow error related to the data within the file, or is there any way to bypass this error?
2) One more issue while crawling: getting the error Please set stored: true on field [file.filename].
What needs to be done to do away with these errors?
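For what it's worth, two settings in FSCrawler's _settings.json may help here (a suggestion, not a confirmed fix): continue_on_error to keep the crawler going past failing files, and indexed_chars to cap the amount of text extracted per file, which keeps the generated JSON document far below the size that overflows. A sketch, with the url value being an example path:

```json
{
  "name": "f_mi",
  "fs": {
    "url": "C:\\Data",
    "continue_on_error": true,
    "indexed_chars": "100000"
  }
}
```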