Open rizwan-ahmad-ms opened 4 years ago
Your mapping is incorrect. Could you share it? Please format code and logs with markdown so it's more readable.
Here is the index mapping
PUT indextwi
{
  "settings": { "number_of_shards": 1 },
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "storageName": { "type": "keyword", "null_value": "NULL" },
      "storagePath": { "type": "keyword", "null_value": "NULL" },
      "folderPath": { "type": "keyword", "null_value": "NULL" },
      "aSN": { "type": "keyword", "null_value": "NULL" },
      "docType": { "type": "keyword", "null_value": "NULL" },
      "referenceKey": { "type": "keyword", "null_value": "NULL" },
      "docNo": { "type": "keyword", "null_value": "NULL" },
      "tags": { "type": "keyword", "null_value": "NULL" },
      "comments": { "type": "text" },
      "status": { "type": "integer" },
      "oCR": { "type": "integer" },
      "source": { "type": "integer" },
      "seqNo": { "type": "keyword", "null_value": "NULL" },
      "docDate": { "type": "date" },
      "history": { "type": "keyword", "null_value": "NULL" },
      "modifiedBy": { "type": "keyword", "null_value": "NULL" },
      "createdBy": { "type": "keyword", "null_value": "NULL" },
      "modifiedOn": { "type": "keyword", "null_value": "NULL" },
      "createdOn": { "type": "keyword", "null_value": "NULL" },
      "container": { "type": "keyword", "null_value": "NULL" },
      "containerType": { "type": "keyword", "null_value": "NULL" },
      "containerID": { "type": "keyword", "null_value": "NULL" },
      "videoUrl": { "type": "keyword", "null_value": "NULL" }
    }
  }
}
Please don't use the citation button but the code button.
The mapping you have is not coming from FSCrawler. Have a look at https://fscrawler.readthedocs.io/en/latest/admin/fs/elasticsearch.html#mappings
Thanks for assisting me. I've attached the zip file that contains mapping & config files. Please review it and let me know about the mistake.
Kind Regards, Rizwan
I see that you are renaming some fields:
"field": "file.filename",
"target_field": "storageName",
`file.filename` must be kept as is, because the crawler uses it later. You can copy its content to `storageName` if you wish, but don't remove it.
Also, you probably need to remove the existing index before starting again, otherwise the index template won't be applied.
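For illustration, copying a field instead of renaming it can be done with the Elasticsearch `set` ingest processor. The sketch below only builds and prints a pipeline body; the helper name `build_copy_pipeline` and the description text are made up for this example, and the field names are the ones quoted above. The printed JSON would still need to be sent to Elasticsearch as the body of an ingest pipeline.

```python
import json


def build_copy_pipeline(source_field: str, target_field: str) -> dict:
    """Build an ingest pipeline body that copies source_field into target_field.

    Hypothetical helper: it uses a `set` processor with a template value,
    so the source field stays in the document (unlike `rename`).
    """
    return {
        "description": f"Copy {source_field} to {target_field}, keeping the original",
        "processors": [
            {
                "set": {
                    "field": target_field,
                    # Mustache template that resolves to the source field's value
                    "value": "{{" + source_field + "}}",
                }
            }
        ],
    }


pipeline = build_copy_pipeline("file.filename", "storageName")
print(json.dumps(pipeline, indent=2))
```

Because the processor is `set` and not `rename`, `file.filename` remains available to the crawler after the pipeline runs.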
Many thanks for correcting my issues. My files are now indexed successfully. I moved my index mapping into the `_settings.json` file, and I use `set` in the ingest pipeline instead of `rename`.
Another issue I'm facing now is that FSCrawler won't index existing files after I delete the index, or re-create it with no mapping using `PUT /indexname`.
Secondly, is there any REST API to reset or restart FSCrawler, rather than doing it from the CLI?
Kind Regards, Rizwan
You can manually remove the status file, I think, if you don't want to run FSCrawler with the `--restart` option.
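A minimal sketch of removing the status file by hand, under assumptions not stated in this thread: that the job's status file is named `_status.json` and lives under `~/.fscrawler/<job_name>/` (the default FSCrawler config directory). The function name `remove_status_file` is hypothetical. Check the actual location on your machine, and stop FSCrawler before deleting anything.

```python
from pathlib import Path


def remove_status_file(job_name: str,
                       config_dir: Path = Path.home() / ".fscrawler") -> bool:
    """Delete the job's status file so the next run starts from scratch.

    Assumption: the status file is <config_dir>/<job_name>/_status.json.
    Returns True if a file was removed, False if none was found.
    """
    status_file = config_dir / job_name / "_status.json"
    if status_file.is_file():
        status_file.unlink()
        return True
    return False


# Example call (the job name is a placeholder):
# remove_status_file("my_job")
```

The function is only defined here and the example call is commented out, so running the snippet as is deletes nothing.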
FSCrawler runs and crawls new documents initially, when the index is empty, but after some time it doesn't crawl any new documents or folders. It gives me the warnings below:
05:54:00,431 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [\ServerIP\DocStorage\Folder1\BlobContainer\25107\25107\GL\20200408-PDF]...
05:54:00,431 TRACE [f.p.e.c.f.FsParserAbstract] Querying elasticsearch for files in dir [path.root:91d0d9e1c12b40118d1c233be55e7b6f]
05:54:00,446 TRACE [f.p.e.c.f.FsParserAbstract] Response [fr.pilato.elasticsearch.crawler.fs.client.ESSearchResponse@452e241f]
05:54:00,446 WARN [f.p.e.c.f.FsParserAbstract] Can't find stored field name to check existing filenames in path [\ServerIP\DocStorage\Folder1\BlobContainer\25107\25107\GL\20200408-PDF]. Please set store: true on field [file.filename]
05:54:00,462 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling \ServerIP\DocStorage\Folder1\BlobContainer: Mapping is incorrect: please set stored: true on field [file.filename].
05:54:00,462 WARN [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:374) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:309) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:149) [fscrawler-core-2.7-SNAPSHOT.jar:?]
    at java.lang.Thread.run(Thread.java:830) [?:?]
05:54:00,477 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler is going to sleep for 1m
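The warning asks for `store: true` on `file.filename`. As a rough way to see whether a given mapping satisfies that, the sketch below inspects a mappings object (the value under `"mappings"` in a mapping response) held as a Python dict. The function name `filename_is_stored` and both sample dicts are illustrative assumptions; the second sample mimics a custom mapping that has no `file` object at all.

```python
def filename_is_stored(mappings: dict) -> bool:
    """Return True if file.filename is mapped with store set to true."""
    file_props = (
        mappings.get("properties", {})
        .get("file", {})
        .get("properties", {})
    )
    return file_props.get("filename", {}).get("store") is True


# Mapping fragment with the stored field the warning asks for
with_store = {
    "properties": {
        "file": {
            "properties": {
                "filename": {"type": "keyword", "store": True}
            }
        }
    }
}

# Custom mapping without any file.filename definition
without_store = {
    "properties": {
        "storageName": {"type": "keyword", "null_value": "NULL"}
    }
}

print(filename_is_stored(with_store))     # True
print(filename_is_stored(without_store))  # False
```

If the check returns False for the live index, the index was most likely created without the expected mapping, for example by a bare `PUT /indexname`.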
Versions: