dadoonet / fscrawler

Elasticsearch File System Crawler (FS Crawler)
https://fscrawler.readthedocs.io/
Apache License 2.0

Please set store: true on field #937

Open Neel-Gagan opened 4 years ago

Neel-Gagan commented 4 years ago

FSCrawler version 2.6, Elasticsearch version 6.8.0. I have migrated my Elasticsearch to a new system, keeping the version and other configuration the same. While crawling a folder, a few of the subfolders get crawled, but then it throws a "Please set store: true on field" error midway through the crawl. store: true is already set in the crawler's default mapping.

], filters = null
14:15:27,169 DEBUG [f.p.e.c.f.FsParserAbstract] Indexing backup_2020/_doc/8fb99c77144577c5d328a749b2eaf?pipeline=null
14:15:27,169 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [G:\BACKUP\2020\Feb 2020\Data]...
14:15:27,403 WARN  [f.p.e.c.f.FsParserAbstract] Can't find stored field name to check existing filenames in path [G:\BACKUP\2020\Feb 2020\Data]. Please set store: true on field [file.filename]
14:15:27,403 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling F:\BACKUP\2020: Mapping is incorrect: please set stored: true on field [file.filename].
14:15:27,403 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:382) ~[fscrawler-core-2.6.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:317) ~[fscrawler-core-2.6.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:157) [fscrawler-core-2.6.jar:?]
        at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
14:15:27,419 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
14:15:27,481 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [backup_2020]
14:15:27,481 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
14:15:27,481 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
14:15:27,715 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
14:15:27,715 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [backup_2020] stopped
14:15:27,715 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [backup_2020]
14:15:27,715 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
14:15:27,715 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
14:15:27,715 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
14:15:27,715 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [backup_2020] stopped
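
The warning in the log above says the crawler could not find a stored value for file.filename. A minimal sketch of that kind of lookup follows, assuming the check boils down to an Elasticsearch search that requests `stored_fields` and then reads each hit's `fields` entry. The helper names and the query passed in are hypothetical and are not taken from the FSCrawler source.

```python
def build_stored_field_search(field, query):
    """Build a search body that asks only for the stored value of `field`."""
    return {
        "stored_fields": [field],  # Elasticsearch search option
        "query": query,            # caller-supplied query (hypothetical here)
    }


def extract_stored_values(response, field):
    """Collect the stored values of `field` from a search response.

    Raises RuntimeError when a hit carries no stored value, which is the
    situation the warning in the log describes.
    """
    values = []
    for hit in response.get("hits", {}).get("hits", []):
        stored = hit.get("fields", {}).get(field)
        if not stored:
            raise RuntimeError(
                "no stored value for [%s] in document %s" % (field, hit.get("_id"))
            )
        values.extend(stored)
    return values
```

Sending the body from `build_stored_field_search("file.filename", {"match_all": {}})` to the index's `_search` endpoint and passing the JSON response to `extract_stored_values` would show whether any document lacks the stored value.
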
dadoonet commented 4 years ago

Could you run:

GET backup_2020/_mapping

Also please use 2.7-SNAPSHOT

Neel-Gagan commented 4 years ago

1) The mapping shows "store": true in the _mappings.json file:

"file": {
          "properties": {
            "content_type": {
              "type": "keyword"
            },
            "filename": {
              "type": "keyword",
              "store": true
            },

2) I tried using 2.7-SNAPSHOT, but the excludes and indexed_chars = "-1" settings we made in version 2.6 are not compatible with the 2.7-SNAPSHOT .yaml file. Every time we make the changes, it asks to create a new job file.

dadoonet commented 4 years ago

1) Did you run GET backup_2020/_mapping on the live cluster? Could you share the full output please?

2) Could you share the new settings.yaml? Note that settings.json should still be supported.
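
For point 1, the difference between the local _mappings.json file and the mapping on the live cluster can be checked programmatically. The sketch below is only an illustration: `fetch_mapping` and `field_is_stored` are hypothetical helper names, and the base URL passed to `fetch_mapping` is whatever address the cluster listens on. The check accepts only the JSON boolean `true`, which is a deliberate choice of this sketch.

```python
import json
import urllib.request


def fetch_mapping(base_url, index):
    """GET <base_url>/<index>/_mapping from the live cluster."""
    with urllib.request.urlopen("%s/%s/_mapping" % (base_url, index)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def field_is_stored(mapping_response, index, field_path):
    """Return True only if `field_path` has "store": true (a JSON boolean)."""
    mappings = mapping_response[index]["mappings"]
    # Elasticsearch 6.x nests properties under a type name such as "doc";
    # typeless responses put "properties" directly under "mappings".
    if "properties" in mappings:
        roots = [mappings]
    else:
        roots = list(mappings.values())
    for node in roots:
        for part in field_path.split("."):
            node = node.get("properties", {}).get(part)
            if node is None:
                break  # field path not present under this root
        else:
            if node.get("store") is True:
                return True
    return False
```

A call such as `field_is_stored(fetch_mapping("http://127.0.0.1:9200", "backup_2020"), "backup_2020", "file.filename")` reports what the running index actually uses for that field.
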

Neel-Gagan commented 4 years ago

GET backup_2020/_mapping

#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in get indices requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
{
  "backup_2020" : {
    "aliases" : { },
    "mappings" : {
      "doc" : {
        "properties" : {
          "attachment" : {
            "type" : "binary"
          },
          "attributes" : {
            "properties" : {
              "group" : {
                "type" : "keyword"
              },
              "owner" : {
                "type" : "keyword"
              }
            }
          },
          "content" : {
            "type" : "text",
            "analyzer" : "autocomplete",
            "search_analyzer" : "autocomplete_search"
          },
          "file" : {
            "properties" : {
              "checksum" : {
                "type" : "keyword"
              },
              "content_type" : {
                "type" : "keyword"
              },
              "created" : {
                "type" : "date"
              },
              "extension" : {
                "type" : "keyword"
              },
              "filename" : {
                "type" : "keyword",
                "store": "true"
              },
              "filesize" : {
                "type" : "long"
              },
              "indexed_chars" : {
                "type" : "long"
              },
              "indexing_date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "last_accessed" : {
                "type" : "date"
              },
              "last_modified" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "url" : {
                "type" : "keyword",
                "index" : false
              }
            }
          },
          "meta" : {
            "properties" : {
              "altitude" : {
                "type" : "text"
              },
              "author" : {
                "type" : "text"
              },
              "comments" : {
                "type" : "text"
              },
              "contributor" : {
                "type" : "text"
              },
              "coverage" : {
                "type" : "text"
              },
              "created" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "creator_tool" : {
                "type" : "keyword"
              },
              "date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "description" : {
                "type" : "text"
              },
              "format" : {
                "type" : "text"
              },
              "identifier" : {
                "type" : "text"
              },
              "keywords" : {
                "type" : "text"
              },
              "language" : {
                "type" : "keyword"
              },
              "latitude" : {
                "type" : "text"
              },
              "longitude" : {
                "type" : "text"
              },
              "metadata_date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "modifier" : {
                "type" : "text"
              },
              "print_date" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "publisher" : {
                "type" : "text"
              },
              "rating" : {
                "type" : "byte"
              },
              "raw" : {
                "properties" : {
                  "Author" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Content-Type" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Creation-Date" : {
                    "type" : "date"
                  },
                  "Last-Modified" : {
                    "type" : "date"
                  },
                  "Last-Save-Date" : {
                    "type" : "date"
                  },
                  "Message-Cc" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message-From" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message-Recipient-Address" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message-To" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:CC-Display-Name" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:CC-Email" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:CC-Name" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:From-Email" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:From-Name" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Accept-Language" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Auto-Submitted" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:CC" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Content-Language" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Content-Transfer-Encoding" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Content-Type" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Date" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Disposition-Notification-To" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:From" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Importance" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:In-Reply-To" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:MIME-Version" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Message-ID" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Received" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:References" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Return-Path" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Return-Receipt-To" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Subject" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Thread-Index" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:Thread-Topic" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:To" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Organization-AVStamp-Enterprise" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Organization-AuthAs" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Organization-AuthMechanism" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Organization-AuthSource" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Organization-MessageDirectionality" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Organization-Network-Message-Id" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Organization-Recipient-P2-Type" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Organization-SCL" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Exchange-Transport-EndToEndLatency" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-Has-Attach" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-MS-TNEF-Correlator" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-Originating-IP" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:Raw-Header:X-Priority" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:To-Display-Name" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:To-Email" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "Message:To-Name" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "X-Parsed-By" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "creator" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "date" : {
                    "type" : "date"
                  },
                  "dc:creator" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "dc:description" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "dc:title" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "dcterms:created" : {
                    "type" : "date"
                  },
                  "dcterms:modified" : {
                    "type" : "date"
                  },
                  "meta:author" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "meta:creation-date" : {
                    "type" : "date"
                  },
                  "meta:mapi-from-representing-email" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "meta:mapi-from-representing-name" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "meta:mapi-message-class" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "meta:mapi-sent-by-server-type" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "meta:save-date" : {
                    "type" : "date"
                  },
                  "modified" : {
                    "type" : "date"
                  },
                  "resourceName" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "subject" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "title" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  }
                }
              },
              "relation" : {
                "type" : "text"
              },
              "rights" : {
                "type" : "text"
              },
              "source" : {
                "type" : "text"
              },
              "title" : {
                "type" : "text"
              },
              "type" : {
                "type" : "text"
              }
            }
          },
          "path" : {
            "properties" : {
              "real" : {
                "type" : "keyword",
                "fields" : {
                  "tree" : {
                    "type" : "text",
                    "analyzer" : "fscrawler_path",
                    "fielddata" : true
                  }
                }
              },
              "root" : {
                "type" : "keyword"
              },
              "virtual" : {
                "type" : "keyword",
                "fields" : {
                  "tree" : {
                    "type" : "text",
                    "analyzer" : "fscrawler_path",
                    "fielddata" : true
                  }
                }
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "mapping" : {
          "total_fields" : {
            "limit" : "2000"
          }
        },
        "number_of_shards" : "5",
        "provided_name" : "backup_2020",
        "creation_date" : "1580880896555",
        "analysis" : {
          "analyzer" : {
            "autocomplete" : {
              "filter" : [
                "lowercase"
              ],
              "tokenizer" : "autocomplete"
            },
            "autocomplete_search" : {
              "tokenizer" : "lowercase"
            },
            "fscrawler_path" : {
              "tokenizer" : "fscrawler_path"
            }
          },
          "tokenizer" : {
            "autocomplete" : {
              "token_chars" : [
                "letter"
              ],
              "min_gram" : "2",
              "type" : "edge_ngram",
              "max_gram" : "20"
            },
            "fscrawler_path" : {
              "type" : "path_hierarchy"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "qdIeMZZmT3aUDQW-L2Xkww",
        "version" : {
          "created" : "6080099"
        }
      }
    }
  }
}

Content of settings.yaml:

---
name: "f_test"
fs:
  url: "F:\\Backup_2020"
  update_rate: "15m"
  excludes:
  -     "*/*.caf"
  -     "*/*.css"
  -     "*/*.js"
  -     "*/*.eot"
  -     "*/*.svg"
  -     "*/*.ttf"
  -     "*/*.woff"
  -     "*/*.woff2"
  -     "*/*.opus"
  -     "*/*.mp3"       
  -     "*/*.mp4"
  -     "*/*.vob"
  -     "*/*.js"
  -     "*/*.css"
  -     "*/*.svg"
  -     "*/*.dll"
  -         "*/*.exe"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: true
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  indexed_chars : "-1"
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
rest:
  url: "http://127.0.0.1:8080/fscrawler"
dadoonet commented 4 years ago

Please format your code with something like:

```
CODE
```

Could you share the FSCrawler logs with this 2.7 configuration?

dadoonet commented 4 years ago

Also note that indentation might be wrong here:

  excludes:
  -     "*/*.caf"
  -     "*/*.css"
  -     "*/*.js"
  -     "*/*.eot"
  -     "*/*.svg"
  -     "*/*.ttf"
  -     "*/*.woff"
  -     "*/*.woff2"
  -     "*/*.opus"
  -     "*/*.mp3"       
  -     "*/*.mp4"
  -     "*/*.vob"
  -     "*/*.js"
  -     "*/*.css"
  -     "*/*.svg"
  -     "*/*.dll"
  -         "*/*.exe"
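
The list above has uneven spacing after the dashes, trailing whitespace, and a few repeated patterns. A small sketch of one way to tidy such a list before pasting it back into the settings file is shown here. The helper names are hypothetical, patterns are assumed not to contain double quotes, and nothing in this thread establishes that the spacing is what causes the error.

```python
def normalize_excludes(patterns):
    """Strip stray whitespace and drop repeated patterns, keeping order."""
    seen = set()
    cleaned = []
    for pattern in patterns:
        pattern = pattern.strip()
        if pattern and pattern not in seen:
            seen.add(pattern)
            cleaned.append(pattern)
    return cleaned


def render_excludes(patterns, indent=2):
    """Render the list as YAML lines with one space after each dash."""
    pad = " " * indent
    lines = [pad + "excludes:"]
    lines += ['%s- "%s"' % (pad, p) for p in normalize_excludes(patterns)]
    return "\n".join(lines)
```

Printing `render_excludes([...])` with the patterns from the settings file yields an `excludes:` block with uniform indentation that can replace the original lines under `fs:`.
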
Neel-Gagan commented 4 years ago

I also tried with the correct indentation, but every time it asks to create a new job file.

C:\ELK\fscrawler-2.7\bin>fscrawler f_test
16:37:25,025 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [f_test] does not exist
16:37:25,025 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
y
16:37:27,902 INFO  [f.p.e.c.f.c.FsCrawlerCli] Settings have been created in [C:\Users\Dell\.fscrawler\f_test\_settings.yaml]. Please review and edit before relaunch

C:\ELK\fscrawler-2.7\bin>fscrawler f_test
16:40:28,596 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [f_test] does not exist
16:40:28,596 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
n
dadoonet commented 4 years ago

Could you run:

fscrawler f_test --trace
dadoonet commented 4 years ago

There is something weird: C:\Users\Dell.fscrawler\f_test_settings.yaml. It should be C:\Users\Dell\.fscrawler\f_test\_settings.yaml

So you might fix that by using the --config_dir option. https://fscrawler.readthedocs.io/en/latest/admin/cli-options.html#cli-options
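
As a rough illustration of the expected location: the earlier log output shows the settings being created in a sub-directory named after the job, inside the `.fscrawler` configuration directory. The Python sketch below lists the files one would look for, assuming that layout and the `_settings.yaml` / `_settings.json` file names seen in this thread. The helper name `expected_settings_files` is hypothetical and not part of FSCrawler.

```python
from pathlib import PureWindowsPath


def expected_settings_files(config_dir, job_name):
    """Return candidate settings paths: <config_dir>\\<job_name>\\_settings.{yaml,json}.

    Assumption: the job settings live in a folder named after the job,
    as in the log line printed when the job was created.
    """
    job_dir = PureWindowsPath(config_dir) / job_name
    return [job_dir / "_settings.yaml", job_dir / "_settings.json"]


if __name__ == "__main__":
    # Paths taken from the log output earlier in this thread.
    for path in expected_settings_files(r"C:\Users\Dell\.fscrawler", "f_test"):
        print(path)
```

If neither file exists at the printed location, that would be consistent with the "job does not exist" prompt, and passing the directory that actually contains the job folder via --config_dir may help.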

Neel-Gagan commented 4 years ago
fscrawler f_test --trace gives the same error, as shown below:
C:\ELK\fscrawler-2.7\bin>fscrawler f_test --trace
15:30:20,021 WARN [f.p.e.c.f.c.FsCrawlerCli] job [f_test] does not exist
15:30:20,021 INFO [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
y
dadoonet commented 4 years ago

But I'd have expected more traces.

dadoonet commented 4 years ago

Did you try my suggestion ?

Neel-Gagan commented 4 years ago

I am running fscrawler from the default location. Every time I make changes to the YAML file, it asks me to create a new job file.

dadoonet commented 4 years ago

But did you try my suggestion ?

I know what the problem is.

Neel-Gagan commented 4 years ago

with --config_dir option the error remains the same.

dadoonet commented 4 years ago

Could you share the command line and the full output?

Neel-Gagan commented 4 years ago
11:39:11,882 DEBUG [f.p.e.c.f.FsParser] Indexing f_test/_doc/b2b1819ec8245e1588e8f252b3172822?pipeline=null
11:39:11,898 DEBUG [f.p.e.c.f.FsParser] Looking for removed files in [F:\Backup_2020\ALPHA\2\1020,DATA\8c2ba005f32f8e3b7434ece773b67e40\1020\02-12-17\Feedback]...
11:39:11,961 WARN  [f.p.e.c.f.FsParser] Can't find stored field name to check existing filenames in path [F:\Backup_2020\ALPHA\2\1020,DATA\8c2ba005f32f8e3b7434ece773b67e40\1020\02-12-17\Feedback]. Please set store: true on field [file.filename]
11:39:11,961 WARN  [f.p.e.c.f.FsParser] Error while crawling F:\Backup_2020\ALPHA\2\1020,DATA: Mapping is incorrect: please set stored: true on field [file.filename].
11:39:11,961 WARN  [f.p.e.c.f.FsParser] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
        at fr.pilato.elasticsearch.crawler.fs.FsParser.getFileDirectory(FsParser.java:375) ~[fscrawler-core-2.5.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:307) ~[fscrawler-core-2.5.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParser.run(FsParser.java:167) [fscrawler-core-2.5.jar:?]
        at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
11:39:11,961 INFO  [f.p.e.c.f.FsParser] FS crawler is stopping after 1 run
11:39:12,070 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_test]
11:39:12,070 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
11:39:12,070 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
11:39:12,382 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
11:39:12,382 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
11:39:12,382 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_test] stopped
11:39:12,382 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_test]
11:39:12,382 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
11:39:12,382 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
11:39:12,382 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
11:39:12,398 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
11:39:12,398 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_test] stopped
dadoonet commented 4 years ago

Please format your code/logs.

Have a look at this discussion which fixes a similar problem: https://discuss.elastic.co/t/fscrawler-error-while-crawling-invalid-utf-8-start-byte-0xb5/230096/5?u=dadoonet

Neel-Gagan commented 4 years ago

The link suggests deleting the index, but my index holds quite a lot of data. Is there any other way to resolve this issue without deleting the index?

Also, with the FSCrawler 2.7 snapshot, can I reuse my existing _settings.json from FSCrawler 2.6?

Neel-Gagan commented 4 years ago

As suggested, I deleted the existing index and created a new one, but the issue persists: midway through crawling the folders it raised the "please set stored: true on field [file.filename]" error. The trace output is shown below:

14:36:36,772 TRACE [f.p.e.c.f.f.FsCrawlerUtil] No pattern always matches.
14:36:36,776 DEBUG [f.p.e.c.f.FsParserAbstract] Indexing f_mi/_doc/227b6de67262754f19e7c997985cb4?pipeline=null
14:36:36,776 TRACE [f.p.e.c.f.FsParserAbstract] JSon indexed : {
  "content" : "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "meta" : {
    "raw" : {
      "X-Parsed-By" : "org.apache.tika.parser.DefaultParser",
      "resourceName" : "TEST_FILE.zip",
      "Content-Type" : "application/zip"
    }
  },
  "file" : {
    "extension" : "zip",
    "content_type" : "application/zip",
    "created" : "2020-04-09T12:29:09.443+0000",
    "last_modified" : "2018-02-12T14:00:27.061+0000",
    "last_accessed" : "2020-04-09T12:29:09.506+0000",
    "indexing_date" : "2020-05-05T09:06:04.642+0000",
    "filesize" : 2371862,
    "filename" : "TEST_FILE.zip",
    "url" : "file://F:\\Test data\\TEST_FILE.zip"
  },
  "path" : {
    "root" : "d8d743596652383a1cd04ade29516ff7",
    "virtual" : "/Test data/TEST_FILE.zip",
    "real" : "F:\\Test data\\TEST_FILE.zip"
  }
}
14:36:36,776 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [F:\Test data\]...
14:36:36,776 TRACE [f.p.e.c.f.FsParserAbstract] Querying elasticsearch for files in dir [path.root:d8d743596652383a1cd04ade29516ff7]
14:36:36,795 TRACE [f.p.e.c.f.FsParserAbstract] Response [fr.pilato.elasticsearch.crawler.fs.client.ESSearchResponse@50713e36]
14:36:36,795 WARN  [f.p.e.c.f.FsParserAbstract] Can't find stored field name to check existing filenames in path [F:\Test data\]. Please set store: true on field [file.filename]
14:36:36,795 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling F:\MI: Mapping is incorrect: please set stored: true on field [file.filename].
14:36:36,795 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:382) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:317) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:157) [fscrawler-core-2.6.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
14:36:36,799 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
14:36:36,842 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_mi]
14:36:36,842 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
14:36:36,842 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
14:36:36,842 TRACE [f.p.e.c.f.c.v.ElasticsearchClientV6] Sending a bulk request of [1] requests
14:36:36,908 TRACE [f.p.e.c.f.c.v.ElasticsearchClientV6] Executed bulk request with [1] requests
14:36:36,912 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
14:36:36,912 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_mi] stopped
14:36:36,912 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_mi]
14:36:36,912 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
14:36:36,912 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
14:36:36,912 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
14:36:36,912 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_mi] stopped
dadoonet commented 4 years ago

That's because the mapping for folders or content is incorrect.

What is the output of:

GET /f_mi/_mapping
GET /f_mi_folder/_mapping

Please, format your code and logs with markdown or the <> button.
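
To make concrete what to look for in those two responses, here is a minimal Python sketch that inspects a mapping response (already parsed into a dict) and reports whether `file.filename` carries `"store": true`. The function name `field_is_stored` and the two sample mappings are illustrative assumptions, not output from this cluster. The second sample has the shape Elasticsearch produces when it maps a string field dynamically, which is one plausible way to end up with a field that is not stored.

```python
def field_is_stored(mapping_response, index, field="file.filename"):
    """Return True if `field` has "store": true in a GET /<index>/_mapping response."""
    mappings = mapping_response[index]["mappings"]
    # Elasticsearch 6.x nests "properties" under a type name (e.g. "_doc");
    # 7.x puts "properties" directly under "mappings". Handle both shapes.
    if "properties" not in mappings:
        if not mappings:
            return False  # index exists but has no mapping at all
        mappings = next(iter(mappings.values()))
    node = mappings
    for part in field.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return False  # field is not mapped
    return node.get("store") is True


if __name__ == "__main__":
    # Hypothetical sample: field explicitly mapped as a stored keyword.
    explicit = {"f_mi": {"mappings": {"_doc": {"properties": {"file": {"properties": {
        "filename": {"type": "keyword", "store": True}}}}}}}}
    # Hypothetical sample: dynamically mapped string field (text + keyword sub-field).
    dynamic = {"f_mi": {"mappings": {"_doc": {"properties": {"file": {"properties": {
        "filename": {"type": "text",
                     "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}}}}}}}}
    print(field_is_stored(explicit, "f_mi"))
    print(field_is_stored(dynamic, "f_mi"))
```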

Neel-Gagan commented 4 years ago

mapping .txt

attached is the mapping file.

dadoonet commented 4 years ago

Please format the code. Edit your answer. Thanks.

Neel-Gagan commented 4 years ago

a gentle reminder regarding this request

Neel-Gagan commented 4 years ago

Please find below the trace output for the crawler job f_mi. I still get the error after deleting the existing index and creating a new one: Can't find stored field name to check existing filenames in path [F:\Test Data]. Please set store: true on field [file.filename]

18:40:27,539 TRACE [f.p.e.c.f.f.FsCrawlerUtil] No pattern always matches.
18:40:27,543 DEBUG [f.p.e.c.f.FsParserAbstract] Indexing f_mi/_doc/c3d2a3eb3651898a7729c2414317a25f?pipeline=null
18:40:27,543 TRACE [f.p.e.c.f.FsParserAbstract] JSon indexed : {
  "content" : "um 11\n\nI u AX m; «m IXIHQS‘A)\n\n \n\nNO 50 m ?m Bun\n\n   \n\n \n\nm ”0.. Hum) mom m» w,\n\nmu .u mm\nMy mum leUnUu Muuu\n\ngogpsvrgms\n\n \n\n \n\nmm mm ON uu J/m 201», Q EARA\nv0 7mm nv H5 lcr. Mammy\n\n‘H H maw ,I‘Ur‘ Mum ”Jaw,\nm ‘ Maw m ‘LM/nt} ,er mm; mm \"7 v :-\n\n     \n\n \n\nw m \\hrfl n: mm m rm ha, 2:,\n1mm rm Wurst. ms 1mm W \",2 law,\nwmmxwmv mm mumumwmamhuw-r v1\" -\n\nu mmm: 1m ptulhs 0H may m be 9m 10' e ,\n~- u m Mum am back m mc wwm ,v.\n\n         \n\nm yam unccssaw and cany mm D'\n\n \n\n\n",
  "meta" : {
    "date" : "2018-01-24T09:08:55.000+0000",
    "created" : "2018-01-24T09:08:55.000+0000",
    "raw" : {
      "date" : "2018-01-24T14:38:55",
      "Number of Tables" : "4 Huffman tables",
      "Compression Type" : "Baseline",
      "Number of Components" : "3",
      "Component 2" : "Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert",
      "Focal Length" : "4.7 mm",
      "Component 1" : "Y component: Quantization table 0, Sampling factors 2 horiz/2 vert",
      "X Resolution" : "1 dot",
      "tiff:Make" : "Motorola",
      "Component 3" : "Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert",
      "F-Number" : "f/2.0",
      "modified" : "2018-01-24T14:38:55",
      "tiff:BitsPerSample" : "8",
      "meta:creation-date" : "2018-01-24T14:38:55",
      "exif:FNumber" : "2.0",
      "Exposure Time" : "299/10000 sec",
      "Creation-Date" : "2018-01-24T14:38:55",
      "ISO Speed Ratings" : "100",
      "resourceName" : "media_upload1_1516768110933.jpg",
      "Make" : "Motorola",
      "Orientation" : "Unknown (0)",
      "tiff:Orientation" : "0",
      "exif:FocalLength" : "4.67",
      "Y Resolution" : "1 dot",
      "Data Precision" : "8 bits",
      "White Balance" : "Unknown",
      "tiff:ImageLength" : "960",
      "Thumbnail Height Pixels" : "0",
      "dcterms:created" : "2018-01-24T14:38:55",
      "dcterms:modified" : "2018-01-24T14:38:55",
      "Last-Modified" : "2018-01-24T14:38:55",
      "exif:Flash" : "false",
      "exif:ExposureTime" : "0.0299",
      "Last-Save-Date" : "2018-01-24T14:38:55",
      "File Size" : "57521 bytes",
      "meta:save-date" : "2018-01-24T14:38:55",
      "File Name" : "apache-tika-1572764703782830790.tmp",
      "Flash" : "Flash did not fire, auto",
      "Content-Type" : "image/jpeg",
      "X-Parsed-By" : "org.apache.tika.parser.DefaultParser",
      "Resolution Units" : "none",
      "File Modified Date" : "Tue May 26 18:40:24 +05:30 2020",
      "Date/Time" : "2018:01:24 09:08:55",
      "Image Height" : "960 pixels",
      "Thumbnail Width Pixels" : "0",
      "Image Width" : "540 pixels",
      "tiff:Model" : "XT1562",
      "exif:IsoSpeedRatings" : "100",
      "Model" : "XT1562",
      "tiff:ImageWidth" : "540",
      "White Balance Mode" : "Auto white balance"
    }
  },
  "file" : {
    "extension" : "jpg",
    "content_type" : "image/jpeg",
    "created" : "2020-05-05T11:33:01.080+0000",
    "last_modified" : "2018-01-23T22:58:32.000+0000",
    "last_accessed" : "2020-05-05T11:33:01.080+0000",
    "indexing_date" : "2020-05-26T13:10:24.084+0000",
    "filesize" : 57521,
    "filename" : "media_upload1_1516768110933.jpg",
    "url" : "file://F:\\Test Data\\media_upload1_1516768110933.jpg"
  },
  "path" : {
    "root" : "42bb3fa5235a9bb7ca221884627d49de",
    "virtual" : "/Test Data/media_upload1_1516768110933.jpg",
    "real" : "F:\\Test Data\\media_upload1_1516768110933.jpg"
  }
}
18:40:27,543 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(F:\Test Data, F:\Test Data\Metadata of Images.xlsx) = /Test Data/Metadata of Images.xlsx
18:40:27,543 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [false], filename = [/Test Data/Metadata of Images.xlsx], includes = [null], excludes = [[*/~*]]
18:40:27,543 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/Test Data/Metadata of Images.xlsx], excludes = [[*/~*]]
18:40:27,543 TRACE [f.p.e.c.f.f.FsCrawlerUtil] regex is [.*?/~.*?]
18:40:27,543 TRACE [f.p.e.c.f.f.FsCrawlerUtil] does not match any exclude pattern
18:40:27,543 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/Test Data/Metadata of Images.xlsx], includes = [null]
18:40:27,543 TRACE [f.p.e.c.f.f.FsCrawlerUtil] no include rules
18:40:27,543 DEBUG [f.p.e.c.f.FsParserAbstract] [/Test Data/Metadata of Images.xlsx] can be indexed: [true]
18:40:27,543 DEBUG [f.p.e.c.f.FsParserAbstract]   - file: /Test Data/Metadata of Images.xlsx
18:40:27,543 DEBUG [f.p.e.c.f.FsParserAbstract] fetching content from [F:\Test Data],[Metadata of Images.xlsx]
18:40:27,543 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(F:\Test Data, F:\Test Data\Metadata of Images.xlsx) = /Test Data/Metadata of Images.xlsx
18:40:27,543 TRACE [f.p.e.c.f.t.TikaDocParser] Generating document [Metadata of Images.xlsx]
18:40:27,543 TRACE [f.p.e.c.f.t.TikaDocParser] indexed chars [has been disabled. All text will be extracted]
18:40:27,543 TRACE [f.p.e.c.f.t.TikaDocParser] Beginning Tika extraction
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser] End of Tika extraction
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser] Listing all available metadata:
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw.entrySet(), iterableWithSize(24));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("date", "2020-05-05T07:38:54Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("X-Parsed-By", "org.apache.tika.parser.DefaultParser"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("creator", "Admin"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("extended-properties:AppVersion", "16.0300"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("meta:author", "Admin"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("meta:creation-date", "2018-02-12T09:49:30Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("extended-properties:Application", "Microsoft Excel"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("meta:last-author", "Admin"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("dc:creator", "Admin"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("Creation-Date", "2018-02-12T09:49:30Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("resourceName", "Metadata of Images.xlsx"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("dcterms:created", "2018-02-12T09:49:30Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("Last-Author", "Admin"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("dcterms:modified", "2020-05-05T07:38:54Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("Last-Modified", "2020-05-05T07:38:54Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("X-TIKA:origResourceName", "C:\Users\Admin\Desktop\"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("Last-Save-Date", "2020-05-05T07:38:54Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("Application-Version", "16.0300"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("protected", "false"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("meta:save-date", "2020-05-05T07:38:54Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("Application-Name", "Microsoft Excel"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("Author", "Admin"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("modified", "2020-05-05T07:38:54Z"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser]   assertThat(raw, hasEntry("Content-Type", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"));
18:40:27,551 TRACE [f.p.e.c.f.t.TikaDocParser] End document generation
18:40:27,551 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] content = [Paratrooper
    Sl No.  Image File name shared by Admin Corresponding recovered Image File name Date of Modification    Time
    1   26940974_1993515777575503_438438715_n.jpg       1/23/18 16:44
    2   26941082_1993515774242170_1410228234_n.jpg      1/18/18 4:17
    3   26914341_1993515710908843_103548795_n.jpg       1/18/18 4:36
    4   26943366_1993515840908830_799198527_n.jpg       1/16/18 13:53
    5   26940974_1993515777575503_438438715_n.jpg       1/18/18 3:54
    6   26940874_1993561860904228_941630702_n.jpg       1/18/18 5:28
    7   26943421_1993560964237651_1864589668_n.jpg      1/24/18 3:39
    8   shakti_2018.jpeg        1/25/18 6:10
    9   Image_2018-01-25_at_00.16.01.jpeg       1/24/18 4:28

], filters = null
18:40:27,551 TRACE [f.p.e.c.f.f.FsCrawlerUtil] No pattern always matches.
18:40:27,551 DEBUG [f.p.e.c.f.FsParserAbstract] Indexing f_mi/_doc/f56296e8c811f8c05da284b4f5d4cdb5?pipeline=null
18:40:27,551 TRACE [f.p.e.c.f.FsParserAbstract] JSon indexed : {
  "content" : "Paratrooper\n\tSl No.\tImage File name shared \tCorresponding recovered Image File name\tDate of Modification\tTime\n\t1\t26940974_1993515777575503_438438715_n.jpg\t\t1/23/18\t16:44\n\t2\t26941082_1993515774242170_1410228234_n.jpg\t\t1/18/18\t4:17\n\t3\t26914341_1993515710908843_103548795_n.jpg\t\t1/18/18\t4:36\n\t4\t26943366_1993515840908830_799198527_n.jpg\t\t1/16/18\t13:53\n\t5\t26940974_1993515777575503_438438715_n.jpg\t\t1/18/18\t3:54\n\t6\t26940874_1993561860904228_941630702_n.jpg\t\t1/18/18\t5:28\n\t7\t26943421_1993560964237651_1864589668_n.jpg\t\t1/24/18\t3:39\n\t8\tgagan_shakti_2018.jpeg\t\t1/25/18\t6:10\n\t9\Image_2018-01-25_at_00.16.01.jpeg\t\t1/24/18\t4:28\n\n\n",
  "meta" : {
    "author" : "Admin",
    "date" : "2020-05-05T02:08:54.000+0000",
    "modifier" : "Admin",
    "created" : "2018-02-12T04:19:30.000+0000",
    "raw" : {
      "date" : "2020-05-05T07:38:54Z",
      "X-Parsed-By" : "org.apache.tika.parser.DefaultParser",
      "creator" : "Admin",
      "extended-properties:AppVersion" : "16.0300",
      "meta:author" : "Admin",
      "meta:creation-date" : "2018-02-12T09:49:30Z",
      "extended-properties:Application" : "Microsoft Excel",
      "meta:last-author" : "Admin",
      "dc:creator" : "Admin",
      "Creation-Date" : "2018-02-12T09:49:30Z",
      "resourceName" : "Metadata of Images.xlsx",
      "dcterms:created" : "2018-02-12T09:49:30Z",
      "Last-Author" : "Admin",
      "dcterms:modified" : "2020-05-05T07:38:54Z",
      "Last-Modified" : "2020-05-05T07:38:54Z",
      "X-TIKA:origResourceName" : "C:\\Users\\Admin\\Desktop\\",
      "Last-Save-Date" : "2020-05-05T07:38:54Z",
      "Application-Version" : "16.0300",
      "protected" : "false",
      "meta:save-date" : "2020-05-05T07:38:54Z",
      "Application-Name" : "Microsoft Excel",
      "Author" : "Admin",
      "modified" : "2020-05-05T07:38:54Z",
      "Content-Type" : "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    }
  },
  "file" : {
    "extension" : "xlsx",
    "content_type" : "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "created" : "2020-05-05T11:33:01.087+0000",
    "last_modified" : "2020-05-05T07:38:54.860+0000",
    "last_accessed" : "2020-05-05T11:33:01.095+0000",
    "indexing_date" : "2020-05-26T13:10:27.543+0000",
    "filesize" : 11676,
    "filename" : "Metadata of Images.xlsx",
    "url" : "file://F:\\Test Data\\Metadata of Images.xlsx"
  },
  "path" : {
    "root" : "42bb3fa5235a9bb7ca221884627d49de",
    "virtual" : "/Test Data/Metadata of Images.xlsx",
    "real" : "F:\\Test Data\\Metadata of Images.xlsx"
  }
}
18:40:27,551 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [F:\Test Data]...
18:40:27,551 TRACE [f.p.e.c.f.FsParserAbstract] Querying elasticsearch for files in dir [path.root:42bb3fa5235a9bb7ca221884627d49de]
18:40:27,551 TRACE [f.p.e.c.f.FsParserAbstract] Response [fr.pilato.elasticsearch.crawler.fs.client.ESSearchResponse@dd4c512]
18:40:27,551 WARN  [f.p.e.c.f.FsParserAbstract] Can't find stored field name to check existing filenames in path [F:\Test Data]. Please set store: true on field [file.filename]
18:40:27,551 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling F:\MI: Mapping is incorrect: please set stored: true on field [file.filename].
18:40:27,551 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:382) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:317) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:157) [fscrawler-core-2.6.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
18:40:27,551 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler is going to sleep for 15m
18:40:31,069 TRACE [f.p.e.c.f.c.v.ElasticsearchClientV6] Sending a bulk request of [2] requests
18:40:31,154 TRACE [f.p.e.c.f.c.v.ElasticsearchClientV6] Executed bulk request with [2] requests
dadoonet commented 4 years ago

Please format your code and logs. I'm going to explain how to do this. Look at this block for example:

18:40:27,551 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling F:\MI: Mapping is incorrect: please set stored: true on field [file.filename]. 18:40:27,551 WARN [f.p.e.c.f.FsParserAbstract] Full stacktrace java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename]. at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:382) ~[fscrawler-core-2.6.jar:?] at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:317) ~[fscrawler-core-2.6.jar:?] at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?] at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?] at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?] at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:157) [fscrawler-core-2.6.jar:?] at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]

It's ugly right?

Format it using this:

```
18:40:27,551 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling F:\MI: Mapping is incorrect: please set stored: true on field [file.filename].
18:40:27,551 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
  at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:382) ~[fscrawler-core-2.6.jar:?]
  at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:317) ~[fscrawler-core-2.6.jar:?]
  at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
  at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
  at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
  at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:157) [fscrawler-core-2.6.jar:?]
  at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
```

It will look like this:

18:40:27,551 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling F:\MI: Mapping is incorrect: please set stored: true on field [file.filename].
18:40:27,551 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:382) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:317) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:299) ~[fscrawler-core-2.6.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:157) [fscrawler-core-2.6.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]

Please edit your past comment.

Also, switch to 2.7.

Neel-Gagan commented 4 years ago

I switched to FSCrawler 2.7 and am facing the issues described below.

With the job file below, I get the following error.

_settings.yaml file

name: "fscrawler_7"
fs:
  url: "C://Users//Dell//Desktop//COMMANDS"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: true
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
elasticsearch:
  nodes:
  - url: "http://192.X.X.X:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
rest:
  url: "http://127.0.0.1:8080/fscrawler"

Error :

C:\ELK\fscrawler-es6-2.7\bin>fscrawler fscrawler_7 --debug --loop 1

With the default settings and only the url changed, crawling stops midway after a few files, giving the error below:

12:32:24,772 DEBUG [f.p.e.c.f.FsParserAbstract] Indexing fscrawler_7/9fb05f1a95d13786cff3a89e8fde820?pipeline=null
12:32:24,772 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [C:\Users\Dell\Desktop\COMMANDS\New folder\Packages\Packages\apache-code\admin\global_assets\images\backgrounds]...
12:32:24,804 WARN  [f.p.e.c.f.FsParserAbstract] Can't find stored field name to check existing filenames in path [C:\Users\Dell\Desktop\COMMANDS\New folder\Packages\Packages\apache-code\admin\global_assets\images\backgrounds]. Please set store: true on field [file.filename]
12:32:24,804 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling C://Users//Dell//Desktop//COMMANDS: Mapping is incorrect: please set stored: true on field [file.filename].
12:32:24,804 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:374) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:309) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:149) [fscrawler-core-2.7-SNAPSHOT.jar:?]
        at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
12:32:24,804 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
12:32:24,913 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [fscrawler_7]
12:32:24,913 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
12:32:24,913 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
12:32:25,273 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
12:32:25,273 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [fscrawler_7] stopped
12:32:25,273 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [fscrawler_7]
12:32:25,288 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
12:32:25,288 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
12:32:25,288 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
12:32:25,288 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [fscrawler_7] stopped

When I change _settings.yaml in the job file fscrawler7 to include the excludes and indexed_chars: -1, that is, the settings from my previous _settings.json file, I get the prompt to create a new job file:

name: "fscrawler_7_"
fs:
  url: "C://Users//Dell//Desktop//COMMANDS"
  update_rate: "15m"
  excludes:
  - "*/*.caf"
  - "*/*.css"
  - "*/*.js"
  - "*/*.eot"
  - "*/*.svg"
  - "*/*.ttf"
  - "*/*.woff"
  - "*/*.woff2"
  - "*/*.opus"
  - "*/*.mp3"
  - "*/*.mp4"
  - "*/*.vob"
  - "*/*.dll"
  - "*/*.exe"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: true
  xml_support: false
  index_folders: true
  lang_detect: false
  indexed_chars: -1
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
elasticsearch:
  nodes:
  - url: "http://192.X.X.X:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
rest:
  url: "http://127.0.0.1:8080/fscrawler"

Below is the error; it asks me to create a new job:

C:\ELK\fscrawler-es6-2.7\bin>fscrawler fscrawler_7_ --debug --loop 1
12:43:01,575 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [2/_settings.json] already exists
12:43:01,575 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [2/_settings_folder.json] already exists
12:43:01,575 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [5/_settings.json] already exists
12:43:01,575 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [5/_settings_folder.json] already exists
12:43:01,575 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
12:43:01,575 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
12:43:01,575 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
12:43:01,575 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
12:43:01,575 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [fscrawler_7_]...
12:43:01,762 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [fscrawler_7_] does not exist
12:43:01,762 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?

And if I use my existing _settings.json file, which was running in FSCrawler 2.6, in FSCrawler 2.7, I get the error below:

C:\ELK\fscrawler-es6-2.7\bin>fscrawler fscrawler_7_test_ --debug --loop 1
12:51:40,455 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [2/_settings.json] already exists
12:51:40,470 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [2/_settings_folder.json] already exists
12:51:40,470 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [5/_settings.json] already exists
12:51:40,470 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [5/_settings_folder.json] already exists
12:51:40,470 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
12:51:40,470 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
12:51:40,470 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
12:51:40,470 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
12:51:40,470 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [fscrawler_7_test_]...
12:51:40,470 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [fscrawler_7_test_] does not exist
12:51:40,470 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?

The "Mapping is incorrect: please set stored: true on field [file.filename]." error still persists and appears midway through crawling. Kindly advise what needs to be done so that crawling does not stop midway.

dadoonet commented 4 years ago

java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].

This means that the index was not created correctly. To solve this (warning: this will remove any existing data), delete the existing indices and run the crawler again:

DELETE /fscrawler_7_*

This should work and should not give the same error again. If it does, that suggests that you provided your own mapping files, changed the default index settings (7/_settings.json and 7/_settings_folder.json), or have an Elasticsearch index template.

Could you run this simple test above before trying anything else, like modifying the job settings? We will fix that after, once your installation is correct.
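To confirm whether the index was created with the expected mapping before rerunning the crawler, one option is to fetch the mapping (GET /<index>/_mapping) and check that file.filename carries store: true. The following is only a minimal Python sketch, not FSCrawler code: the function names find_field and is_stored and the sample mapping are hypothetical, and it assumes the response shape of Elasticsearch 6.x (properties nested under a type name) or 7.x (no type level).

```python
def find_field(mapping_body, index, dotted_path):
    """Return the mapping definition of a field, or None if it is absent.

    mapping_body is the parsed JSON returned by GET /<index>/_mapping.
    Elasticsearch 6.x nests "properties" under a type name (e.g. "_doc"),
    while 7.x puts "properties" directly under "mappings"; both are handled.
    """
    mappings = mapping_body.get(index, {}).get("mappings", {})
    if "properties" in mappings:
        roots = [mappings]
    else:
        roots = [m for m in mappings.values() if isinstance(m, dict)]
    for root in roots:
        node = root
        for part in dotted_path.split("."):
            node = node.get("properties", {}).get(part)
            if node is None:
                break
        else:
            # Every path segment was found under this root.
            return node
    return None


def is_stored(mapping_body, index, dotted_path="file.filename"):
    """True only if the field exists and is mapped with "store": true."""
    field = find_field(mapping_body, index, dotted_path)
    return bool(field and field.get("store") is True)


# Hypothetical 6.x-style response for an index named f_mi.
sample = {"f_mi": {"mappings": {"_doc": {"properties": {
    "file": {"properties": {"filename": {"type": "keyword", "store": True}}}
}}}}}
print(is_stored(sample, "f_mi"))
```

If this returns False for a freshly created index, that would point to the mapping having been created by something other than the FSCrawler defaults, such as dynamic mapping or an index template.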

Neel-Gagan commented 4 years ago

I have deleted the index and started crawling again. I haven't changed anything in the default mapping, and the crawler still gives the error midway through crawling. I am puzzled as to where exactly the issue is.

dadoonet commented 4 years ago

Could you do it again? Share step by step the exact commands you are launching and the output you're getting.

Did you reinstall a clean new version of FSCrawler? Did you remove the existing files in ~/.fscrawler?
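A quick way to see what is left in the configuration directory is to list the job folders and the settings files they contain. The sketch below is illustrative only: the helper name list_jobs is hypothetical, and it assumes the usual layout of one folder per job under ~/.fscrawler, holding either _settings.yaml or _settings.json, with _default reserved for the mapping files.

```python
from pathlib import Path

# Settings file names a job folder may contain (assumed layout).
SETTINGS_FILES = ("_settings.yaml", "_settings.json")


def list_jobs(config_dir):
    """Map each job directory name to the settings files found in it."""
    jobs = {}
    for entry in sorted(Path(config_dir).iterdir()):
        # _default holds the mapping files, not a job definition.
        if not entry.is_dir() or entry.name == "_default":
            continue
        jobs[entry.name] = [n for n in SETTINGS_FILES if (entry / n).is_file()]
    return jobs


if __name__ == "__main__":
    home = Path.home() / ".fscrawler"
    if home.is_dir():
        for job, files in list_jobs(home).items():
            print(job, files or "no settings file")
```

The "job does not exist" prompt shown earlier would be consistent with the job name passed on the command line not matching any folder name listed here, although the thread does not confirm that.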

Neel-Gagan commented 4 years ago

On running a fresh installation of FSCrawler 2.7 with a new job f_mi, I got the error below while crawling a database file of 1.8GB. Below is the trace:

 "file" : {
    "extension" : "accdb",
    "content_type" : "application/x-msaccess",
    "created" : "2018-07-09T09:22:54.911+0000",
    "last_modified" : "2018-08-01T08:47:11.035+0000",
    "last_accessed" : "2020-04-09T12:49:17.016+0000",
    "indexing_date" : "2020-06-09T08:02:00.993+0000",
    "filesize" : 1274957824,
    "filename" : "Database4.accdb",
    "url" : "file://F:\\Test Data\\Database4.accdb"
  },
  "path" : {
    "root" : "a9f7b81422814a76439be45c7e2281",
    "virtual" : "/Test Data/Database4.accdb",
    "real" : "F:\\Test Data\\Database4.accdb"
  }
}
13:33:07,150 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling F:\Test Data: integer overflow
13:33:07,150 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.ArithmeticException: integer overflow
    at java.lang.Math.multiplyExact(Unknown Source) ~[?:1.8.0_171]
    at org.apache.lucene.util.UnicodeUtil.maxUTF8Length(UnicodeUtil.java:618) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
    at org.apache.lucene.util.BytesRef.<init>(BytesRef.java:84) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
    at org.elasticsearch.common.bytes.BytesArray.<init>(BytesArray.java:32) ~[elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:357) ~[elasticsearch-6.6.0.jar:6.6.0]
    at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.index(ElasticsearchClientV6.java:375) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.esIndex(FsParserAbstract.java:577) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.indexFile(FsParserAbstract.java:479) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:267) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:149) [fscrawler-core-2.7-SNAPSHOT.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
13:33:07,154 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
13:33:07,201 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_mi]
13:33:07,201 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
13:33:07,201 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
13:33:07,201 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
13:33:07,201 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_mi] stopped
13:33:07,201 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_mi]
13:33:07,205 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
13:33:07,205 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
13:33:07,205 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
13:33:07,205 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_mi] stopped

dadoonet commented 4 years ago

Ok. That's another error. Could you open a new issue with those details?

I think we can close the current issue, right?
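For whoever picks up that separate error: the stack trace shows Math.multiplyExact failing inside Lucene's UnicodeUtil.maxUTF8Length, which computes a worst-case UTF-8 size for the document source. The Python sketch below mimics that check to show roughly where the limit sits. The factor of 3 bytes per UTF-16 char is an assumption about Lucene 7.6 that is not stated in this thread, and the helper names are hypothetical.

```python
INT_MAX = 2**31 - 1
INT_MIN = -2**31

# Assumption: worst-case UTF-8 bytes per UTF-16 char used by Lucene 7.6.
ASSUMED_MAX_UTF8_BYTES_PER_CHAR = 3


def multiply_exact(a, b):
    """Mimic java.lang.Math.multiplyExact(int, int) for 32-bit ints."""
    result = a * b
    if not INT_MIN <= result <= INT_MAX:
        raise ArithmeticError("integer overflow")
    return result


def max_safe_chars(bytes_per_char=ASSUMED_MAX_UTF8_BYTES_PER_CHAR):
    """Largest char count whose worst-case byte size still fits in an int."""
    return INT_MAX // bytes_per_char


limit = max_safe_chars()
print(limit)                         # 715827882
print(multiply_exact(limit, 3))      # 2147483646, still fits
try:
    multiply_exact(limit + 1, 3)
except ArithmeticError as exc:
    print(exc)                       # integer overflow
```

Under that assumption, a single JSON document of more than about 715 million characters would hit this exception, which could happen when a file of over 1 GB is indexed with indexed_chars: -1.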

Neel-Gagan commented 4 years ago

I removed the file that gave the integer overflow error. With the rest of the files I am still getting the same error, on a completely fresh installation of FSCrawler 2.7:

11:26:50,408 DEBUG [f.p.e.c.f.FsParserAbstract] Indexing f_mi/9b42753aba728ca7a459bd75f31899?pipeline=null
11:26:50,408 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [F:\Test Data\2019]...
11:26:50,443 WARN  [f.p.e.c.f.FsParserAbstract] Can't find stored field name to check existing filenames in path [F:\Test Data\2019]. Please set store: true on field [file.filename]
11:26:50,443 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling F:\Test Data: Mapping is incorrect: please set stored: true on field [file.filename].
11:26:50,443 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: Mapping is incorrect: please set stored: true on field [file.filename].
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.getFileDirectory(FsParserAbstract.java:374) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:309) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:291) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:149) [fscrawler-core-2.7-SNAPSHOT.jar:?]
        at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
11:26:50,447 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
11:26:50,509 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_mi]
11:26:50,509 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
11:26:50,509 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
11:26:50,794 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
11:26:50,794 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_mi] stopped
11:26:50,798 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [f_mi]
11:26:50,798 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
11:26:50,798 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
11:26:50,798 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
11:26:50,798 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [f_mi] stopped

Neel-Gagan commented 4 years ago

A gentle reminder regarding the raised issue. Even after starting everything from scratch for a clean environment, I am still getting the "Mapping is incorrect: please set stored: true on field" error midway through crawling.

dadoonet commented 4 years ago

Even after starting everything from scratch for a clear environment. still getting the Mapping is incorrect: please set stored: true on field in between crawling.

WDYM? Did you start a totally new Elasticsearch cluster? Did you clean the .fscrawler dir?

Neel-Gagan commented 4 years ago

I have cleaned the .fscrawler directory and deleted the index that was causing the error. The new index gives the same error.

dadoonet commented 4 years ago

Which index did you clean?

Neel-Gagan commented 4 years ago

I deleted the f_mi index and ran FSCrawler with the --restart option. It created a new f_mi index, but the issue persists.

dadoonet commented 4 years ago

You did not delete as per my advice?

DELETE /fscrawler_7_*

In your case

DELETE /f_mi*

Because there is also a folder index you need to remove.
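To illustrate why the wildcard matters, the sketch below compares what the two index expressions select. It is an approximation only: it uses Python's fnmatch to stand in for Elasticsearch wildcard resolution, the helper name indices_matching is hypothetical, and it assumes the folder index uses the default name <job>_folder.

```python
from fnmatch import fnmatchcase


def indices_matching(pattern, index_names):
    """Return the index names selected by a wildcard expression."""
    return [name for name in index_names if fnmatchcase(name, pattern)]


# Hypothetical list of indices present on the cluster.
existing = ["f_mi", "f_mi_folder", "other_job", "other_job_folder"]

print(indices_matching("f_mi", existing))   # ['f_mi']
print(indices_matching("f_mi*", existing))  # ['f_mi', 'f_mi_folder']
```

Deleting only f_mi leaves f_mi_folder behind with its old mapping, which is why the wildcard form is the one suggested above.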