dadoonet / fscrawler

Elasticsearch File System Crawler (FS Crawler)
https://fscrawler.readthedocs.io/
Apache License 2.0

File System Crawler for Elasticsearch

Welcome to the FS Crawler for Elasticsearch

This crawler helps you index binary documents such as PDF, OpenOffice and MS Office files.
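To illustrate the "file system" half of the job, here is a minimal sketch of the discovery step, which recursively walks a directory and collects files that look like binary documents. This is a hypothetical illustration, not FS Crawler's actual implementation. The class name `BinaryDocScanner` and the extension list are assumptions chosen for the example, and the text extraction and Elasticsearch indexing steps are deliberately left out. It uses only the JDK and needs Java 11 or later for `Path.of`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: not FS Crawler's actual implementation.
class BinaryDocScanner {

    // Assumed, illustrative subset of "binary document" extensions
    // (PDF, OpenOffice/LibreOffice, MS Office). Not FS Crawler's own rules.
    private static final Set<String> EXTENSIONS = Set.of(
            "pdf", "odt", "ods", "odp", "doc", "docx", "xls", "xlsx", "ppt", "pptx");

    // Returns true when the file name ends with one of the assumed
    // extensions, compared case-insensitively.
    static boolean isSupported(Path file) {
        Path name = file.getFileName();
        if (name == null) {
            return false;
        }
        String s = name.toString();
        int dot = s.lastIndexOf('.');
        if (dot < 0 || dot == s.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        return EXTENSIONS.contains(s.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Recursively walks the tree under root and returns the regular files
    // that look like supported documents, in sorted order.
    static List<Path> findDocuments(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(BinaryDocScanner::isSupported)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path doc : findDocuments(root)) {
            // A real crawler would extract the text here (for example with
            // Apache Tika) and send it to Elasticsearch. This sketch only
            // lists the candidate files.
            System.out.println(doc);
        }
    }
}
```

Filtering on file extensions keeps the sketch simple. A production crawler would more likely detect the content type from the file itself and also track which files changed between runs.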

Main features:

Latest versions

Current "most stable" versions are:

Elasticsearch    FS Crawler       Released    Docs
6.x, 7.x, 8.x    2.10-SNAPSHOT                2.10-SNAPSHOT

Documentation

The full guide is hosted on ReadTheDocs: https://fscrawler.readthedocs.io/

Contribute

Works on my machine, and yours! You can spin up a pre-configured, standardized development environment for this repository in Gitpod.

License

This project is released under the Apache License 2.0.

Thanks

Thanks to JetBrains for the IntelliJ IDEA License!

Thanks to SonarCloud for the free analysis!