Is your feature request related to a problem? Please describe.
We're building a crawler cluster for a local area network, intended to provide a convenient search service. People there use FTP or Windows shared folders and don't know how to trigger a file indexing procedure. I've checked the documentation, and it says existing files are indexed only if they are touched after being moved in. Given that, I think our service will become inconvenient after some days or weeks.
Describe the solution you'd like
I've checked the crawler's core logic. There is a FsCrawlerManagementService which records all file directories, and it provides a convenient function called getFileDirectory. I think it's possible and fairly easy to do what I want. Please see this PR in our fork repo: https://github.com/waterstone-company/fscrawler/pull/34
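To make the idea concrete, here is a minimal sketch of how the recorded directories could be walked so that files that were never touched can still be submitted for indexing. FsCrawlerManagementService and getFileDirectory are mentioned above, but the interface shape and recursion below are assumptions for illustration, not FSCrawler's actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class ReindexExistingSketch {
    // Stand-in for the real management service's directory records.
    // The real FsCrawlerManagementService almost certainly differs.
    interface ManagementService {
        List<String> getFileDirectory(String path);
    }

    // Recursively collect every recorded directory under `root` so their
    // files can be re-submitted to the indexing pipeline, even if the
    // files were never touched after being moved in.
    static List<String> collectKnownDirectories(ManagementService service, String root) {
        List<String> result = new ArrayList<>();
        for (String dir : service.getFileDirectory(root)) {
            result.add(dir);
            result.addAll(collectKnownDirectories(service, dir));
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake service simulating a few recorded directories.
        ManagementService fake = path -> switch (path) {
            case "/" -> List.of("/share", "/ftp");
            case "/share" -> List.of("/share/docs");
            default -> List.of();
        };
        System.out.println(collectKnownDirectories(fake, "/"));
        // -> [/share, /share/docs, /ftp]
    }
}
```

The point is only that the existing directory records already contain everything needed to find never-indexed files; no filesystem re-scan is required.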
Implement this behavior behind a feature flag (on by default) so the previous behavior is always available if needed
Change the hardcoded limit to a parameter with a default value of 10000 (it can go up to 100000)
Read back from the response whether we have more documents than the fixed limit, and if so log a warning that traces the folder name and the number of documents found
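The three requirements above could be sketched as follows. All names here (ScanSettings, scanFolder, the field names) are hypothetical and only illustrate the shape of the change: a flag that defaults to on, a bounded configurable limit replacing the hardcoded one, and a warning when a folder reports more documents than the limit.

```java
import java.util.logging.Logger;

public class ScanLimitSketch {
    private static final Logger logger = Logger.getLogger("fscrawler.sketch");

    // Hypothetical settings holder for the proposed options.
    static class ScanSettings {
        static final int MAX_FOLDER_SCAN_LIMIT = 100_000;

        boolean indexExistingFiles = true;  // feature flag, on by default
        int folderScanLimit = 10_000;       // was hardcoded; now a parameter

        void setFolderScanLimit(int limit) {
            if (limit < 1 || limit > MAX_FOLDER_SCAN_LIMIT) {
                throw new IllegalArgumentException(
                    "folderScanLimit must be in [1, " + MAX_FOLDER_SCAN_LIMIT + "]");
            }
            this.folderScanLimit = limit;
        }
    }

    // Simulated scan: `documentsFound` stands in for the count reported by
    // the response. Returns how many documents will actually be indexed.
    static int scanFolder(String folder, ScanSettings settings, int documentsFound) {
        if (documentsFound > settings.folderScanLimit) {
            // Warn with the folder name and the number of documents found.
            logger.warning(String.format(
                "Folder [%s] contains %d documents, more than the limit of %d; "
                + "only the first %d will be indexed",
                folder, documentsFound, settings.folderScanLimit,
                settings.folderScanLimit));
            return settings.folderScanLimit;
        }
        return documentsFound;
    }

    public static void main(String[] args) {
        ScanSettings settings = new ScanSettings();
        System.out.println(scanFolder("/shared/docs", settings, 12_345));
        // capped at the default limit: prints 10000
    }
}
```

Capping at the limit while logging a warning (rather than failing) keeps the crawl going on large shares, and the bounded setter makes the 100000 ceiling explicit rather than implicit.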