Open hpollock opened 5 years ago
I do not think there is an out-of-the-box way to do this. If you know your Java, here is a suggestion:
Implement an IFileDocumentProcessor and add it as an entry under <postImportProcessors>.
In your document processor, you will have a FileDocument argument that will contain your file metadata and content. Get the path of the child document you want to merge. From that, use the FileSystemManager argument to fetch it and call the Importer module explicitly to parse the target document and merge it yourself. Not the most trivial thing, but that is the only option that comes to mind right now.
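To make the merge step more concrete, here is a rough sketch that uses only the Java standard library. It does not use the Norconex classes: MergeSketch, Merged, merge, readPlainText and the "document.path" field name are all hypothetical names made up for illustration. In a real IFileDocumentProcessor, the metadata would come from the FileDocument argument, the child file would be fetched through the FileSystemManager argument, and the parser function would be replaced by a call to the Importer module.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-alone illustration only: none of these types come from Norconex.
final class MergeSketch {

    /** Holds the metadata record together with the child document's text. */
    static final class Merged {
        final Map<String, String> metadata;
        final String content;

        Merged(Map<String, String> metadata, String content) {
            this.metadata = metadata;
            this.content = content;
        }
    }

    /**
     * Looks up the child document path stored under pathField, runs the
     * given parser on it, and returns the record metadata plus the parsed
     * text. Records without a path are returned with empty content, since
     * not every record references an external file.
     */
    static Merged merge(Map<String, String> recordMetadata,
                        String pathField,
                        Function<Path, String> parser) {
        Map<String, String> metadata = new LinkedHashMap<>(recordMetadata);
        String childPath = recordMetadata.get(pathField);
        if (childPath == null || childPath.trim().isEmpty()) {
            return new Merged(metadata, "");
        }
        String content = parser.apply(Paths.get(childPath));
        return new Merged(metadata, content);
    }

    /** Stand-in for a real parser: reads the file as UTF-8 plain text. */
    static String readPlainText(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling MergeSketch.merge(metadata, "document.path", MergeSketch::readPlainText) returns the original metadata fields alongside the child file's text. Passing the parser in as a function keeps the lookup-and-merge logic separate from however the child document actually gets parsed.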
Thanks for the quick response and suggested approach Pascal. We'll try that out.
I am marking this as a feature request to be able to merge content with another file.
One scenario where we'd like to use the Filesystem Collector is crawling a collection of textual metadata files on the file system (one file per document). We can use taggers in the pre-parse handlers to extract this text as document metadata. However, a record can (though does not always) reference an external file path pointing to the actual document file, which we'd want the document parser to parse.
Is there an easy way, through configuration, to route this external document file to the parser so that the metadata record and the document content are effectively combined?