Closed CorinneKnoe closed 4 years ago
Hello Corinne,
Yes, the Norconex Importer can split PST files. Have a look at GenericDocumentParserFactory. It has many options for controlling exactly which embedded documents are split and how. In its most open form (split every embedded document), it would look like this:
<importer>
  <documentParserFactory class="com.norconex.importer.parser.GenericDocumentParserFactory">
    <embedded>
      <splitContentTypes>.*</splitContentTypes>
    </embedded>
  </documentParserFactory>
</importer>
You probably want to configure it so it only splits certain content types.
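As a rough sketch of such a restriction, the variant below splits only PST containers. It assumes that splitContentTypes is a regular expression matched against the content type of the container document, and that PST files are detected as application/vnd.ms-outlook-pst (the media type Apache Tika normally reports); please verify both against your Importer version before relying on it.

```xml
<importer>
  <documentParserFactory class="com.norconex.importer.parser.GenericDocumentParserFactory">
    <embedded>
      <!-- Assumption: regex matched against the container's content type.
           Only documents detected as PST are split; other containers
           (zip files, emails with attachments, etc.) stay in one piece. -->
      <splitContentTypes>application/vnd\.ms-outlook-pst</splitContentTypes>
    </embedded>
  </documentParserFactory>
</importer>
```

If attachments inside the extracted emails should also become separate documents, the pattern would likely need to include the email content types as well.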
Thank you very much for your help and the code example. Will give this a try!
I am trying to split PST files (which contain entire mailboxes) into their elements: emails, attachments, contacts, calendar entries, etc.
Norconex is able to read the PST file; however, it returns the entire thing as one file. I tried to use the DOM splitter to split the PST into its components. So far, no success.
Is Norconex able to split PST files? Many thanks, Corinne