mnemonictrick closed this issue 5 years ago.
The description field (if present) should be extracted when the Importer module parses the document, which means you should definitely use a post-parse handler.
That being said, I see nothing obviously wrong with what you have. A few ideas:
Try adding DebugTagger as the first element of your post-parse handlers to see which fields are extracted. This should help you troubleshoot. If none of this helps, could you share your full config for further review, ideally with a "faulty" PDF so we can reproduce the issue?
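To picture what a field dump like DebugTagger's gives you, here is a minimal Java sketch, assuming the extracted fields are available as a plain map of field names to values. It is not Norconex's DebugTagger; the class name DebugFieldsSketch, the method debugFields and the sample field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a "dump every field" debug step.
// This is NOT Norconex's DebugTagger; it only illustrates the idea.
final class DebugFieldsSketch {

    // Returns one "name=value" line per value, with field names sorted
    // so the output is stable and easy to compare between runs.
    static List<String> debugFields(Map<String, List<String>> metadata) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : new TreeMap<>(metadata).entrySet()) {
            for (String value : e.getValue()) {
                lines.add(e.getKey() + "=" + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Illustrative field names only; real extracted names may differ.
        Map<String, List<String>> metadata = Map.of(
                "Content-Type", List.of("application/pdf"),
                "description", List.of("Content of description-field"));
        // Print each field on its own line, like a log dump.
        debugFields(metadata).forEach(System.out::println);
    }
}
```

Placing such a dump first in the chain shows the fields as they arrive from the parser, before any later handler renames or drops them.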
Hi essiembre,
Thank you for your quick reply.
I have a KeepOnlyTagger configured, and the description field is included in it. However, whether I disable the KeepOnlyTagger or not, the results are the same (I ran these tests with the DebugTagger enabled).
There is one (strange?) thing I discovered, though: while going through the DebugTagger output, I noticed the following line:
MyProject: 2019-01-07 07:38:49 INFO - content=Content of description-field
The committed output (tested locally with the JSONFileCommitter), however, is unchanged:
... "doc-add": { "reference": ..., "metadata": "ObservableMap [map=ObservableMap [map=ObservableMap [map=....", "content": "Made by\n2,21\n1,96..." }
Do you have any other hints? The PDF doesn't seem to be the problem.
Thank you so much!
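The DebugTagger log above shows a content value among the fields, while the committed "content" is unchanged. One possible explanation, not confirmed anywhere in this thread, is that the document body and its metadata fields are stored separately, so writing to a field that happens to be named "content" does not replace the body. The hypothetical Java model below (plain collections, not the Norconex API; DocBodySketch, Doc and copyDescriptionToContentField are illustrative names) shows that distinction:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a document is a metadata map plus a separate body.
// Class and method names are illustrative, not Norconex's API.
final class DocBodySketch {

    static final class Doc {
        final Map<String, List<String>> metadata = new HashMap<>();
        String body; // what a committer would write out as "content"

        Doc(String body) {
            this.body = body;
        }
    }

    // Copies "description" into a metadata field named "content".
    // This touches only the metadata map, never doc.body.
    static void copyDescriptionToContentField(Doc doc) {
        List<String> description = doc.metadata.get("description");
        if (description != null) {
            doc.metadata.put("content", List.copyOf(description));
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc("Made by\n2,21\n1,96...");
        doc.metadata.put("description", List.of("Content of description-field"));
        copyDescriptionToContentField(doc);
        System.out.println(doc.metadata.get("content")); // [Content of description-field]
        System.out.println(doc.body.startsWith("Made by")); // true
    }
}
```

In this model, the "content" field and the body are two different things, which would match a log that shows the new value while the committed body stays the same.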
Hi,
We couldn't manage to change the "content" field, so we added the description field to the submitted fields instead. The frontend modules then decide which field value to display.
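A minimal sketch of that frontend decision, assuming each search hit exposes its fields as a plain map with the hypothetical keys contentType, content and description. The class DisplayFieldSketch and its method are illustrative only, not part of any library:

```java
import java.util.Map;

// Hypothetical frontend helper: both "content" and "description" are
// submitted, and the display layer picks one. Key names are assumptions.
final class DisplayFieldSketch {

    static String displayText(Map<String, String> fields) {
        String contentType = fields.getOrDefault("contentType", "");
        String description = fields.get("description");
        boolean isPdf = contentType.startsWith("application/pdf");
        // Prefer the description for PDFs, but only when it has real text.
        if (isPdf && description != null && !description.isBlank()) {
            return description;
        }
        // Everything else falls back to the committed content.
        return fields.getOrDefault("content", "");
    }

    public static void main(String[] args) {
        System.out.println(displayText(Map.of(
                "contentType", "application/pdf",
                "content", "Made by\n2,21\n1,96...",
                "description", "Content of description-field")));
        // prints: Content of description-field
    }
}
```

Since both fields stay in the index, the display rule can be changed later without re-crawling.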
Hi there,
I'm trying to change the content that gets committed based on the content type. That is, for PDF files I want to submit the "description" field instead of the original content.
So far I've tried CopyTagger in both preParseHandlers and postParseHandlers. Neither works.
What would be the correct way to do this?
Thank you very much!
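The goal described above can be stated as a small function: for PDFs, commit the description instead of the extracted body, and leave other documents alone. This is a plain Java sketch, assuming the content type, the metadata map and the body are available as ordinary values; it is not a Norconex handler, and the thread does not establish which handler (if any) supports this. PdfBodySwapSketch and contentToCommit are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the goal: for PDFs, commit the "description"
// value instead of the extracted body. Plain Java, not a Norconex handler.
final class PdfBodySwapSketch {

    // Returns the text that should be committed as the document content.
    static String contentToCommit(String contentType,
                                  Map<String, List<String>> metadata,
                                  String body) {
        List<String> description = metadata.get("description");
        boolean isPdf = contentType != null && contentType.startsWith("application/pdf");
        if (isPdf && description != null && !description.isEmpty()) {
            // Multi-valued field: join values so none are silently dropped.
            return String.join("\n", description);
        }
        // Non-PDFs, or PDFs without a description, keep their body.
        return body;
    }

    public static void main(String[] args) {
        String out = contentToCommit("application/pdf",
                Map.of("description", List.of("Content of description-field")),
                "Made by\n2,21\n1,96...");
        System.out.println(out); // Content of description-field
    }
}
```

Falling back to the original body when no description exists avoids committing empty documents.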