Closed: jetnet closed this 4 years ago
I made some progress by moving that transformer to the first place in the pre-processing chain. The following handler previously came before it and seemed to cause this issue:
<!-- Simple UTF-8 detector: checking, if the page contains UTF-8 encoded umlauts -->
<tagger class="$CountMatchesTagger">
<countMatches toField="uml_utf8_count" regex="true">[\u00E4\u00C4\u00F6\u00D6\u00FC\u00DC\u00DF]</countMatches>
</tagger>
Now I'm getting the expected results, but NOT every time. Sometimes the following exception occurs:
Exception in thread "StreamConsumer-STDOUT" java.lang.NullPointerException
at com.norconex.commons.lang.io.CachedStreamFactory$MemoryTracker.hasEnoughAvailableMemory(CachedStreamFactory.java:151)
at com.norconex.commons.lang.io.CachedOutputStream.write(CachedOutputStream.java:145)
at java.io.OutputStream.write(OutputStream.java:75)
at com.norconex.importer.handler.transformer.impl.ExternalTransformer.writeLine(ExternalTransformer.java:706)
at com.norconex.importer.handler.transformer.impl.ExternalTransformer.access$000(ExternalTransformer.java:299)
at com.norconex.importer.handler.transformer.impl.ExternalTransformer$1.lineStreamed(ExternalTransformer.java:671)
at com.norconex.commons.lang.io.InputStreamLineListener.flushBuffer(InputStreamLineListener.java:117)
at com.norconex.commons.lang.io.InputStreamLineListener.streamed(InputStreamLineListener.java:93)
at com.norconex.commons.lang.io.InputStreamConsumer.fireStreamed(InputStreamConsumer.java:150)
at com.norconex.commons.lang.io.InputStreamConsumer.run(InputStreamConsumer.java:98)
Could it be a race condition? Once some processor or transformer has read the buffer, is it no longer available to the next ones?
One more update: I guess I finally got it working. The solution was to save the $INPUT at the beginning of the script and provide it back as $OUTPUT.
So, looks like, the ExternalTransformer consumes the input buffer and no content is available for further processors.
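A minimal sketch of that workaround, for anyone reading along. It assumes the transformer command hands the script the ${INPUT} and ${OUTPUT} file paths as its first two arguments; the function name passthrough and the optional side command are hypothetical, not taken from the actual script.

```shell
#!/bin/sh
# Sketch only: keep the original bytes and hand them back unchanged.
# Assumption: $1 is the ${INPUT} file path and $2 is the ${OUTPUT} file path.

# passthrough INPUT_FILE OUTPUT_FILE [SIDE_COMMAND...]
passthrough() {
    in_file=$1
    out_file=$2
    shift 2

    # 1. Save the original content before anything else reads it.
    saved=$(mktemp) || return 1
    cp "$in_file" "$saved" || return 1

    # 2. Optionally run a side job (for example thumbnail generation),
    #    passing it the saved copy as its last argument. A failure is
    #    reported on STDERR but does not discard the document.
    if [ "$#" -gt 0 ]; then
        "$@" "$saved" || echo "side command failed: $*" >&2
    fi

    # 3. Provide the original bytes back as the transformer output.
    cp "$saved" "$out_file"
    status=$?
    rm -f "$saved"
    return $status
}
```

The script would then end with a call such as passthrough "$1" "$2", so whatever the side job does, later handlers still receive the unmodified content.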
I am glad you have it working, but I am not sure what you mean by:
So, looks like, the ExternalTransformer consumes the input buffer and no content is available for further processors.
Can you share your config snippet illustrating what you did and/or explain further?
The current configuration is quite large. I'll try to reproduce the issue with the transformer config from above only.
Hello Pascal,
I'd like to generate a thumbnail image for every incoming document.contentFamily = image using an ExternalTransformer script with ImageMagick tools. But it seems the binary content provided via STDIN or via ${INPUT} gets corrupted: I'm getting the same file size, but the binary content differs, e.g.:
orig:
from the transformer:
The transformer config looks like:
The thumbmails.sh's output is like (when testing):
So, the question is: does the ExternalTransformer support binary content? Thanks!
BTW: is there any better solution for thumbnail generation?
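One way to pin down the "same size, different bytes" symptom described above is a byte-level comparison of the original file against the copy the script received. This is only a diagnostic sketch: the function name compare_bytes is hypothetical, and it assumes both files have been saved to disk (for example by copying ${INPUT} somewhere from inside the script).

```shell
#!/bin/sh
# Sketch only: compare the original file with what the script received.

# compare_bytes ORIGINAL RECEIVED
compare_bytes() {
    orig=$1
    received=$2

    # Report both sizes; equal sizes do not imply equal content.
    echo "sizes: $(wc -c < "$orig") vs $(wc -c < "$received") bytes"

    # cmp -s is silent and only sets the exit status.
    if cmp -s "$orig" "$received"; then
        echo "identical"
        return 0
    fi

    # cmp -l lists each differing byte: decimal offset, then the two
    # byte values in octal. Only the first few are shown here.
    echo "different; first differing bytes (offset, original, received):"
    cmp -l "$orig" "$received" | head -n 5
    return 1
}
```

Calling compare_bytes original.jpg received.jpg (file names are placeholders) shows which bytes changed, which helps tell a text re-encoding problem apart from truncation or a different kind of corruption.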