bkisselbach closed this issue 4 years ago.
<tagger class="com.norconex.importer.handler.tagger.impl.TextBetweenTagger" >
  <restrictTo caseSensitive="false" field="title">.*</restrictTo>
  <textBetween name="title">
    <start>^</start>
    <end>\|</end>
  </textBetween>
</tagger>
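For reference, here is a rough Java sketch of what a "text between" extraction with start ^ and end \| should produce on a sample title. The textBetween helper and the sample title are hypothetical, and this is not Norconex's actual implementation; it only assumes that start and end are treated as java.util.regex patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextBetweenSketch {

    // Hypothetical helper, NOT Norconex's implementation: returns the text
    // between the first match of startRegex and the next match of endRegex,
    // or null when either delimiter is missing.
    static String textBetween(String text, String startRegex, String endRegex) {
        Matcher start = Pattern.compile(startRegex).matcher(text);
        if (!start.find()) {
            return null;
        }
        Matcher end = Pattern.compile(endRegex).matcher(text);
        // Only look for the end delimiter after the start delimiter.
        if (!end.find(start.end())) {
            return null;
        }
        return text.substring(start.end(), end.start());
    }

    public static void main(String[] args) {
        String title = "Home | Acme Corp | Widgets";
        // "^" is a zero-width match at index 0; "\\|" is a literal pipe.
        System.out.println("[" + textBetween(title, "^", "\\|") + "]");
    }
}
```

On that sample title the extraction yields "Home " with a trailing space, so the approach is sound in principle; if the real tagger leaves the title untouched, the cause is more likely elsewhere.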
I can think of a few possible causes.
The restrictTo element is meant to apply a handler only to certain documents, not to certain fields. What you have right now restricts the textBetween logic to all documents whose title matches .* (that is, all documents).
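To see why .* restricts nothing, here is a small Java sketch of the restriction check. It assumes, hypothetically, that restrictTo performs a full, case-insensitive java.util.regex match against the field value; restrictToMatches is an illustrative name, not a Norconex method. The handler would run only on documents for which the check returns true.

```java
import java.util.regex.Pattern;

public class RestrictToSketch {

    // Assumption: the restrictTo value is applied as a full-match regex on
    // the field value, case-insensitively (caseSensitive="false").
    static boolean restrictToMatches(String regex, String fieldValue) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(fieldValue)
                .matches();
    }

    public static void main(String[] args) {
        // ".*" accepts every single-line title, even an empty one,
        // so the restriction filters no document out.
        for (String title : new String[] {"Home | Acme Corp", "About us", ""}) {
            System.out.println("'" + title + "' -> " + restrictToMatches(".*", title));
        }
    }
}
```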
Otherwise, it should work. I suggest putting a DebugTagger just before yours to print out the title at that point and confirm it holds what you expect at that stage. It is possible, for instance, that you have more than one title value; the DebugTagger would show it. E.g.:
<tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger"
    logFields="title" logLevel="INFO" />
You can add the same DebugTagger right after yours to see if ANY transformation occurred on that field.
You can also try the ReplaceTagger instead, like this (untested):
<tagger class="com.norconex.importer.handler.tagger.impl.ReplaceTagger">
  <replace regex="true" fromField="title">
    <fromValue>^(.*?)\|.*</fromValue>
    <toValue>$1</toValue>
  </replace>
</tagger>
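A quick way to sanity-check that regex outside the importer is plain String.replaceAll in Java. This sketch assumes the ReplaceTagger follows java.util.regex semantics, where $1 refers to the first capture group; keepBeforeFirstPipe and the sample titles are made up for illustration.

```java
public class ReplaceSketch {

    // Mirrors the suggested fromValue/toValue pair with String.replaceAll.
    static String keepBeforeFirstPipe(String title) {
        // (.*?) is lazy, so it stops at the FIRST pipe;
        // the trailing .* consumes the rest of the title.
        return title.replaceAll("^(.*?)\\|.*", "$1");
    }

    public static void main(String[] args) {
        System.out.println("[" + keepBeforeFirstPipe("Home | Acme Corp | Widgets") + "]");
        System.out.println("[" + keepBeforeFirstPipe("No pipe here") + "]");
    }
}
```

Because the lazy group stops right at the first pipe, the space before it is kept in the result ("Home "), and a title with no pipe at all is left unchanged.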
Perfect. I added a little to remove the whitespace: ^(.*?)\s\|.*
Thanks!
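Here is a Java sketch comparing the whitespace-trimming pattern, assumed here to be ^(.*?)\s\|.*, with a more lenient \s* variant that is not from the thread. Both method names are hypothetical, and the sketch again assumes java.util.regex semantics.

```java
public class TrimSketch {

    // Assumed reply variant: \s requires exactly one whitespace char
    // right before the pipe, and drops it from the captured text.
    static String strictVariant(String title) {
        return title.replaceAll("^(.*?)\\s\\|.*", "$1");
    }

    // Alternative (not from the thread): \s* also handles titles with
    // no space, or several spaces, before the pipe.
    static String lenientVariant(String title) {
        return title.replaceAll("^(.*?)\\s*\\|.*", "$1");
    }

    public static void main(String[] args) {
        for (String t : new String[] {"Home | Acme", "Home|Acme", "Home  | Acme"}) {
            System.out.println("[" + strictVariant(t) + "] [" + lenientVariant(t) + "]");
        }
    }
}
```

The \s version leaves a title unchanged when no whitespace precedes the pipe (Home|Acme) and keeps one space when there are two; \s* trims both cases down to "Home".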
We need to trim down a web page's title, but I can't get it to work. The title contains pipes (|), and we want to keep only the words to the left of the first pipe. I've tried the TextBetweenTagger configuration quoted at the top of this thread.