Closed aleha84 closed 7 years ago
I just tried your config snippet with your sample content and it worked fine for me. Which version are you using? You can try the latest snapshot in case something was fixed. Also, are you using any transformers before invoking the TextPatternTagger? Maybe the HTML is slightly modified beforehand?
Unless you are already facing performance issues, I suggest you use whatever approach you are more comfortable with. Unless your CPUs are maxed out, you can increase the number of threads and reduce the default delay. This should have a bigger performance impact than switching from DOMTagger to TextPatternTagger.
I am using the latest stable Importer, with no transformers before the taggers. I have not detected any performance slowdown, so I will keep using DOMTagger. Thanks.
OK, I'll close this then, but if similar issues come up again for you with TextPatternTagger, do not hesitate to re-open.
As suggested in the DOMTagger description, it is better (for performance reasons) to use TextPatternTagger.
Most of the pages have a breadcrumb which I need to store in a metadata field. The page markup is like this:
In importer.preParseHandlers I added two sections:
DOMTagger works fine. TextPatternTagger finds nothing.
Is my config wrong?
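
For anyone comparing the two approaches, here is a minimal Python sketch of one common reason a text pattern can find nothing while a DOM-based selector still matches. It does not use the Importer at all, and it is not a diagnosis of this issue: the markup, the patterns, and the helper names (`PAGE_A`, `PAGE_B`, `STRICT`, `TOLERANT`, `dom_breadcrumb`) are all hypothetical, since the real page markup and config are not shown in this thread.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup; the real page markup is not shown in this thread.
PAGE_A = '<div class="breadcrumb"><a href="/">Home</a> / <a href="/docs">Docs</a></div>'
# Same content, but with an extra attribute and line breaks inside the element.
PAGE_B = ('<div id="nav" class="breadcrumb">\n'
          '<a href="/">Home</a> / <a href="/docs">Docs</a>\n</div>')

# Strict pattern: exact opening tag text, and "." does not match newlines.
STRICT = re.compile(r'<div class="breadcrumb">(.*?)</div>')
# More tolerant pattern: other attributes allowed, "." matches newlines.
TOLERANT = re.compile(r'<div\b[^>]*\bclass="breadcrumb"[^>]*>(.*?)</div>', re.DOTALL)


class BreadcrumbText(HTMLParser):
    """Collects the text found inside <div class="breadcrumb"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a breadcrumb <div>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth > 0:
                self.depth += 1          # nested <div> inside the breadcrumb
            elif dict(attrs).get("class") == "breadcrumb":
                self.depth = 1           # entering the breadcrumb

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def dom_breadcrumb(html):
    """Return the breadcrumb text with whitespace normalized."""
    parser = BreadcrumbText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    for name, page in (("PAGE_A", PAGE_A), ("PAGE_B", PAGE_B)):
        print(name,
              "strict:", bool(STRICT.search(page)),
              "tolerant:", bool(TOLERANT.search(page)),
              "dom:", dom_breadcrumb(page))
```

In this sketch the strict pattern matches only the first variant, while the tolerant pattern and the parser-based extraction handle both. If a text pattern finds nothing on real pages, comparing the raw HTML the tagger receives against the pattern character by character (attribute order, quoting, line breaks) is a reasonable first check.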