Closed SaschaHeyer closed 5 years ago
Hello Sascha,
I just tried with the latest snapshot release and it works fine, as you can tell by this attachment: fs-committer-files.zip
So I encourage you to try with the latest snapshot, or share your config in case something else is going on that I missed.
Hello Pascal,
Thank you for your support. I can confirm the behavior is not related to Norconex itself. Rather, the issue is caused by a Committer plugin that introduces a dependency conflict within the lib folder.
Best regards Sascha
Hello Pascal,
When crawling pages, on some occasions the .cntnt files are empty and the meta tags are not extracted.
To reproduce the behavior, please have a look at the following files:
https://storage.googleapis.com/sascha-issue-reproduction/emptyBody.html Contains an empty body, which obviously leads to an empty .cntnt file. But the existing meta tag is not extracted either.
https://storage.googleapis.com/sascha-issue-reproduction/smallBody.html Contains a small amount of text in the body, but still the .cntnt file is empty and the meta tag is still not extracted.
https://storage.googleapis.com/sascha-issue-reproduction/largeBody.html After adding more content to the body, the content and the meta tag are extracted properly.
It seems that the parsing / extracting is somehow related to the length of the content.
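To see how many documents are affected after a crawl, something like the following could list the empty .cntnt files. This is only a minimal sketch: it assumes the committer writes its .cntnt files somewhere under a single output directory, and the function name `find_empty_content_files` and the default path `./committer-output` are made up for illustration, not part of Norconex.

```python
from pathlib import Path


def find_empty_content_files(root, suffix=".cntnt"):
    """Return the sorted paths of zero-byte files ending in `suffix` under `root`."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*" + suffix)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    import sys

    # Hypothetical default; pass the real committer output directory as an argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "./committer-output"
    for path in find_empty_content_files(root):
        print(path)
```

Comparing the listed files against the size of the source pages would show whether the empty output really correlates with short bodies.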
Any suggestions / known issues?
Version used: 2.8.1
Best regards Sascha