HailunTan closed this issue 1 month ago
Have you specified a "Committer"? This is what's responsible for sending all crawled data to a location of your choice. While you familiarize yourself, you may want to start by storing crawled content on the local filesystem as a test, for example with the XMLFileCommitter. You can find a full list of available committers here: https://opensource.norconex.com/committers/
You'll find that the generated output files contain the three items you are seeking (URL, metadata, and text-only content).
The content type may be present in a few different forms depending on the metadata available, but one that should always be there (unless you filtered it out intentionally) is document.contentType. You may also like document.contentFamily, which offers simple categorization of documents based on their type.
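If it helps, here is a rough Python sketch for pulling those two fields out of a committed XML file once you have one. The element names used (upsert, reference, metadata, meta, content) and the sample document are assumptions about the output layout, not something confirmed in this thread, so check them against a file your own committer version actually produces. The extract_fields helper is hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a made-up sample of committer output. The real layout
# may differ depending on the committer and its version.
SAMPLE = """<docs>
  <upsert>
    <reference>https://example.com/index.html</reference>
    <metadata>
      <meta name="document.contentType">text/html</meta>
      <meta name="document.contentFamily">html</meta>
      <meta name="title">Example</meta>
    </metadata>
    <content>Hello world</content>
  </upsert>
</docs>"""

WANTED = ("document.contentType", "document.contentFamily")


def extract_fields(xml_text, wanted=WANTED):
    """Return one dict per document with its URL, selected metadata, and text."""
    root = ET.fromstring(xml_text)
    results = []
    # ASSUMPTION: each committed document sits in an <upsert> element.
    for doc in root.iter("upsert"):
        meta = {}
        for m in doc.iter("meta"):
            name = m.get("name")
            if name in wanted:
                # A metadata field may hold several values, so collect a list.
                meta.setdefault(name, []).append((m.text or "").strip())
        results.append({
            "reference": (doc.findtext("reference") or "").strip(),
            "metadata": meta,
            "content": (doc.findtext("content") or "").strip(),
        })
    return results


if __name__ == "__main__":
    for entry in extract_fields(SAMPLE):
        print(entry["reference"], entry["metadata"])
```

To use it on real output, read the file's text and pass it to extract_fields instead of SAMPLE. Values are collected as lists because a metadata field can carry more than one value.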
As the title suggests, I checked the Norconex crawler. There are only three data fields in the crawled entries:
Could you please advise how I can get the content type of the crawled entry in Norconex crawler? Thanks.