cherlo closed this issue 3 years ago.
Not sure I grasp your question. sitemap.xml files normally do not have title, keyword, and related fields. For every link it follows, the collector will index the target page with that page's own metadata.
It will also add these fields to each document listed in the sitemap:
collector.sitemap-lastmod
collector.sitemap-changefreq
collector.sitemap-priority
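As a rough illustration of which sitemap values those three fields come from, here is a plain-JDK sketch that reads each `<url>` entry and keys the optional `<lastmod>`, `<changefreq>` and `<priority>` values by the entry's `<loc>`. It is not the collector's code: `SitemapFields` is a hypothetical name, and values are kept as raw strings here, whereas the collector may normalize some of them (dates, for instance).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustrative sketch, NOT the collector's code: shows which sitemap values
// become per-document fields, keyed by each entry's <loc> URL.
class SitemapFields {

    static Map<String, Map<String, String>> extract(String sitemapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps come from the web; refuse DOCTYPEs to avoid XXE attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(sitemapXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();

            Map<String, Map<String, String>> result = new LinkedHashMap<>();
            NodeList urls = root.getElementsByTagNameNS("*", "url");
            for (int i = 0; i < urls.getLength(); i++) {
                Element url = (Element) urls.item(i);
                String loc = childText(url, "loc");
                if (loc == null) continue; // <loc> is mandatory; skip broken entries
                Map<String, String> fields = new LinkedHashMap<>();
                // Optional elements become prefixed fields only when present.
                for (String name : new String[] {"lastmod", "changefreq", "priority"}) {
                    String value = childText(url, name);
                    if (value != null) fields.put("collector.sitemap-" + name, value);
                }
                result.put(loc, fields);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse sitemap", e);
        }
    }

    // Text of the first direct child element with the given local name, or null.
    static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```

Note that a title or keywords element, even if present, would simply be ignored here, which matches the point above: those come from the target page itself.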
Does that answer your question?
I think so. I'm assuming that it won't parse SEO elements in sitemap.xml like:
If the info is in your sitemap.xml, one way to do it is to treat the sitemap as a regular page and extract its links using a custom ILinkExtractor. Your custom solution will extract links that can have a title and description attached, which will be associated with the target URLs.
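A minimal sketch of that idea, assuming the sitemap carries non-standard `<title>` and `<description>` children inside each `<url>` (the standard sitemap schema defines neither). `ExtractedLink` and `SitemapLinkSketch` are hypothetical names used only for illustration; in an actual `ILinkExtractor` you would build the collector's own `Link` objects instead, and the exact interface method to implement depends on your collector version.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the collector's Link class (not its real API).
record ExtractedLink(String url, String title, String text) {}

// Sketch: treat the sitemap as a regular page and pull each <loc> plus any
// (assumed, non-standard) <title>/<description> children attached to it.
class SitemapLinkSketch {

    static List<ExtractedLink> extract(String sitemapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPEs: sitemaps are untrusted input (XXE protection).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(sitemapXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();

            List<ExtractedLink> links = new ArrayList<>();
            NodeList urls = root.getElementsByTagNameNS("*", "url");
            for (int i = 0; i < urls.getLength(); i++) {
                Element url = (Element) urls.item(i);
                String loc = childText(url, "loc");
                if (loc != null) {
                    // Title/description may be null when the entry lacks them.
                    links.add(new ExtractedLink(
                            loc, childText(url, "title"), childText(url, "description")));
                }
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse sitemap", e);
        }
    }

    // Only direct children are checked, so nested extension elements
    // (e.g. an image title) are not mistaken for the page title.
    static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```

Each `ExtractedLink` holds roughly what you would want attached to the outgoing link: the target URL plus the title and text that describe it.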
Thanks. We will look at extending the code.
If I implement an ILinkExtractor, would I use Link.setTitle() and setText() to inject the info into the link? Or do these properties get overwritten by the collector?
The answer is... none of the above! :-)
They are added as new fields to the target documents when those are processed. They are:
collector.referrer-reference
collector.referrer-link-tag
collector.referrer-link-text
collector.referrer-link-title
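The sketch below models that behavior with a plain multi-valued map: the referring link's details are appended under the four `collector.referrer-*` names, while the target's existing fields such as `title` are left alone. `ReferrerFields` and the map-based metadata are illustrative assumptions, not the collector's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch (not collector code): details of the referring link land on the
// *target* document as extra fields; its own fields are not overwritten.
class ReferrerFields {

    static void addReferrer(Map<String, List<String>> targetMeta,
            String referrerUrl, String linkTag, String linkText, String linkTitle) {
        append(targetMeta, "collector.referrer-reference", referrerUrl);
        append(targetMeta, "collector.referrer-link-tag", linkTag);
        append(targetMeta, "collector.referrer-link-text", linkText);
        append(targetMeta, "collector.referrer-link-title", linkTitle);
    }

    // Assumes multi-valued metadata: values are appended; nulls are skipped.
    private static void append(Map<String, List<String>> meta, String key, String value) {
        if (value != null) {
            meta.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
    }
}
```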
If you want those to take over some of your other fields, I suggest you use RenameTagger or CopyTagger in your Importer section as a post-parse handler.
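To make the difference between the two taggers concrete, here is a plain-Java model of copy versus rename on a multi-valued metadata map. `FieldOps` and its `overwrite` flag are hypothetical; the real CopyTagger and RenameTagger are configured in the Importer XML, and their exact options should be checked in the Importer documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of copy vs. rename semantics on document metadata;
// not the Importer's CopyTagger/RenameTagger implementation.
class FieldOps {

    /** Copy: the source field stays; its values are written to the target. */
    static void copy(Map<String, List<String>> meta, String from, String to, boolean overwrite) {
        List<String> values = meta.get(from);
        if (values == null || from.equals(to)) return;
        if (overwrite || !meta.containsKey(to)) {
            meta.put(to, new ArrayList<>(values));
        } else {
            // Assumption: without overwrite, values are appended to existing ones.
            meta.get(to).addAll(values);
        }
    }

    /** Rename: same as copy, but the source field is removed afterwards. */
    static void rename(Map<String, List<String>> meta, String from, String to, boolean overwrite) {
        if (from.equals(to)) return; // renaming onto itself would lose the field
        copy(meta, from, to, overwrite);
        meta.remove(from);
    }
}
```

For example, copying `collector.referrer-link-title` into `title` with overwrite keeps both fields, while renaming it moves the value and drops the `collector.*` field.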
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Does the collector add metadata (title, keywords, etc.) it finds in the sitemap.xml itself, or just metadata it finds inside the document itself?