Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

How to commit a few extra fields per document other than title, language, dates, and object type #800

Closed sudeshna-majumder closed 1 year ago

sudeshna-majumder commented 2 years ago

Hi Pascal,

I am using the Google Cloud Search Committer and trying to index a few extra fields in addition to the general ones (title, language, date, type). Using a postParse handler, I extract those field values for each document, but I am not able to send them to the Cloud Search committer or index them.

In Google's support documentation, I can see that only a few fields can be sent to the Cloud Search committer: https://developers.google.com/cloud-search/docs/guides/norconex-http-connector#configure-gcs

Can you suggest a way to commit these extra fields?

essiembre commented 2 years ago

I would first confirm those fields are properly extracted by committing to files using the XMLFileCommitter or JSONFileCommitter. If your fields do not appear in generated files, please share your config to reproduce the issue.
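As a rough illustration of that check, the sketch below scans a directory of committed files and reports which expected field names never appear in any of them. It is only a plain-text search that makes no assumption about the XML or JSON layout the committer produces. The function name `find_missing_fields`, the `./committer-output` directory, and the field names `author` and `department` are hypothetical placeholders, not part of Norconex.

```python
from pathlib import Path


def find_missing_fields(output_dir, expected_fields):
    """Return the expected field names that appear in none of the files.

    This is a plain substring search over every file under output_dir,
    so it does not depend on the exact file format written by the committer.
    """
    remaining = set(expected_fields)
    for path in sorted(Path(output_dir).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Keep only the names that were not found in this file either.
        remaining = {name for name in remaining if name not in text}
        if not remaining:
            break
    return sorted(remaining)


if __name__ == "__main__":
    # Hypothetical output directory and field names; replace with your own.
    missing = find_missing_fields("./committer-output", ["author", "department"])
    print("Missing fields:", missing or "none")
```

A substring match can give false positives when a field name also occurs in document content, so an empty result is a hint, not proof; opening one of the generated files by hand remains the more reliable confirmation.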

If your fields are extracted properly, you can ask the Google Cloud Search Committer maintainers why they are not being sent out. You can raise your Google Cloud Committer issues here: https://github.com/google-cloudsearch/norconex-committer-plugin/issues

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.