apache / incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm
https://stormcrawler.apache.org/
Apache License 2.0

SOLR Status Updater - configure byDomain or byIP #626

Open jnioche opened 5 years ago

jnioche commented 5 years ago

Only partitioning by host is currently implemented.

jnioche commented 1 month ago

@mvolikas is that of interest to you?

mvolikas commented 1 month ago

I'd like to give it a try! I'm not 100% sure about the extra functionality we are aiming for, though. Is this related to #620? I guess we could start by adding the partition key to the metadata, like we do in OpenSearch?
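To make the "key in the metadata or not" choice concrete, here is a minimal sketch of how a status document could carry the partition key either as a top-level field or as a metadata entry. Everything here is hypothetical: `buildStatusDocument`, the `keyFieldName` and `keyInMetadata` parameters, and the `metadata.` field prefix are illustrative assumptions, not the SOLR StatusUpdaterBolt's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatusDocSketch {

    // Hypothetical helper: builds the field map that would be indexed in the
    // SOLR status collection. Field names and the "metadata." prefix are assumptions.
    static Map<String, Object> buildStatusDocument(
            String url,
            String status,
            Map<String, List<String>> metadata,
            String partitionKey,
            String keyFieldName,
            boolean keyInMetadata) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("url", url);
        doc.put("status", status);
        // Copy the existing metadata under an (assumed) "metadata." prefix.
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            doc.put("metadata." + e.getKey(), e.getValue());
        }
        if (keyInMetadata) {
            // Store the key as just another metadata entry.
            doc.put("metadata." + keyFieldName, List.of(partitionKey));
        } else {
            // Store the key as a dedicated top-level field.
            doc.put(keyFieldName, partitionKey);
        }
        return doc;
    }
}
```

Keeping the field name and its location configurable means the spouts can later query on whichever field the status updater was told to populate.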

jnioche commented 1 month ago

It is exactly that. The idea is to make the name of the field used for the key (and whether it is stored in the metadata) configurable, just like in OpenSearch, and also to add the host/domain/IP logic. The logic for querying based on the shards will be added to the spouts later on in #620.
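A minimal sketch of the host/domain/IP logic, assuming the mode values `byHost`, `byDomain` and `byIP` (the latter two come from the issue title; `byHost` is assumed by analogy). The class, enum and method names are hypothetical. The domain mode here is a naive "last two labels" approximation, which is wrong for public suffixes such as `co.uk`; a real implementation would consult the Public Suffix List. The IP mode performs a DNS lookup for non-literal hosts.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Locale;

public class PartitionKeySketch {

    // Hypothetical enum mirroring the partition modes discussed in the issue.
    enum Mode { BY_HOST, BY_DOMAIN, BY_IP }

    // Maps an (assumed) configuration value to a partition mode.
    static Mode parseMode(String value) {
        switch (value) {
            case "byHost": return Mode.BY_HOST;
            case "byDomain": return Mode.BY_DOMAIN;
            case "byIP": return Mode.BY_IP;
            default: throw new IllegalArgumentException("Unknown partition mode: " + value);
        }
    }

    // Returns the partition key for a URL, or null if it cannot be computed.
    static String partitionKey(String url, Mode mode) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null; // malformed URL
        }
        if (host == null) {
            return null;
        }
        host = host.toLowerCase(Locale.ROOT);
        switch (mode) {
            case BY_HOST:
                return host;
            case BY_DOMAIN:
                // Naive approximation: keep the last two labels of the host name.
                String[] labels = host.split("\\.");
                if (labels.length <= 2) {
                    return host;
                }
                return labels[labels.length - 2] + "." + labels[labels.length - 1];
            case BY_IP:
                try {
                    // Resolves the host; IP literals are returned without a DNS query.
                    return InetAddress.getByName(host).getHostAddress();
                } catch (UnknownHostException e) {
                    return null;
                }
            default:
                throw new IllegalStateException("Unhandled mode " + mode);
        }
    }
}
```

For example, `https://www.example.com/page` yields `www.example.com` by host and `example.com` by domain. Returning `null` when no key can be computed leaves the caller to decide whether to skip the URL or fall back to another mode.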