crawler-commons / url-frontier

API definition, resources and reference implementation of URL Frontiers
Apache License 2.0

_ShardedRocksDBService_ #56

Closed: jnioche closed this 2 years ago

jnioche commented 2 years ago

DistributedFrontierService now forwards each incoming URL to a node chosen from a hash of the URL's queue ID. This PR adds a new ShardedRocksDBService, which extends DistributedFrontierService and allows RocksDB-based Frontier instances to be used as a cluster. Incoming URLs are sharded to the right instance, so it does not matter which node receives the data. Each node, however, only serves URLs from the data it holds, i.e. there is no communication with the other nodes when it comes to reading.
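A minimal sketch of the write-side routing described above, in Java. It assumes every node knows the same, identically ordered list of node addresses and that a queue ID is a `String`; the class `QueueShardingSketch`, its methods, and the node names are hypothetical and not part of url-frontier's API, and the choice of `String.hashCode()` with `Math.floorMod` is an assumption about how "a hash of the queue ID" could be computed.

```java
import java.util.List;

/**
 * Hypothetical sketch (not url-frontier's code): route each URL to the node
 * that owns its queue, based on a hash of the queue ID.
 */
final class QueueShardingSketch {

    // Addresses of all nodes; assumed identical and identically ordered on every node.
    private final List<String> nodes;
    // Address of the node running this instance.
    private final String localNode;

    QueueShardingSketch(List<String> nodes, String localNode) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("cluster must contain at least one node");
        }
        this.nodes = List.copyOf(nodes);
        this.localNode = localNode;
    }

    /** Owner of a queue. String.hashCode() is specified by the JDK, so every node computes the same owner. */
    String nodeFor(String queueId) {
        // floorMod keeps the index in [0, size) even when hashCode() is negative.
        int index = Math.floorMod(queueId.hashCode(), nodes.size());
        return nodes.get(index);
    }

    /** Write path: store the URL locally if true, otherwise forward it to nodeFor(queueId). */
    boolean isLocal(String queueId) {
        return nodeFor(queueId).equals(localNode);
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("node-a", "node-b", "node-c");
        QueueShardingSketch onB = new QueueShardingSketch(cluster, "node-b");
        for (String queue : List.of("example.com", "apache.org", "commoncrawl.org")) {
            // Reads need no routing in this model: each node only serves the queues it stores.
            String action = onB.isLocal(queue) ? "store on node-b" : "forward";
            System.out.println(queue + " -> " + onB.nodeFor(queue) + " (" + action + ")");
        }
    }
}
```

Plain modulo hashing lets every node make the same routing decision without any coordination, which is what makes the receiving node irrelevant. The trade-off is that changing the number of nodes remaps most queues to new owners; this sketch does not address rebalancing.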

This does not affect the behaviour of the Ignite service, as the sharding is handled by Ignite itself.

This PR also fixes several bugs in _DistributedFrontierService_ and _RocksDBService_.