Closed by jnioche 2 years ago
This is due to the code at
https://github.com/crawler-commons/url-frontier/blob/master/service/src/main/java/crawlercommons/urlfrontier/service/cluster/DistributedFrontierService.java#L141
Either the cache should keep a list of all the incoming messages associated with a URL or, as a simpler option, we block when a URL is about to be sent to an external Frontier while something is already being processed for it.
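A minimal sketch of the simpler option, to make the idea concrete. This is not the code in `DistributedFrontierService`; the class `InFlightUrlGuard` and its methods are hypothetical names introduced for illustration. It assumes the caller acquires the guard for a URL before forwarding it to the external Frontier and releases it once the acknowledgement for that URL has come back.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of url-frontier: tracks URLs that are
// currently being processed by a remote Frontier.
class InFlightUrlGuard {
    private final Set<String> inFlight = new HashSet<>();

    // Blocks the calling thread until no other message for this URL
    // is being processed, then marks the URL as in flight.
    public synchronized void acquire(String url) throws InterruptedException {
        while (inFlight.contains(url)) {
            wait();
        }
        inFlight.add(url);
    }

    // Non-blocking variant: returns false if the URL is already in flight.
    public synchronized boolean tryAcquire(String url) {
        return inFlight.add(url);
    }

    // Called once the remote Frontier has acknowledged the URL;
    // wakes up any thread waiting in acquire().
    public synchronized void release(String url) {
        if (inFlight.remove(url)) {
            notifyAll();
        }
    }

    public synchronized boolean isInFlight(String url) {
        return inFlight.contains(url);
    }
}
```

With this approach each URL has at most one pending message at a time, so the cache only ever needs to hold a single entry per URL. The trade-off is that a slow remote Frontier stalls later messages for the same URL.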
Also fixed a situation where the original stream from the client had been closed before the remote frontier had had time to finish its work.