commoncrawl / news-crawl

News crawling with StormCrawler - stores content as WARC
Apache License 2.0

news-crawl 2.x Broken when using multiple workers (across multiple hosts) #63

Closed · jnioche closed this issue 8 months ago

jnioche commented 9 months ago

Discussed in https://github.com/commoncrawl/news-crawl/discussions/62

Originally posted by **alextechnology** December 8, 2023

I spent today merging our local news-crawl codebase into the 2.x branch. We run only one nimbus and two supervisors. While the topology seems to work fine with a single worker, it does not work at all with two workers, despite all the configuration being the same and working fine on news-crawl 1.2.4.

The first issue seems to be that the local worker cannot connect to the remote supervisor/worker. Once the topology is submitted to Storm, the worker log scrolls for over a minute with 140+ attempts like the following:

```
2023-12-08 21:14:17.456 o.a.s.m.n.Client client-worker-1 [ERROR] connection attempt 147 to Netty-Client-xxxxxxxxxx failed: org.apache.storm.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxxxxxxxxxxxxxxx
```

It will eventually start trying to crawl anyway, but there are lots of log entries indicating failures, like:

```
2023-12-08 21:14:20.720 o.a.s.m.n.Client client-worker-1 [ERROR] failed to send 1 messages to Netty-Client-xxxxxxxxx java.nio.channels.ClosedChannelException
```

and

```
2023-12-08 21:14:20.735 o.a.s.m.n.Client Worker-Transfer [INFO] Dropping 1 messages
2023-12-08 21:14:20.735 o.a.s.m.n.Client Worker-Transfer [INFO] Dropping 2 messages
2023-12-08 21:14:20.735 o.a.s.m.n.Client Worker-Transfer [INFO] Dropping 2 messages
2023-12-08 21:14:20.735 o.a.s.m.n.Client Worker-Transfer [INFO] Dropping 2 messages
```

The second issue (which I assume is related to the first) is that this inevitably leads to a java.util.ConcurrentModificationException that crashes the worker:

```
2023-12-08 21:14:23.316 o.a.s.m.n.Client Worker-Transfer [INFO] Dropping 3 messages
2023-12-08 21:14:23.316 o.a.s.u.Utils Thread-14-feed-executor[70, 71] [ERROR] Async loop died!
java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
Serialization trace:
md (com.digitalpebble.stormcrawler.Metadata)
    at org.apache.storm.executor.Executor.accept(Executor.java:301) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:154) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:140) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:398) [storm-client-2.5.0.jar:2.5.0]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
Serialization trace:
md (com.digitalpebble.stormcrawler.Metadata)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557) ~[kryo-4.0.2.jar:?]
    at org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:118) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:555) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:68) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.boltEmit(BoltOutputCollectorImpl.java:112) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.emit(BoltOutputCollectorImpl.java:65) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:93) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:42) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:77) ~[storm-client-2.5.0.jar:2.5.0]
    at com.digitalpebble.stormcrawler.bolt.FeedParserBolt.execute(FeedParserBolt.java:104) ~[stormjar.jar:?]
    at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:212) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.5.0.jar:2.5.0]
    ... 6 more
Caused by: java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1511) ~[?:?]
    at java.util.HashMap$EntryIterator.next(HashMap.java:1544) ~[?:?]
    at java.util.HashMap$EntryIterator.next(HashMap.java:1542) ~[?:?]
    at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:99) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:39) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557) ~[kryo-4.0.2.jar:?]
    at org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:118) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:555) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:68) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.boltEmit(BoltOutputCollectorImpl.java:112) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.emit(BoltOutputCollectorImpl.java:65) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:93) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:42) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:77) ~[storm-client-2.5.0.jar:2.5.0]
    at com.digitalpebble.stormcrawler.bolt.FeedParserBolt.execute(FeedParserBolt.java:104) ~[stormjar.jar:?]
    at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:212) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.5.0.jar:2.5.0]
    ... 6 more
2023-12-08 21:14:23.320 o.a.s.e.e.ReportError Thread-14-feed-executor[70, 71] [ERROR] Error
java.lang.RuntimeException: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
Serialization trace:
md (com.digitalpebble.stormcrawler.Metadata)
    at org.apache.storm.utils.Utils$1.run(Utils.java:413) ~[storm-client-2.5.0.jar:2.5.0]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
Serialization trace:
md (com.digitalpebble.stormcrawler.Metadata)
    at org.apache.storm.executor.Executor.accept(Executor.java:301) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:154) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:140) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.5.0.jar:2.5.0]
    ... 1 more
Caused by: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
Serialization trace:
md (com.digitalpebble.stormcrawler.Metadata)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557) ~[kryo-4.0.2.jar:?]
    at org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:118) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:555) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:68) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.boltEmit(BoltOutputCollectorImpl.java:112) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.emit(BoltOutputCollectorImpl.java:65) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:93) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:42) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:77) ~[storm-client-2.5.0.jar:2.5.0]
    at com.digitalpebble.stormcrawler.bolt.FeedParserBolt.execute(FeedParserBolt.java:104) ~[stormjar.jar:?]
    at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:212) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:154) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:140) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.5.0.jar:2.5.0]
    ... 1 more
Caused by: java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1511) ~[?:?]
    at java.util.HashMap$EntryIterator.next(HashMap.java:1544) ~[?:?]
    at java.util.HashMap$EntryIterator.next(HashMap.java:1542) ~[?:?]
    at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:99) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:39) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-4.0.2.jar:?]
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557) ~[kryo-4.0.2.jar:?]
    at org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:118) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:555) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:68) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.boltEmit(BoltOutputCollectorImpl.java:112) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.emit(BoltOutputCollectorImpl.java:65) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:93) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:42) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:77) ~[storm-client-2.5.0.jar:2.5.0]
    at com.digitalpebble.stormcrawler.bolt.FeedParserBolt.execute(FeedParserBolt.java:104) ~[stormjar.jar:?]
    at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:212) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:154) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:140) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.5.0.jar:2.5.0]
    ... 1 more
```

Thankfully, a single topology worker appears to run much faster in 2.x than two workers did in 1.2.4, so we're OK with that for the moment, but it would be nice to find out why this cannot run with two or more topology workers.

EDIT: I forgot to mention that the remote worker/supervisor experiences similar errors; it does receive the topology and tries to crawl the pages, but fails with dropped messages during Worker-Transfer.
jnioche commented 9 months ago

Hi @alextechnology

The two issues might not necessarily be related, so we have:

  1. Connection problem between workers / nodes
  2. ConcurrentModificationException on the Metadata when emitting tuples from the FeedParserBolt

Focusing on the first one for now, how do you set the cluster up? Do you install Apache Storm manually on the nodes? Do you run Docker?

alextechnology commented 9 months ago

Hi @jnioche,

Very much appreciate your work on this and assistance.

I'm currently running two separate Ubuntu Pro 22.04 VMs on GCP - node Alpha and node Beta. Both have openjdk-8 and openjdk-11 installed (via aptitude), with the latter currently set as the system JAVA_HOME (previously, openjdk-8 was used to run Storm 1.2.4 for news-crawl). Alpha also has Elasticsearch and Kibana installed locally, configured to bind to the virtual private IP. I'm also running an independent 6-node Zookeeper cluster (in the same cloud and project). No Docker involved at all.

I'm running Alpha as a Storm nimbus seed (also bound to the private IP) and as a supervisor node, via supervisord with openjdk-11 set as JAVA_HOME in the command string. Beta runs only as a supervisor node, and both instances use the same storm.yaml config. Both run under a storm user account created for this purpose, which owns the Storm binaries and the news-crawl code base. I haven't modified any of the news-crawl code (the build completes successfully), but I have modified the es-conf and crawler-config to point to the correct Elasticsearch resources. I have parallelism set to 32 everywhere and topology.workers set to 2, plus a few other minor config tweaks, but only values intended to increase performance a bit. The same config works fine for the current news-crawl master branch, or for 2.x when I change topology.workers to 1. There is no firewall at all between Alpha and Beta. I've also added an /etc/hosts entry on Alpha mapping Beta's private IP to its system-configured domain name, though I'm not sure this helps, since both nodes' workers seem to fail to communicate with each other.

Happy to share more details, so let me know anything that may help

To upgrade I manually merged 2.x into master locally and only had some minor conflicts to resolve in the news-crawl yaml configs.

jnioche commented 9 months ago

Just to check - Storm 2.5.0 is installed on these nodes, right?

alextechnology commented 9 months ago

> Just to check - Storm 2.5.0 is installed on these nodes, right?

Correct, a pre-built binary archive downloaded from archive.apache.org. Nimbus and supervisor both appear to start up and connect to Zookeeper correctly.

jnioche commented 9 months ago

> The two issues might not necessarily be related, so we have:

Actually, the two issues might be related after all: if a worker crashes, the other one obviously can't connect to it...

The ConcurrentModificationException can happen only if there is serialization, and Kryo serialization happens only when a worker needs to send tuples to another worker.
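
The mechanics can be reproduced outside Storm. Below is a minimal, self-contained sketch (not project code; the class and names are made up) of what the trace shows: Kryo's MapSerializer walks the HashMap backing the Metadata on the worker-transfer thread while another thread keeps mutating it, and the map's fail-fast iterator throws ConcurrentModificationException. With a single worker there is no remote transfer, so the map is never serialized mid-mutation and the bug stays hidden.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the failure mode: one thread mutates a HashMap (the bolt still
// writing to the Metadata) while another iterates it (Kryo serialising the
// outgoing tuple). The fail-fast iterator typically throws
// java.util.ConcurrentModificationException almost immediately.
public class CmeSketch {
    public static void main(String[] args) throws InterruptedException {
        Map<String, String> md = new HashMap<>();

        Thread mutator = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                md.put("key-" + i, "value");
            }
        });
        mutator.start();

        try {
            while (mutator.isAlive()) {
                // read-only walk, like MapSerializer.write(...)
                for (Map.Entry<String, String> e : md.entrySet()) {
                    e.getKey();
                }
            }
            System.out.println("no CME this run (it is a race, after all)");
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("reproduced: " + e);
        } finally {
            mutator.join();
        }
    }
}
```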

jnioche commented 9 months ago

The serialization issue seems to happen at line 104 of FeedParserBolt, which is surprising, because once we get to that point nothing else should be accessing the Tuple. @alextechnology, could you please run the topology with log level DEBUG and check whether this happens when a tuple is not a feed?

jnioche commented 9 months ago

One way of working out what modifies the objects as they are being serialised would be to use the lock method on the Metadata object as it is being sent. You could copy the class FeedParserBolt, modify it around line 104 to use the lock, change the topology class or Flux file so that it uses your copy of FeedParserBolt, then restart the topology and check the logs. @alextechnology, do you think you could give it a try?
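
For concreteness, here is a rough sketch of what that copy could look like. This is not the actual FeedParserBolt code: the class name, fields and the simplified execute body are placeholders, and the only point is the lock() call before the emit. StormCrawler's Metadata.lock() makes any subsequent mutation throw ConcurrentModificationException at the mutating call site, so the culprit shows up in the logs instead of deep inside Kryo.

```java
import java.util.Map;

import com.digitalpebble.stormcrawler.Metadata;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Placeholder copy of FeedParserBolt for debugging; the real class does feed
// detection and parsing before this point. The single relevant change is the
// lock() call before the tuple is emitted (around line 104 in the original).
public class LockingFeedParserBolt extends BaseRichBolt {

    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        String url = tuple.getStringByField("url");
        byte[] content = tuple.getBinaryByField("content");
        Metadata metadata = (Metadata) tuple.getValueByField("metadata");

        // ... feed handling as in the original bolt ...

        // Freeze the metadata: any later modification, e.g. by another thread
        // while Kryo serialises the outgoing tuple, now fails loudly at the
        // modifying call site.
        metadata.lock();

        collector.emit(tuple, new Values(url, content, metadata));
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("url", "content", "metadata"));
    }
}
```

Then point the topology class or Flux file at this class instead of com.digitalpebble.stormcrawler.bolt.FeedParserBolt and watch where the exception is raised.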

alextechnology commented 9 months ago

Hi @jnioche - let me try both of these now and get back shortly. In the meantime, I just tried a few other configurations:

  1. Run only nimbus on Alpha and only a supervisor on Beta. This fails because the supervisor tries to load the seedlist from the local disk, which doesn't exist since news-crawl isn't deployed on this node. I placed the seedlist in the same path and tried again; it does seem to start up, but ultimately fails because the WARC output path doesn't exist. I do see it begin fetching pages without an issue, and I'm pretty sure this will work if I just recreate the file path for the WARC output.

  2. Run nimbus and a supervisor on Alpha, with topology workers still set to 2. This runs without an issue, suggesting that workers can communicate fine on the same host, so the problem is specific to workers communicating with remote workers (see the connectivity probe sketch below).

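A quick way to separate the failure modes behind a "Connection refused" from the Netty client is to probe the remote worker's slot port directly from the other host. This is a throwaway sketch, not project code; the default host is a placeholder and the port should be whichever slot port the supervisor assigned (6800 in the logs below). "Connection refused" means nothing is listening on that port at that moment, while a firewall drop would typically show up as a timeout instead.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Throwaway probe: can this host open a TCP connection to the remote
// worker's slot port? Run it from Alpha against Beta, and vice versa.
public class WorkerPortProbe {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "beta-internal-hostname"; // placeholder
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6800;
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 3000);
            System.out.println("connected: something is listening on " + host + ":" + port);
        } catch (IOException e) {
            // "Connection refused" = port closed; a timeout suggests filtering
            System.out.println("failed: " + e);
        }
    }
}
```
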
Finally, before these tests, I did a `storm kill CrawlTopo -w 30` to stop the running crawl and noticed that the shutdown doesn't seem so graceful - I'm attaching that log in case that's unexpected: shutdown.txt

jnioche commented 9 months ago

> Finally, before these tests, I did a `storm kill CrawlTopo -w 30` to stop the running crawl and noticed that the shutdown doesn't seem so graceful - I'm attaching that log in case that's unexpected

Killing a topology is a messy business ;-)

The thread dump seems to be normal behaviour, see https://github.com/apache/storm/blob/9a238c448d7849b908b77d929b682e74926c01ac/storm-client/src/jvm/org/apache/storm/utils/Utils.java#L347

alextechnology commented 9 months ago

@jnioche So I'm a bit confused here - do all the resources (i.e. seedlist and output folder) need to exist on all supervisor nodes? Because I just started the same problematic config (i.e. nimbus + supervisor on Alpha, supervisor on Beta) and:

  1. Same issue on the Alpha worker at start-up - 147 lines of failing to connect to the Beta worker before it seems to give up and start running the topology. Later I see the same lines in the log, suggesting it cannot connect to Beta:
```
TIME(sampled): null TASK: 57 DELTA: -1
2023-12-11 16:31:59.754 o.a.s.e.b.BoltOutputCollectorImpl Thread-56-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[5, 5] [INFO] BOLT ack TASK: 5 TIME: -1 TUPLE: source: status:115, stream: __metrics, id: {}, [TASK_INFO: { host: dcrawler01.c.resonant-matter-144117.internal:6800 comp: status[115]}, [[waitAck = 0]]] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:31:59.754 o.a.s.e.b.BoltExecutor Thread-27-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[43, 43] [INFO] Execute done TUPLE source: status:115, stream: __metrics, id: {}, [TASK_INFO: { host: dcrawler01.c.resonant-matter-144117.internal:6800 comp: status[115]}, [[waitAck = 0]]] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 43 DELTA: -1
2023-12-11 16:31:59.754 o.a.s.e.b.BoltExecutor Thread-56-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[5, 5] [INFO] Execute done TUPLE source: status:115, stream: __metrics, id: {}, [TASK_INFO: { host: dcrawler01.c.resonant-matter-144117.internal:6800 comp: status[115]}, [[waitAck = 0]]] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 5 DELTA: -1
2023-12-11 16:32:04.592 o.a.s.e.Executor Thread-46-fetch-executor[73, 73] [INFO] Processing received TUPLE: source: __system:-1, stream: __tick, id: {}, [5] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: -2
2023-12-11 16:32:04.592 o.a.s.e.b.BoltExecutor Thread-46-fetch-executor[73, 73] [INFO] Execute done TUPLE source: __system:-1, stream: __tick, id: {}, [5] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 73 DELTA: -1
2023-12-11 16:32:04.864 c.d.s.e.p.AggregationSpout Thread-36-spout-executor[91, 91] [INFO] [spout #9]  Populating buffer with nextFetchDate <= 2023-12-11T16:30:34Z
2023-12-11 16:32:04.865 c.d.s.e.p.AggregationSpout Thread-35-spout-executor[95, 95] [INFO] [spout #13]  Populating buffer with nextFetchDate <= 2023-12-11T16:30:34Z
2023-12-11 16:32:04.865 c.d.s.e.p.AggregationSpout Thread-48-spout-executor[85, 85] [INFO] [spout #3]  Populating buffer with nextFetchDate <= 2023-12-11T16:30:34Z
2023-12-11 16:32:04.866 o.e.c.RestClient I/O dispatcher 41 [WARN] request [POST http://10.128.0.34:9200/status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A9%7C_local] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."]
2023-12-11 16:32:04.866 o.e.c.RestClient I/O dispatcher 47 [WARN] request [POST http://10.128.0.34:9200/status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A13%7C_local] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."]
2023-12-11 16:32:04.867 o.e.c.RestClient I/O dispatcher 34 [WARN] request [POST http://10.128.0.34:9200/status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A3%7C_local] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."]
2023-12-11 16:32:04.867 c.d.s.e.p.AggregationSpout I/O dispatcher 47 [INFO] [spout #13]  ES query returned 9 hits from 2 buckets in 2 msec with 9 already being processed. Took 0.22222222 msec per doc on average.
2023-12-11 16:32:04.867 c.d.s.e.p.AggregationSpout I/O dispatcher 41 [INFO] [spout #9]  ES query returned 15 hits from 6 buckets in 2 msec with 15 already being processed. Took 0.13333334 msec per doc on average.
2023-12-11 16:32:04.867 c.d.s.e.p.AggregationSpout I/O dispatcher 34 [INFO] [spout #3]  ES query returned 12 hits from 4 buckets in 2 msec with 12 already being processed. Took 0.16666667 msec per doc on average.
2023-12-11 16:32:04.876 c.d.s.e.p.AggregationSpout Thread-50-spout-executor[93, 93] [INFO] [spout #11]  Populating buffer with nextFetchDate <= 2023-12-11T16:30:34Z
2023-12-11 16:32:04.876 c.d.s.e.p.AggregationSpout Thread-52-spout-executor[97, 97] [INFO] [spout #15]  Populating buffer with nextFetchDate <= 2023-12-11T16:30:34Z
2023-12-11 16:32:04.876 c.d.s.e.p.AggregationSpout Thread-32-spout-executor[87, 87] [INFO] [spout #5]  Populating buffer with nextFetchDate <= 2023-12-11T16:30:34Z
2023-12-11 16:32:04.877 o.e.c.RestClient I/O dispatcher 34 [WARN] request [POST http://10.128.0.34:9200/status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A11%7C_local] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."]
2023-12-11 16:32:04.877 o.e.c.RestClient I/O dispatcher 41 [WARN] request [POST http://10.128.0.34:9200/status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A5%7C_local] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."]
2023-12-11 16:32:04.877 o.e.c.RestClient I/O dispatcher 47 [WARN] request [POST http://10.128.0.34:9200/status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A15%7C_local] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."]
2023-12-11 16:32:04.877 c.d.s.e.p.AggregationSpout I/O dispatcher 34 [INFO] [spout #11]  ES query returned 8 hits from 1 buckets in 1 msec with 8 already being processed. Took 0.125 msec per doc on average.
2023-12-11 16:32:04.878 c.d.s.e.p.AggregationSpout I/O dispatcher 41 [INFO] [spout #5]  ES query returned 10 hits from 4 buckets in 2 msec with 10 already being processed. Took 0.2 msec per doc on average.
2023-12-11 16:32:04.878 c.d.s.e.p.AggregationSpout I/O dispatcher 47 [INFO] [spout #15]  ES query returned 33 hits from 5 buckets in 2 msec with 33 already being processed. Took 0.060606062 msec per doc on average.
2023-12-11 16:32:04.880 c.d.s.e.p.AggregationSpout Thread-31-spout-executor[83, 83] [INFO] [spout #1]  Populating buffer with nextFetchDate <= 2023-12-11T16:30:34Z
2023-12-11 16:32:04.881 o.e.c.RestClient I/O dispatcher 47 [WARN] request [POST http://10.128.0.34:9200/status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A1%7C_local] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."]
2023-12-11 16:32:04.881 c.d.s.e.p.AggregationSpout I/O dispatcher 47 [INFO] [spout #1]  ES query returned 3 hits from 2 buckets in 1 msec with 3 already being processed. Took 0.33333334 msec per doc on average.
2023-12-11 16:32:04.891 c.d.s.e.p.AggregationSpout Thread-51-spout-executor[89, 89] [INFO] [spout #7]  Populating buffer with nextFetchDate <= 2023-12-11T16:30:34Z
2023-12-11 16:32:04.892 o.e.c.RestClient I/O dispatcher 47 [WARN] request [POST http://10.128.0.34:9200/status/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=query_then_fetch&batched_reduce_size=512&preference=_shards%3A7%7C_local] returned 1 warnings: [299 Elasticsearch-7.17.15-0b8ecfb4378335f4689c4223d1f1115f16bef3ba "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."]
2023-12-11 16:32:04.893 c.d.s.e.p.AggregationSpout I/O dispatcher 47 [INFO] [spout #7]  ES query returned 24 hits from 7 buckets in 2 msec with 24 already being processed. Took 0.083333336 msec per doc on average.
2023-12-11 16:32:05.062 o.a.s.m.n.Client Netty-ChannelAlive-Timer [ERROR] connection to Netty-Client-beta.com/10.142.0.21:6800 is unavailable
2023-12-11 16:32:05.095 o.a.s.m.n.Client client-worker-1 [ERROR] connection attempt 215 to Netty-Client-beta.com/10.142.0.21:6800 failed: org.apache.storm.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: beta.com/10.142.0.21:6800
2023-12-11 16:32:05.695 o.a.s.m.n.Client client-worker-1 [ERROR] connection attempt 216 to Netty-Client-beta.com/10.142.0.21:6800 failed: org.apache.storm.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: beta.com/10.142.0.21:6800
2023-12-11 16:32:06.295 o.a.s.m.n.Client client-worker-1 [ERROR] connection attempt 217 to Netty-Client-beta.com/10.142.0.21:6800 failed: org.apache.storm.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: beta.com/10.142.0.21:6800
2023-12-11 16:32:06.895 o.a.s.m.n.Client client-worker-1 [ERROR] connection attempt 218 to Netty-Client-beta.com/10.142.0.21:6800 failed: org.apache.storm.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: beta.com/10.142.0.21:6800
2023-12-11 16:32:07.495 o.a.s.m.n.Client client-worker-1 [ERROR] connection attempt 219 to Netty-Client-beta.com/10.142.0.21:6800 failed: org.apache.storm.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: beta.com/10.142.0.21:6800
2023-12-11 16:32:08.050 o.a.s.d.w.WorkerState Netty-server-localhost-6800-worker-1 [INFO] Sending BackPressure status to new client. BPStatus: {worker=6bdbc725-3533-454e-a317-15dc0f8c6a5f, bpStatusId=20, bpTasks=[], nonBpTasks=[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 70, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 108, 109, 110, 111, 112, 113, 114, 115]}
2023-12-11 16:32:08.095 o.a.s.m.n.Client client-worker-1 [WARN] Re-connection to beta.com/10.142.0.21:6800 was successful but 993 messages has been lost so far
2023-12-11 16:32:09.242 o.a.s.e.Executor Thread-47-partitioner-executor[77, 77] [INFO] Processing received TUPLE: source: __system:-1, stream: __metrics_tick, id: {}, [10] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: -2
2023-12-11 16:32:09.242 o.a.s.d.Task Thread-47-partitioner-executor[77, 77] [INFO] Emitting Tuple: taskId=77 componentId=partitioner stream=__metrics values=[TASK_INFO: { host: dcrawler01.c.resonant-matter-144117.internal:6800 comp: partitioner[77]}, [[URLPartitioner = {}]]]
2023-12-11 16:32:09.242 o.a.s.e.ExecutorTransfer Thread-47-partitioner-executor[77, 77] [INFO] TRANSFERRING tuple [dest: 7 tuple: source: partitioner:77, stream: __metrics, id: {}, [TASK_INFO: { host: dcrawler01.c.resonant-matter-144117.internal:6800 comp: partitioner[77]}, [[URLPartitioner = {}]]] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null]
2023-12-11 16:32:09.242 o.a.s.e.ExecutorTransfer Thread-47-partitioner-executor[77, 77] [INFO] TRANSFERRING tuple [dest: 57 tuple: source: partitioner:77, stream: __metrics, id: {}, [TASK_INFO: { host: dcrawler01.c.resonant-matter-144117.internal:6800 comp: partitioner[77]}, [[URLPartitioner = {}]]] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null]
2023-12-11 16:32:09.243 o.a.s.e.Executor Thread-44-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[57, 57] [INFO] Processing received TUPLE: source: partitioner:77, stream: __metrics, id: {}, [TASK_INFO: { host: dcrawler01.c.resonant-matter-144117.internal:6800 comp: partitioner[77]}, [[URLPartitioner = {}]]] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 57
```
  2. Beta seems even more odd - it takes forever to start up the worker, though it clearly communicates fine with the Alpha nimbus. But I suppose it never connects with the Alpha workers, and it immediately throws an error about the missing seedlist:
```
2023-12-11 16:28:20.107 o.a.s.e.Executor main [INFO] Loading executor tasks spout:[94, 94]
2023-12-11 16:28:20.107 o.a.s.e.b.BoltExecutor Thread-43-ssb-executor[98, 98] [INFO] Preparing bolt ssb:[98]
2023-12-11 16:28:20.107 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:83, stream: __ack_init, id: {}, [-4123990916148515094, -8114022221506181452, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.107 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:83, stream: __ack_init, id: {}, [-3378199865264390574, 8307993500274236253, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.108 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:83, stream: __ack_init, id: {}, [-3378199865264390574, 8307993500274236253, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.085 o.a.s.u.Utils Thread-15-filespout-executor[74, 74] [ERROR] Async loop died!
java.lang.RuntimeException: java.io.FileNotFoundException: /home/storm/systems/news-crawl/seeds/20231110.txt (No such file or directory)
    at com.digitalpebble.stormcrawler.spout.FileSpout.open(FileSpout.java:160) ~[stormjar.jar:?]
    at org.apache.storm.executor.spout.SpoutExecutor.init(SpoutExecutor.java:142) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.spout.SpoutExecutor.call(SpoutExecutor.java:152) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.spout.SpoutExecutor.call(SpoutExecutor.java:50) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:393) [storm-client-2.5.0.jar:2.5.0]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.io.FileNotFoundException: /home/storm/systems/news-crawl/seeds/20231110.txt (No such file or directory)
    at java.io.FileInputStream.open0(Native Method) ~[?:?]
    at java.io.FileInputStream.open(FileInputStream.java:219) ~[?:?]
    at java.io.FileInputStream.<init>(FileInputStream.java:157) ~[?:?]
    at com.digitalpebble.stormcrawler.spout.FileSpout.populateBuffer(FileSpout.java:133) ~[stormjar.jar:?]
    at com.digitalpebble.stormcrawler.spout.FileSpout.open(FileSpout.java:158) ~[stormjar.jar:?]
    ... 5 more
2023-12-11 16:28:20.108 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:83, stream: __ack_init, id: {}, [-3378199865264390574, 8307993500274236253, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.109 o.a.s.e.Executor main [INFO] Finished loading executor spout:[94, 94]
2023-12-11 16:28:20.109 o.a.s.e.Executor main [INFO] Loading executor tasks spout:[90, 90]
2023-12-11 16:28:20.109 o.a.s.e.s.SpoutExecutor Thread-44-spout-executor[94, 94] [INFO] Opening spout spout:[94]
2023-12-11 16:28:20.108 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-4123990916148515094, -6603443033961019697] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.110 o.a.s.e.e.ReportError Thread-15-filespout-executor[74, 74] [ERROR] Error
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /home/storm/systems/news-crawl/seeds/20231110.txt (No such file or directory)
    at org.apache.storm.utils.Utils$1.run(Utils.java:413) ~[storm-client-2.5.0.jar:2.5.0]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /home/storm/systems/news-crawl/seeds/20231110.txt (No such file or directory)
    at com.digitalpebble.stormcrawler.spout.FileSpout.open(FileSpout.java:160) ~[stormjar.jar:?]
    at org.apache.storm.executor.spout.SpoutExecutor.init(SpoutExecutor.java:142) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.spout.SpoutExecutor.call(SpoutExecutor.java:152) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.spout.SpoutExecutor.call(SpoutExecutor.java:50) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:393) ~[storm-client-2.5.0.jar:2.5.0]
    ... 1 more
Caused by: java.io.FileNotFoundException: /home/storm/systems/news-crawl/seeds/20231110.txt (No such file or directory)
    at java.io.FileInputStream.open0(Native Method) ~[?:?]
    at java.io.FileInputStream.open(FileInputStream.java:219) ~[?:?]
    at java.io.FileInputStream.<init>(FileInputStream.java:157) ~[?:?]
    at com.digitalpebble.stormcrawler.spout.FileSpout.populateBuffer(FileSpout.java:133) ~[stormjar.jar:?]
    at com.digitalpebble.stormcrawler.spout.FileSpout.open(FileSpout.java:158) ~[stormjar.jar:?]
    at org.apache.storm.executor.spout.SpoutExecutor.init(SpoutExecutor.java:142) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.spout.SpoutExecutor.call(SpoutExecutor.java:152) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.spout.SpoutExecutor.call(SpoutExecutor.java:50) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:393) ~[storm-client-2.5.0.jar:2.5.0]
    ... 1 more
2023-12-11 16:28:20.110 o.a.s.e.Executor main [INFO] Finished loading executor spout:[90, 90]
2023-12-11 16:28:20.112 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[36, 36]
2023-12-11 16:28:20.112 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: 0 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-4123990916148515094, -6603443033961019697] PROC_START_TIME(sampled): 1702312100112 EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.112 o.a.s.e.s.SpoutExecutor Thread-45-spout-executor[90, 90] [INFO] Opening spout spout:[90]
2023-12-11 16:28:20.114 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[36, 36]
2023-12-11 16:28:20.114 o.a.s.e.Executor main [INFO] Loading executor tasks __system:[-1, -1]
2023-12-11 16:28:20.115 o.a.s.e.b.BoltExecutor Thread-46-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[36, 36] [INFO] Preparing bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[36]
2023-12-11 16:28:20.115 o.a.s.e.Executor main [INFO] Finished loading executor __system:[-1, -1]
2023-12-11 16:28:20.115 o.a.s.e.Executor main [INFO] Loading executor tasks partitioner:[76, 76]
2023-12-11 16:28:20.115 o.a.s.e.Executor main [INFO] Finished loading executor partitioner:[76, 76]
2023-12-11 16:28:20.116 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[60, 60]
2023-12-11 16:28:20.116 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[60, 60]
```

However, Beta's worker then seems to start processing tuples (is this because I've enabled sampling?):

```
2023-12-11 16:28:20.116 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[64, 64]
2023-12-11 16:28:20.117 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-4123990916148515094, -6603443033961019697] PROC_START_TIME(sampled): 1702312100112 EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.118 c.d.s.i.DummyIndexer Thread-43-ssb-executor[98, 98] [INFO] Mapping key parse.title to field title
2023-12-11 16:28:20.118 o.a.s.e.b.BoltExecutor Thread-50-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[60, 60] [INFO] Preparing bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[60]
2023-12-11 16:28:20.118 o.a.s.e.b.BoltExecutor Thread-48-partitioner-executor[76, 76] [INFO] Preparing bolt partitioner:[76]
2023-12-11 16:28:20.118 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-3378199865264390574, 6837611288718397660] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.118 c.d.s.i.DummyIndexer Thread-43-ssb-executor[98, 98] [INFO] Mapping key parse.keywords to field keywords
2023-12-11 16:28:20.118 c.d.s.b.URLPartitionerBolt Thread-48-partitioner-executor[76, 76] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.118 o.a.s.e.b.BoltExecutor Thread-46-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[36, 36] [INFO] Prepared bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[36]
2023-12-11 16:28:20.119 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[64, 64]
2023-12-11 16:28:20.118 c.d.s.i.DummyIndexer Thread-43-ssb-executor[98, 98] [INFO] Mapping key parse.description to field description
2023-12-11 16:28:20.120 o.a.s.e.Executor main [INFO] Loading executor tasks fetch:[72, 72]
2023-12-11 16:28:20.120 o.a.s.e.b.BoltExecutor Thread-43-ssb-executor[98, 98] [INFO] Prepared bolt ssb:[98]
2023-12-11 16:28:20.119 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-3378199865264390574, 6837611288718397660] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.121 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-3378199865264390574, 6837611288718397660] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.121 o.a.s.e.b.BoltExecutor Thread-48-partitioner-executor[76, 76] [INFO] Prepared bolt partitioner:[76]
2023-12-11 16:28:20.121 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:95, stream: __ack_init, id: {}, [-2068343561169177225, -531835215444606166, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.119 o.a.s.e.b.BoltExecutor Thread-50-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[60, 60] [INFO] Prepared bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[60]
2023-12-11 16:28:20.121 o.a.s.e.b.BoltExecutor Thread-53-fetch-executor[72, 72] [INFO] Preparing bolt fetch:[72]
2023-12-11 16:28:20.121 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:95, stream: __ack_init, id: {}, [-2068343561169177225, -531835215444606166, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.121 o.a.s.e.Executor main [INFO] Finished loading executor fetch:[72, 72]
2023-12-11 16:28:20.122 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[56, 56]
2023-12-11 16:28:20.122 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:95, stream: __ack_init, id: {}, [-2068343561169177225, -531835215444606166, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.122 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-4123990916148515094, -725436098569196388] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.121 o.a.s.e.b.BoltExecutor Thread-51-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[64, 64] [INFO] Preparing bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[64]
2023-12-11 16:28:20.123 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-4123990916148515094, -725436098569196388] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.123 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[56, 56]
2023-12-11 16:28:20.123 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-4123990916148515094, -725436098569196388] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.123 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[52, 52]
2023-12-11 16:28:20.123 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-3378199865264390574, -2606298094286109596] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.123 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-3378199865264390574, -2606298094286109596] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.123 o.a.s.e.b.BoltExecutor Thread-51-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[64, 64] [INFO] Prepared bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[64]
2023-12-11 16:28:20.124 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-3378199865264390574, -2606298094286109596] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.124 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:83, stream: __ack_init, id: {}, [-3478115639169539415, 7278560857325105651, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.124 o.a.s.e.b.BoltExecutor Thread-54-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[56, 56] [INFO] Preparing bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[56]
2023-12-11 16:28:20.124 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:83, stream: __ack_init, id: {}, [-3478115639169539415, 7278560857325105651, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.125 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[52, 52]
2023-12-11 16:28:20.125 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:83, stream: __ack_init, id: {}, [-3478115639169539415, 7278560857325105651, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.125 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[44, 44]
2023-12-11 16:28:20.125 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-2068343561169177225, 1145033480577199324] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.126 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-2068343561169177225, 1145033480577199324] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.126 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-2068343561169177225, 1145033480577199324] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.126 o.a.s.e.b.BoltExecutor Thread-54-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[56, 56] [INFO] Prepared bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[56]
2023-12-11 16:28:20.126 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[44, 44]
2023-12-11 16:28:20.126 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:95, stream: __ack_init, id: {}, [-1801313057165177831, -1465155875809605520, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.126 o.a.s.e.b.BoltExecutor Thread-56-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[52, 52] [INFO] Preparing bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[52]
2023-12-11 16:28:20.126 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[40, 40]
2023-12-11 16:28:20.126 o.a.s.e.b.BoltExecutor Thread-58-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[44, 44] [INFO] Preparing bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[44]
2023-12-11 16:28:20.126 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:95, stream: __ack_init, id: {}, [-1801313057165177831, -1465155875809605520, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.127 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[40, 40]
2023-12-11 16:28:20.127 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:95, stream: __ack_init, id: {}, [-1801313057165177831, -1465155875809605520, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.127 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[48, 48]
2023-12-11 16:28:20.129 o.a.s.e.b.BoltExecutor Thread-56-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[52, 52] [INFO] Prepared bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[52]
2023-12-11 16:28:20.130 o.a.s.e.b.BoltExecutor Thread-59-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[40, 40] [INFO] Preparing bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[40]
2023-12-11 16:28:20.130 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:83, stream: __ack_init, id: {}, [-148650565251030105, 776269881405876811, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.131 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:83, stream: __ack_init, id: {}, [-148650565251030105, 776269881405876811, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.131 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[48, 48]
2023-12-11 16:28:20.131 o.a.s.e.Executor main [INFO] Loading executor tasks spout:[84, 84]
2023-12-11 16:28:20.132 o.a.s.e.b.BoltExecutor Thread-58-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[44, 44] [INFO] Prepared bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[44]
2023-12-11 16:28:20.132 o.a.s.e.b.BoltExecutor Thread-59-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[40, 40] [INFO] Prepared bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[40]
2023-12-11 16:28:20.132 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:83, stream: __ack_init, id: {}, [-148650565251030105, 776269881405876811, 83] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.132 o.a.s.e.b.BoltExecutor Thread-62-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[48, 48] [INFO] Preparing bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[48]
2023-12-11 16:28:20.133 o.a.s.e.Executor main [INFO] Finished loading executor spout:[84, 84]
2023-12-11 16:28:20.133 o.a.s.e.Executor main [INFO] Loading executor tasks spout:[88, 88]
2023-12-11 16:28:20.133 o.a.s.e.Executor main [INFO] Finished loading executor spout:[88, 88]
2023-12-11 16:28:20.134 o.a.s.e.Executor main [INFO] Loading executor tasks sitemap:[80, 80]
2023-12-11 16:28:20.134 o.a.s.e.s.SpoutExecutor Thread-65-spout-executor[88, 88] [INFO] Opening spout spout:[88]
2023-12-11 16:28:20.134 o.a.s.e.b.BoltExecutor Thread-62-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[48, 48] [INFO] Prepared bolt __metrics_org.apache.storm.metric.LoggingMetricsConsumer:[48]
2023-12-11 16:28:20.132 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:95, stream: __ack_init, id: {}, [4970240361946410058, -1146044696927907618, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.134 o.a.s.e.Executor main [INFO] Finished loading executor sitemap:[80, 80]
2023-12-11 16:28:20.135 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[28, 28]
2023-12-11 16:28:20.135 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:95, stream: __ack_init, id: {}, [4970240361946410058, -1146044696927907618, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.135 o.a.s.e.b.BoltExecutor Thread-67-sitemap-executor[80, 80] [INFO] Preparing bolt sitemap:[80]
2023-12-11 16:28:20.135 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:95, stream: __ack_init, id: {}, [4970240361946410058, -1146044696927907618, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.135 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[28, 28]
2023-12-11 16:28:20.135 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:95, stream: __ack_init, id: {}, [-7813587939482510894, -283538427390482334, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.135 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[32, 32]
2023-12-11 16:28:20.135 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:95, stream: __ack_init, id: {}, [-7813587939482510894, -283538427390482334, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.135 o.a.s.e.b.BoltExecutor Thread-68-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[28, 28] [INFO] Preparing bolt __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[28]
2023-12-11 16:28:20.135 o.a.s.e.s.SpoutExecutor Thread-64-spout-executor[84, 84] [INFO] Opening spout spout:[84]
2023-12-11 16:28:20.136 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[32, 32]
2023-12-11 16:28:20.136 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:95, stream: __ack_init, id: {}, [-7813587939482510894, -283538427390482334, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.137 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[12, 12]
2023-12-11 16:28:20.137 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:95, stream: __ack_init, id: {}, [2286059023404455111, 2600024955830255420, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.137 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:95, stream: __ack_init, id: {}, [2286059023404455111, 2600024955830255420, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.137 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[12, 12]
2023-12-11 16:28:20.137 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:95, stream: __ack_init, id: {}, [2286059023404455111, 2600024955830255420, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.138 o.a.s.e.b.BoltExecutor Thread-69-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[32, 32] [INFO] Preparing bolt __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[32]
2023-12-11 16:28:20.138 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:95, stream: __ack_init, id: {}, [2946569250910592797, -1400580135842812437, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.138 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[4, 4]
2023-12-11 16:28:20.138 o.a.s.e.b.BoltExecutor Thread-70-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[12, 12] [INFO] Preparing bolt __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[12]
2023-12-11 16:28:20.139 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:95, stream: __ack_init, id: {}, [2946569250910592797, -1400580135842812437, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.139 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[4, 4]
2023-12-11 16:28:20.140 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[8, 8]
2023-12-11 16:28:20.140 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:95, stream: __ack_init, id: {}, [2946569250910592797, -1400580135842812437, 95] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.140 o.a.s.e.b.BoltExecutor Thread-71-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[4, 4] [INFO] Preparing bolt __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[4]
2023-12-11 16:28:20.140 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-2068343561169177225, 6573333448291401933] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.140 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-2068343561169177225, 6573333448291401933] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): 1702312100140
2023-12-11 16:28:20.141 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-2068343561169177225, 6573333448291401933] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): 1702312100140 TASK: 2 DELTA: 1
2023-12-11 16:28:20.141 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[8, 8]
2023-12-11 16:28:20.141 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[20, 20]
2023-12-11 16:28:20.142 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[20, 20]
2023-12-11 16:28:20.142 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[16, 16]
2023-12-11 16:28:20.142 o.a.s.e.b.BoltExecutor Thread-72-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[8, 8] [INFO] Preparing bolt __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[8]
2023-12-11 16:28:20.143 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[16, 16]
2023-12-11 16:28:20.144 o.a.s.e.Executor main [INFO] Loading executor tasks __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[24, 24]
2023-12-11 16:28:20.145 o.a.s.e.b.BoltExecutor Thread-73-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[20, 20] [INFO] Preparing bolt __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[20]
2023-12-11 16:28:20.145 o.a.s.e.b.BoltExecutor Thread-74-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[16, 16] [INFO] Preparing bolt __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[16]
2023-12-11 16:28:20.145 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-3478115639169539415, -5709649264682537008] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.145 o.a.s.e.Executor main [INFO] Finished loading executor __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[24, 24]
2023-12-11 16:28:20.146 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-3478115639169539415, -5709649264682537008] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.146 o.a.s.e.Executor main [INFO] Loading executor tasks warc:[116, 116]
2023-12-11 16:28:20.147 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-3478115639169539415, -5709649264682537008] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.147 o.a.s.e.b.BoltExecutor Thread-75-__metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer-executor[24, 24] [INFO] Preparing bolt __metrics_com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer:[24]
2023-12-11 16:28:20.148 o.a.s.e.Executor main [INFO] Finished loading executor warc:[116, 116]
2023-12-11 16:28:20.147 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-1801313057165177831, -4745064018105203615] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.148 o.a.s.e.Executor main [INFO] Loading executor tasks spout:[96, 96]
2023-12-11 16:28:20.148 c.d.s.u.URLPartitioner Thread-28-spout-executor[82, 82] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.148 c.d.s.u.URLPartitioner Thread-30-spout-executor[86, 86] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.148 o.a.s.e.Executor main [INFO] Finished loading executor spout:[96, 96]
2023-12-11 16:28:20.148 o.a.s.e.Executor main [INFO] Loading executor tasks spout:[92, 92]
2023-12-11 16:28:20.148 c.d.s.u.URLPartitioner Thread-65-spout-executor[88, 88] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.148 c.d.s.u.URLPartitioner Thread-45-spout-executor[90, 90] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.149 c.d.s.u.URLPartitioner Thread-64-spout-executor[84, 84] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.149 o.a.s.e.Executor main [INFO] Finished loading executor spout:[92, 92]
2023-12-11 16:28:20.149 c.d.s.u.URLPartitioner Thread-44-spout-executor[94, 94] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.148 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-1801313057165177831, -4745064018105203615] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.149 o.a.s.e.Executor main [INFO] Loading executor tasks feed:[68, 69]
2023-12-11 16:28:20.150 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-1801313057165177831, -4745064018105203615] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.150 o.a.s.e.Executor main [INFO] Finished loading executor feed:[68, 69]
2023-12-11 16:28:20.151 o.a.s.e.Executor main [INFO] Loading executor tasks status:[100, 107]
2023-12-11 16:28:20.151 o.a.s.e.b.BoltExecutor Thread-76-warc-executor[116, 116] [INFO] Preparing bolt warc:[116]
2023-12-11 16:28:20.151 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-148650565251030105, -7054458867463269924] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.151 o.a.s.e.s.SpoutExecutor Thread-77-spout-executor[96, 96] [INFO] Opening spout spout:[96]
2023-12-11 16:28:20.151 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-148650565251030105, -7054458867463269924] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.152 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-148650565251030105, -7054458867463269924] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.152 c.d.s.u.URLPartitioner Thread-77-spout-executor[96, 96] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.152 o.a.s.e.b.BoltExecutor Thread-79-feed-executor[68, 69] [INFO] Preparing bolt feed:[68, 69]
2023-12-11 16:28:20.152 o.a.s.e.Executor main [INFO] Finished loading executor status:[100, 107]
2023-12-11 16:28:20.157 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [4970240361946410058, -1393566806601548956] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.157 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [4970240361946410058, -1393566806601548956] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.155 o.a.s.e.b.BoltExecutor Thread-80-status-executor[100, 107] [INFO] Preparing bolt status:[100, 101, 102, 103, 104, 105, 106, 107]
2023-12-11 16:28:20.157 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [4970240361946410058, -1393566806601548956] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.158 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-7813587939482510894, 7480049982378529504] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.158 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-7813587939482510894, 7480049982378529504] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.158 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-7813587939482510894, 7480049982378529504] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.158 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [2286059023404455111, 2066116417734427933] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.158 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [2286059023404455111, 2066116417734427933] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.158 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [2286059023404455111, 2066116417734427933] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.158 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [3800453218928820137, -118032834169845226, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.159 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:89, stream: __ack_init, id: {}, [3800453218928820137, -118032834169845226, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.159 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [3800453218928820137, -118032834169845226, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.159 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [-2634672587229201707, -1726815147112707869, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.159 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:89, stream: __ack_init, id: {}, [-2634672587229201707, -1726815147112707869, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.159 o.a.s.m.c.CGroupMetricsBase Thread-47-__system-executor[-1, -1] [WARN] CGroupCpuStat is disabled. cpu is not a mounted subsystem
2023-12-11 16:28:20.159 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [-2634672587229201707, -1726815147112707869, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.159 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [8946397892981490109, 1319004779640931907, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.160 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:89, stream: __ack_init, id: {}, [8946397892981490109, 1319004779640931907, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.160 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [8946397892981490109, 1319004779640931907, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.160 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [2946569250910592797, -6897279057914939938] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.160 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [2946569250910592797, -6897279057914939938] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.160 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [2946569250910592797, -6897279057914939938] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.160 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:87, stream: __ack_init, id: {}, [4256344492579818105, -5624555335493571757, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.160 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:87, stream: __ack_init, id: {}, [4256344492579818105, -5624555335493571757, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.160 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:87, stream: __ack_init, id: {}, [4256344492579818105, -5624555335493571757, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.160 o.a.s.m.c.CGroupMetricsBase Thread-47-__system-executor[-1, -1] [WARN] CGroupMemoryLimit is disabled. memory is not a mounted subsystem
2023-12-11 16:28:20.161 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:97, stream: __ack_init, id: {}, [-7267420976913106448, -1663728895472928813, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.161 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:97, stream: __ack_init, id: {}, [-7267420976913106448, -1663728895472928813, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.161 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:97, stream: __ack_init, id: {}, [-7267420976913106448, -1663728895472928813, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.161 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [4572311561040840987, -6788911851055926022, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.161 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:89, stream: __ack_init, id: {}, [4572311561040840987, -6788911851055926022, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.161 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [4572311561040840987, -6788911851055926022, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.161 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:97, stream: __ack_init, id: {}, [4981792382413578668, -3074453321169276904, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.161 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:97, stream: __ack_init, id: {}, [4981792382413578668, -3074453321169276904, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.161 o.a.s.m.c.CGroupMetricsBase Thread-47-__system-executor[-1, -1] [WARN] CGroupCpuGuaranteeByCfsQuota is disabled. cpu is not a mounted subsystem
2023-12-11 16:28:20.162 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:97, stream: __ack_init, id: {}, [4981792382413578668, -3074453321169276904, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.162 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:87, stream: __ack_init, id: {}, [-3490743804298062617, -7275838135438203156, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.162 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:87, stream: __ack_init, id: {}, [-3490743804298062617, -7275838135438203156, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.162 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:87, stream: __ack_init, id: {}, [-3490743804298062617, -7275838135438203156, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.162 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:93, stream: __ack_init, id: {}, [-4074006197233086992, -4335374168709552037, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.162 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:93, stream: __ack_init, id: {}, [-4074006197233086992, -4335374168709552037, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.162 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:93, stream: __ack_init, id: {}, [-4074006197233086992, -4335374168709552037, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.163 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:93, stream: __ack_init, id: {}, [-9061266249914929912, -5546759073310690614, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.163 o.a.s.m.c.CGroupMetricsBase Thread-47-__system-executor[-1, -1] [WARN] CGroupMemoryUsage is disabled. memory is not a mounted subsystem
2023-12-11 16:28:20.163 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:93, stream: __ack_init, id: {}, [-9061266249914929912, -5546759073310690614, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.163 o.a.s.e.s.SpoutExecutor Thread-78-spout-executor[92, 92] [INFO] Opening spout spout:[92]
2023-12-11 16:28:20.163 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:93, stream: __ack_init, id: {}, [-9061266249914929912, -5546759073310690614, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.164 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [5301100988092975999, 6835809773324968329, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.164 c.d.s.u.URLPartitioner Thread-78-spout-executor[92, 92] [INFO] Using partition mode : byDomain
2023-12-11 16:28:20.164 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:89, stream: __ack_init, id: {}, [5301100988092975999, 6835809773324968329, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.165 o.a.s.m.c.CGroupMetricsBase Thread-47-__system-executor[-1, -1] [WARN] CGroupCpu is disabled. cpuacct is not a mounted subsystem
2023-12-11 16:28:20.165 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [5301100988092975999, 6835809773324968329, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.165 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [6760544673998728235, 4712299035249289647, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.165 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:89, stream: __ack_init, id: {}, [6760544673998728235, 4712299035249289647, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.166 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [6760544673998728235, 4712299035249289647, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.166 o.a.s.m.c.CGroupMetricsBase Thread-47-__system-executor[-1, -1] [WARN] CGroupCpuGuarantee is disabled. cpu is not a mounted subsystem
2023-12-11 16:28:20.166 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [2209434805601426965, -2651182582903056162, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.166 o.a.s.e.b.BoltExecutor Thread-47-__system-executor[-1, -1] [INFO] Prepared bolt __system:[-1]
2023-12-11 16:28:20.166 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: 0 TUPLE: source: spout:89, stream: __ack_init, id: {}, [2209434805601426965, -2651182582903056162, 89] PROC_START_TIME(sampled): 1702312100166 EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.167 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [2209434805601426965, -2651182582903056162, 89] PROC_START_TIME(sampled): 1702312100166 EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.167 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [-3521069700462821725, -4121685141838493521, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.168 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:89, stream: __ack_init, id: {}, [-3521069700462821725, -4121685141838493521, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.168 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [-3521069700462821725, -4121685141838493521, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.168 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:87, stream: __ack_init, id: {}, [-5099824388405106009, -7814831663344770237, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.169 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:87, stream: __ack_init, id: {}, [-5099824388405106009, -7814831663344770237, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.169 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:87, stream: __ack_init, id: {}, [-5099824388405106009, -7814831663344770237, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.169 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:91, stream: __ack_init, id: {}, [-6736403382555792190, -4970367296170810415, 91] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.170 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:91, stream: __ack_init, id: {}, [-6736403382555792190, -4970367296170810415, 91] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.170 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:91, stream: __ack_init, id: {}, [-6736403382555792190, -4970367296170810415, 91] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.171 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:93, stream: __ack_init, id: {}, [3866775734294629399, -6768188872314748886, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.171 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:93, stream: __ack_init, id: {}, [3866775734294629399, -6768188872314748886, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): 1702312100171
2023-12-11 16:28:20.172 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:93, stream: __ack_init, id: {}, [3866775734294629399, -6768188872314748886, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): 1702312100171 TASK: 2 DELTA: 1
2023-12-11 16:28:20.172 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:91, stream: __ack_init, id: {}, [-4180928043309386618, -5043785088620001322, 91] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.173 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:91, stream: __ack_init, id: {}, [-4180928043309386618, -5043785088620001322, 91] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.173 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:91, stream: __ack_init, id: {}, [-4180928043309386618, -5043785088620001322, 91] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.174 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:97, stream: __ack_init, id: {}, [-7886972260148926948, 4188480566748697533, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.174 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:97, stream: __ack_init, id: {}, [-7886972260148926948, 4188480566748697533, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.174 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:97, stream: __ack_init, id: {}, [-7886972260148926948, 4188480566748697533, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.177 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:93, stream: __ack_init, id: {}, [-4098573674829406881, 826968679567669069, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.177 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: 0 TUPLE: source: spout:93, stream: __ack_init, id: {}, [-4098573674829406881, 826968679567669069, 93] PROC_START_TIME(sampled): 1702312100177 EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.177 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:93, stream: __ack_init, id: {}, [-4098573674829406881, 826968679567669069, 93] PROC_START_TIME(sampled): 1702312100177 EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.178 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-3478115639169539415, -8730761081484476922] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.178 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-3478115639169539415, -8730761081484476922] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.178 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-3478115639169539415, -8730761081484476922] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.178 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:93, stream: __ack_init, id: {}, [6237080999574947757, -238454198009739507, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.178 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:93, stream: __ack_init, id: {}, [6237080999574947757, -238454198009739507, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.179 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:93, stream: __ack_init, id: {}, [6237080999574947757, -238454198009739507, 93] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.179 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:97, stream: __ack_init, id: {}, [6084453207936611306, -3914889148982770675, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.179 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:97, stream: __ack_init, id: {}, [6084453207936611306, -3914889148982770675, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.179 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:97, stream: __ack_init, id: {}, [6084453207936611306, -3914889148982770675, 97] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.179 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:87, stream: __ack_init, id: {}, [-8043919662105889343, 4431343109975724737, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.179 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:87, stream: __ack_init, id: {}, [-8043919662105889343, 4431343109975724737, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.180 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:87, stream: __ack_init, id: {}, [-8043919662105889343, 4431343109975724737, 87] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.180 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-7267420976913106448, 3481212360140942143] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.180 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-7267420976913106448, 3481212360140942143] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.180 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-7267420976913106448, 3481212360140942143] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.180 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:89, stream: __ack_init, id: {}, [3858824648568270060, -3559849241998194517, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.180 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:89, stream: __ack_init, id: {}, [3858824648568270060, -3559849241998194517, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.181 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: spout:89, stream: __ack_init, id: {}, [3858824648568270060, -3559849241998194517, 89] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.181 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: spout:91, stream: __ack_init, id: {}, [6935833859334561755, -8472375794900500271, 91] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.181 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: spout:91, stream: __ack_init, id: {}, [6935833859334561755, -8472375794900500271, 91] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.181 c.d.s.u.URLPartitioner Thread-80-status-executor[100, 107] [INFO] Using partition mode : byDomain

But then, after a bit of time, it loads the topology config. Isn't this out of order, i.e. shouldn't the config have been loaded before the executors started processing tuples?
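One way to check whether these lines are genuinely out of order, or merely interleaved because several worker threads write to the same log, is to re-sort the log by its timestamp prefix. A minimal sketch of such a check (a hypothetical standalone helper, not part of news-crawl, assuming the fixed-width `yyyy-MM-dd HH:mm:ss.SSS` prefix seen above):

```
// SortWorkerLog.java -- hypothetical helper, not part of news-crawl.
// Re-sorts a worker log by its fixed-width timestamp prefix
// ("yyyy-MM-dd HH:mm:ss.SSS", 23 characters) so that lines written
// concurrently by different threads come out in wall-clock order.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;

public class SortWorkerLog {
    public static void main(String[] args) throws IOException {
        try (var lines = Files.lines(Paths.get(args[0]))) {
            // keep only lines that start with a timestamp (drops stack-trace continuation lines)
            lines.filter(l -> l.matches("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3} .*"))
                 .sorted(Comparator.comparing((String l) -> l.substring(0, 23)))
                 .forEach(System.out::println);
        }
    }
}
```

Run as `java SortWorkerLog worker.log`. Sorting this way at least separates real ordering from log interleaving; here the `Worker has topology config` line (16:28:20.245) does carry a later wall-clock timestamp than the executor start-up lines above (16:28:20.132 to 16:28:20.181). The stretch of the log in question: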

2023-12-11 16:28:20.244 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [6084453207936611306, 1594212693638942219] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.244 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [6084453207936611306, 1594212693638942219] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.244 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-5236018837279089367, 4910516185876199059] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.244 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-5236018837279089367, 4910516185876199059] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.245 o.a.s.d.w.Worker main [INFO] BackPressure status change checking will be performed every 50 millis
2023-12-11 16:28:20.245 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-5236018837279089367, 4910516185876199059] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.245 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [6935833859334561755, 3759631466051588138] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.245 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [6935833859334561755, 3759631466051588138] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.245 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [6935833859334561755, 3759631466051588138] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.245 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-8927812074023104766, 2878556287460316803] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.245 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-8927812074023104766, 2878556287460316803] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.245 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-8927812074023104766, 2878556287460316803] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.245 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [2556569793293798762, 7498757573609867582] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.245 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [2556569793293798762, 7498757573609867582] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.246 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [2556569793293798762, 7498757573609867582] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.246 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-8935012321312715528, -56508434523359153] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.246 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-8935012321312715528, -56508434523359153] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.246 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-8935012321312715528, -56508434523359153] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.246 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-5627070527711278671, 1549329322995875292] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.246 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-5627070527711278671, 1549329322995875292] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.247 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-5627070527711278671, 1549329322995875292] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.245 o.a.s.d.w.Worker main [INFO] Worker has topology config {topology.backpressure.wait.strategy=org.apache.storm.policy.WaitStrategyProgressive, sitemap.extensions=null, sitemap.schedule.delay=15, storm.zookeeper.auth.user=null, storm.messaging.netty.buffer_size=5242880, topology.name=NewsCrawl,
scheduler.adaptive.fetchInterval.max=129600, ui.filter=null, worker.profiler.enabled=false, executor.metrics.frequency.secs=60, fetchInterval.ERROR.isSitemapNews=true=86400, supervisor.queue.size=128, logviewer.cleanup.age.mins=10080, parsefilters.config.file=parsefilters.json, topology.tuple.serializer=org.apache.storm.serialization.types.ListDelegateSerializer,
es.status.max.urls.per.bucket=8, indexer.url.fieldname=url, fetchInterval.DISCOVERED.isSitemap=true=300, topology.transfer.buffer.size=1000, storm.oci.nscd.dir=/var/run/nscd, feed.sniffContent=true, nimbus.worker.heartbeats.recovery.strategy.class=org.apache.storm.nimbus.TimeOutWorkerHeartbeatsRecoveryStrategy, logviewer.port=8000,
worker.childopts=-Xmx%HEAP-MEM%m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump, storm.daemon.metrics.reporter.plugins=[org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter], logviewer.childopts=-Xmx128m, topology.ras.acker.executors.per.worker=1, storm.messaging.transport=org.apache.storm.messaging.netty.Context, storm.messaging.netty.authentication=false,
fetcher.server.min.delay=0.0, storm.cgroup.memory.limit.tolerance.margin.mb=0.0, es.status.concurrentRequests=100, max.fetch.errors=3, storm.metricprocessor.class=org.apache.storm.metricstore.NimbusMetricProcessor, topology.kryo.factory=org.apache.storm.serialization.DefaultKryoFactory, fetchInterval.REDIRECTION.isSitemapNews=true=20160,
es.status.global.sort.field=nextFetchDate, http.robots.403.allow=true, indexer.text.fieldname=content, indexer.md.mapping=[parse.title=title, parse.keywords=keywords, parse.description=description], supervisor.slots.ports=[6800, 6801, 6802, 6803], storm.local.dir=storm-local, topology.backpressure.wait.park.microsec=100, fetchInterval.ERROR.isSitemapIndex=true=86400,
scheduler.class=com.digitalpebble.stormcrawler.persistence.AdaptiveScheduler, protocol.instances.num=1, fetcher.threads.per.queue=1, storm.principal.tolocal=org.apache.storm.security.auth.DefaultPrincipalToLocal, java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/lib64, worker.gc.childopts=,
topology.multilang.serializer=org.apache.storm.multilang.JsonSerializer, drpc.request.timeout.secs=600, topology.state.synchronization.timeout.secs=60, topology.bolt.wait.progressive.level2.count=1000, topology.worker.shared.thread.pool.size=4, topology.submitter.user=storm, pacemaker.servers=[], supervisor.monitor.frequency.secs=3, storm.auth.simple-white-list.users=[],
topology.scheduler.strategy=org.apache.storm.scheduler.resource.strategies.scheduling.DefaultResourceAwareStrategy, storm.zookeeper.port=2181, storm.cluster.state.store=org.apache.storm.cluster.ZKStateStorageFactory, nimbus.thrift.threads=64, es.metrics.index.name=metrics, supervisor.supervisors=[], http.agent.version=118.0,
storm.cluster.metrics.consumer.publish.interval.secs=60, logviewer.filter.params=null, sitemap.discovery=true, protocols=http,https,file, storm.group.mapping.service=org.apache.storm.security.auth.ShellBasedGroupsMapping, topology.backpressure.wait.progressive.level2.count=1000, fetcher.max.queue.size=10, drpc.https.keystore.password=, resource.aware.scheduler.constraint.max.state.search=100000,
supervisor.heartbeat.frequency.secs=5, metadata.track.depth=true, storm.supervisor.medium.memory.grace.period.ms=30000, es.status.recentDate.increase=-1, ui.port=8080, fetchInterval.REDIRECTION.isSitemapIndex=true=20160, fetchInterval.DISCOVERED.isSitemapNews=true=0, warc.rotation.policy.max-minutes=1440, drpc.https.keystore.type=JKS,
supervisor.memory.capacity.mb=4096.0, storm.metricstore.class=org.apache.storm.metricstore.rocksdb.RocksDbStore, pacemaker.client.max.threads=2, okhttp.protocol.connection.pool={connection.keep.alive=300, max.idle.connections=5}, topology.shellbolt.max.pending=100, drpc.max_buffer_size=1048576, worker.profiler.childopts=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder,
file.protocol.implementation=com.digitalpebble.stormcrawler.protocol.file.FileProtocol, nimbus.supervisor.timeout.secs=60, http.agent.url=, storm.zookeeper.root=/storm, topology.disable.loadaware.messaging=false, feed.filter.hours.since.published=720, topology.ras.one.executor.per.worker=false, parser.emitOutlinks.max.per.page=-1,
nimbus.topology.validator=org.apache.storm.nimbus.DefaultTopologyValidator, fetchInterval.fetch.error=2880, pacemaker.max.threads=50, fetcher.max.crawl.delay.force=false, ui.disable.http.binding=true, supervisor.blobstore.download.max_retries=3, topology.enable.message.timeouts=true, http.robots.headers.skip=false, es.status.addresses=http://10.128.0.34:9200,
fetcher.max.throttle.sleep=-1, urlfilters.config.file=urlfilters.json, logviewer.disable.http.binding=true, storm.messaging.netty.transfer.batch.size=262144, http.accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, storm.metricstore.rocksdb.retention_hours=240, supervisor.run.worker.as.user=false, fetcher.threads.number=100, topology.tasks=null,
indexer.text.maxlength=-1, fetchInterval.ERROR.isFeed=true=86400, logviewer.filter=null, pacemaker.kerberos.users=[], es.status.bucket.field=metadata.hostname, supervisor.worker.heartbeats.max.timeout.secs=600, supervisor.worker.timeout.secs=30, topology.worker.receiver.thread.count=1, es.status.routing.fieldname=metadata.hostname,
scheduler.adaptive.fetchInterval.min=60, robots.cache.spec=maximumSize=10000,expireAfterWrite=6h, nimbus.file.copy.expiration.secs=600, http.trust.everything=true, nimbus.topology.blobstore.deletion.delay.ms=300000, topology.workers=2, blacklist.scheduler.tolerance.time.secs=300, topology.environment=null, drpc.invocations.port=3773,
storm.metricstore.rocksdb.create_if_missing=true, client.blobstore.class=org.apache.storm.blobstore.NimbusBlobStore, fetchInterval.FETCHED.isSitemap=true=259200, storm.nimbus.retry.intervalceiling.millis=60000, storm.messaging.netty.server_worker_threads=1, http.content.partial.as.trimmed=true, storm.thrift.socket.timeout.ms=600000, storm.topology.classpath.beginning.enabled=false,
es.status.index.name=status, topology.tick.tuple.freq.secs=null, nimbus.inbox.jar.expiration.secs=3600, storm.zookeeper.retry.interval=1000, storm.messaging.netty.buffer.high.watermark=16777216, storm.exhibitor.poll.uripath=/exhibitor/v1/cluster/list, storm.zookeeper.retry.times=5, nimbus.code.sync.freq.secs=120,
topology.component.resources.offheap.memory.mb=0.0, textextractor.skip.after=-1, partition.url.mode=byDomain, spout.ttl.purgatory=600, topology.state.checkpoint.interval.ms=1000, supervisor.localizer.cleanup.interval.ms=30000, storm.health.check.dir=healthchecks, supervisor.cpu.capacity=400.0, topology.backpressure.wait.progressive.level3.sleep.millis=1, fetcher.timeout.queue=-1,
http.agent.description=Firefox 118.0 Linux, status.updater.use.cache=true, storm.worker.min.cpu.pcore.percent=0.0, topology.spout.wait.progressive.level3.sleep.millis=1, track.anchors=true, topology.worker.childopts=-Xmx7g -Djava.net.preferIPv4Stack=true, drpc.https.port=-1, topology.bolt.wait.park.microsec=100, detect.mimetype=true,
topology.acker.executors=null, topology.bolt.wait.progressive.level3.sleep.millis=1, supervisor.worker.start.timeout.secs=120, supervisor.worker.shutdown.sleep.secs=3, logviewer.max.per.worker.logs.size.mb=2048, task.heartbeat.frequency.secs=3, scheduler.adaptive.setLastModified=true, supervisor.thrift.max_buffer_size=1048576, es.status.max.buckets=900,
fetchInterval.REDIRECTION.isFeed=true=20160, supervisor.blobstore.class=org.apache.storm.blobstore.NimbusBlobStore, es.status.recentDate.min.gap=-1, topology.backpressure.enable=false, topology.producer.batch.size=1, drpc.worker.threads=64, http.allow.redirects=false, spout.min.delay.queries=15000,
blacklist.scheduler.reporter=org.apache.storm.scheduler.blacklist.reporters.LogReporter, indexer.canonical.name=canonical, fetchInterval.FETCHED.isSitemapNews=true=720, drpc.queue.size=128, topology.eventlogger.executors=1, topology.spout.recvq.skips=3, storm.resource.isolation.plugin.enable=false, storm.supervisor.memory.limit.tolerance.margin.mb=128.0, storm.disable.symlinks=false, fetchInterval.default=5256000,
transactional.zookeeper.servers=null, https.protocol.implementation=com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol, logs.users=null, pacemaker.thrift.message.size.max=10485760, topology.ras.one.component.per.worker=false, es.status.bulkActions=1000, supervisor.thrift.port=6628, dev.zookeeper.path=/tmp/dev-storm-zookeeper,
status.updater.cache.spec=maximumSize=1000000,expireAfterAccess=4h, drpc.invocations.threads=64, storm.metricstore.rocksdb.metadata_string_cache_capacity=4000, storm.workers.artifacts.dir=workers-artifacts, storm.cluster.mode=distributed, ui.childopts=-Xmx768m, task.refresh.poll.secs=10, supervisor.childopts=-Djava.net.preferIPv4Stack=true, storm.health.check.timeout.ms=5000,
worker.profiler.command=flight.bash, storm.messaging.netty.min_wait_ms=100, http.content.limit=2148576, fetchInterval.FETCH_ERROR.isFeed=true=4320, topology.submitter.principal=, storm.resource.isolation.plugin=org.apache.storm.container.cgroup.CgroupManager, fetcher.server.delay.force=false,
metadata.persist=[_redirTo, fetch.statusCode, error.cause, error.source, isSitemap, isSitemapNews, isSitemapIndex, isSitemapVerified, isFeed, numLinks, last-modified, signature, signatureChangeDate, fetchInterval, protocol.etag], fetchInterval.FETCH_ERROR.isSitemap=true=259200, storm.exhibitor.port=8080, topology.bolt.wait.progressive.level1.count=1, pacemaker.auth.method=NONE,
storm.oci.cgroup.root=/sys/fs/cgroup, storm.id=NewsCrawl-1-1702312011, supervisor.thrift.threads=16, ui.http.creds.plugin=org.apache.storm.security.auth.DefaultHttpCredentialsPlugin, supervisor.supervisors.commands=[], storm.cgroup.memory.enforcement.enable=false, drpc.port=3772, supervisor.localizer.update.blob.interval.secs=30, topology.max.spout.pending=120,
fetchInterval.ERROR.isSitemap=true=259200, worker.metrics={CGroupCpuStat=org.apache.storm.metrics2.cgroup.CGroupCpuStat, CGroupMemoryLimit=org.apache.storm.metrics2.cgroup.CGroupMemoryLimit, CGroupCpuGuaranteeByCfsQuota=org.apache.storm.metrics2.cgroup.CGroupCpuGuaranteeByCfsQuota, CGroupMemory=org.apache.storm.metrics2.cgroup.CGroupMemoryUsage,
CGroupCpu=org.apache.storm.metrics2.cgroup.CGroupCpu, CGroupCpuGuarantee=org.apache.storm.metrics2.cgroup.CGroupCpuGuarantee}, fetchInterval.FETCHED.isFeed=true=360, es.status.bucket.sort.field=[nextFetchDate, url], es.status.sample=true, topology.component.cpu.pcore.percent=10.0, fetchInterval.FETCH_ERROR.isSitemapIndex=true=4320, worker.max.timeout.secs=600,
blacklist.scheduler.resume.time.secs=1800, drpc.childopts=-Xmx768m, nimbus.task.launch.secs=120, storm.supervisor.hard.memory.limit.overage.mb=2024, storm.zookeeper.servers=[10.142.0.16, 10.142.0.17, 10.142.0.18, 10.138.0.52, 10.168.0.11, 10.168.0.12], topology.localityaware.higher.bound=0.8, storm.cgroup.hierarchy.name=storm,
blacklist.scheduler.assume.supervisor.bad.based.on.bad.slot=true, protocol.md.prefix=protocol., warc={fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem}, nimbus.assignments.service.threads=10, es.status.routing=true, worker.heap.memory.mb=4096, http.agent.name=Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0,
storm.network.topography.plugin=org.apache.storm.networktopography.DefaultRackDNSToSwitchMapping, warc.dir=/data/warc, topology.stats.sample.rate=0.05, topology.ras.constraint.max.state.search=10000, storm.oci.cgroup.parent=/storm, topology.testing.always.try.serialize=false, nimbus.assignments.service.thread.queue.size=100, sitemap.offset.guess=1024,
robots.error.cache.spec=maximumSize=10000,expireAfterWrite=1h, nimbus.local.assignments.backend.class=org.apache.storm.assignments.InMemoryAssignmentBackend, http.robots.content.limit=-1, storm.group.mapping.service.cache.duration.secs=120, fetcher.max.urls.in.queues=9000, topology.kryo.register={com.digitalpebble.stormcrawler.persistence.Status=null, com.digitalpebble.stormcrawler.Metadata=null},
nimbus.blobstore.class=org.apache.storm.blobstore.LocalFsBlobStore, topology.kryo.decorators=[], topology.executor.receive.buffer.size=32768, storm.nimbus.retry.times=5, topology.transfer.batch.size=1, transactional.zookeeper.port=null, storm.zookeeper.retry.intervalceiling.millis=30000, nimbus.thrift.port=6627, blacklist.scheduler.tolerance.count=3,
fetcher.max.crawl.delay=40, nimbus.seeds=[10.128.0.34], topology.min.replication.count=1, nimbus.blobstore.expiration.secs=600, storm.nimbus.retry.interval.millis=2000, topology.max.task.parallelism=null, nimbus.credential.renewers.freq.secs=600, es.status.query.timeout=-1, detect.charset.maxlength=10000,
storm.thrift.transport=org.apache.storm.security.auth.SimpleTransportPlugin, storm.cgroup.hierarchy.dir=/cgroup/storm_resources, storm.zookeeper.auth.password=null, drpc.authorizer.acl.strict=false, topology.message.timeout.secs=300, topology.error.throttle.interval.secs=10, topology.backpressure.check.millis=50, drpc.authorizer.acl.filename=drpc-auth-acl.yaml, topology.builtin.metrics.bucket.size.secs=60,
topology.spout.wait.park.microsec=100, fetchInterval.error=133920, storm.local.mode.zmq=false, http.protocol.implementation=com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol, parser.emitOutlinks=true, es.status.max.start.offset=500, http.store.headers=true, ui.header.buffer.bytes=4096, jsoup.treat.non.html.as.error=true, http.timeout=30000,
topology.serialized.message.size.metrics=false, topology.metrics.consumer.register=[{class=org.apache.storm.metric.LoggingMetricsConsumer, parallelism.hint=32}, {class=com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer, parallelism.hint=32}], drpc.disable.http.binding=true,
storm.codedistributor.class=org.apache.storm.codedistributor.LocalFileSystemCodeDistributor, storm.supervisor.cgroup.rootdir=storm, topology.worker.max.heap.size.mb=768.0, indexer.ignore.empty.fields=false, storm.supervisor.hard.memory.limit.multiplier=2.0, worker.heartbeat.frequency.secs=1, storm.messaging.netty.max_wait_ms=1000, topology.backpressure.wait.progressive.level1.count=1,
topology.max.error.report.per.interval=5, nimbus.thrift.max_buffer_size=1048576, fetchInterval.FETCHED.isSitemapVerified=true=720, storm.metricstore.rocksdb.location=storm_rocks, storm.supervisor.low.memory.threshold.mb=1024, ui.pagination=20, http.protocol.versions=null, topology.spout.wait.progressive.level2.count=0,
blacklist.scheduler.strategy=org.apache.storm.scheduler.blacklist.strategies.DefaultBlacklistStrategy, http.retry.on.connection.failure=true, storm.messaging.netty.client_worker_threads=1, supervisor.thrift.socket.timeout.ms=5000, storm.group.mapping.service.params=null, drpc.http.port=3774, scheduler.adaptive.fetchInterval.rate.incr=0.5, transactional.zookeeper.root=/transactional,
supervisor.blobstore.download.thread.count=5, sitemap.sniffContent=true, topology.spout.wait.strategy=org.apache.storm.policy.WaitStrategyProgressive, storm.blobstore.inputstream.buffer.size.bytes=65536, fetcher.queue.mode=byHost, fetchInterval.FETCHED.isSitemapIndex=true=5040, logviewer.max.sum.worker.logs.size.mb=4096, topology.executor.overflow.limit=0,
topology.batch.flush.interval.millis=1, selenium.tracing=false, pacemaker.port=6699, topology.worker.logwriter.childopts=-Xmx64m, fetcher.metrics.time.bucket.secs=30, drpc.http.creds.plugin=org.apache.storm.security.auth.DefaultHttpCredentialsPlugin, storm.blobstore.acl.validation.enabled=false, ui.filter.params=null, storm.supervisor.medium.memory.threshold.mb=1536,
 sitemap.filter.hours.since.modified=720, metadata.track.path=true, nimbus.cleanup.inbox.freq.secs=600, topology.fall.back.on.java.serialization=false, storm.nimbus.zookeeper.acls.fixup=true, logviewer.appender.name=A1, ui.users=null, pacemaker.childopts=-Xmx1024m, scheduler.display.resource=false, ui.actions.enabled=true, storm.zookeeper.connection.t
imeout=15000, robots.noFollow.strict=true, topology.debug=true, storm.blobstore.dependency.jar.upload.chunk.size.bytes=1048576, worker.log.level.reset.poll.secs=30, topology.spout.wait.progressive.level1.count=0, fetchInterval.DISCOVERED.isSitemapIndex=true=0, topology.priority=29, es.status.flushInterval=5s, warc.rotation.policy.max-mb=1023, schedule
r.adaptive.fetchInterval.rate.decr=0.2, storm.cgroup.resources=[cpu, memory], topology.classpath=null, storm.nimbus.zookeeper.acls.check=true, num.stat.buckets=20, supervisor.localizer.cache.target.size.mb=10240, spout.reset.fetchdate.after=120, topology.max.replication.wait.time.sec=60, storm.cgroup.cgexec.cmd=/bin/cgexec, topology.users=[null], sele
nium.timeouts={implicit=-1, pageLoad=-1, script=-1}, fetchInterval.DISCOVERED.isFeed=true=0, topology.trident.batch.emit.interval.millis=500, http.accept.language=en-us,en-gb,en;q=0.7,*;q=0.3, supervisor.enable=true, fetchInterval.FETCH_ERROR.isSitemapNews=true=4320, resource.aware.scheduler.priority.strategy=org.apache.storm.scheduler.resource.strate
gies.priority.DefaultSchedulingPriorityStrategy, storm.messaging.netty.socket.backlog=500, storm.cgroup.inherit.cpuset.configs=false, nimbus.queue.size=100000, ui.disable.spout.lag.monitoring=true, pacemaker.base.threads=10, nimbus.childopts=-Xmx1024m, http.robots.5xx.allow=false, storm.zookeeper.superACL=null, nimbus.monitor.freq.secs=10, fetcher.ser
ver.delay=3.0, status.updater.unit.round.date=SECOND, topology.localityaware.lower.bound=0.2, nimbus.task.timeout.secs=30, http.robots.meta.skip=false, urlbuffer.class=com.digitalpebble.stormcrawler.persistence.urlbuffer.SimpleURLBuffer, ui.host=0.0.0.0, topology.bolt.wait.strategy=org.apache.storm.policy.WaitStrategyProgressive, es.metrics.addresses=
http://10.128.0.34:9200, pacemaker.thread.timeout=10, fetchInterval.REDIRECTION.isSitemap=true=259200, storm.meta.serialization.delegate=org.apache.storm.serialization.GzipThriftSerializationDelegate, topology.skip.missing.kryo.registrations=false, storm.zookeeper.session.timeout=20000, http.robots.file.skip=false, topology.component.resources.onheap.
memory.mb=128.0, storm.log4j2.conf.dir=log4j2, task.credentials.poll.secs=30, http.agent.email=, storm.blobstore.replication.factor=3, storm.messaging.netty.buffer.low.watermark=8388608}
2023-12-11 16:28:20.247 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-5236018837279089367, 7955357069768614300] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.247 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-5236018837279089367, 7955357069768614300] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.248 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-5236018837279089367, 7955357069768614300] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.248 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-3461902836084744326, -8298087996738896118] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.248 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-3461902836084744326, -8298087996738896118] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.248 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-3461902836084744326, -8298087996738896118] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.248 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-8927812074023104766, -6381006344183577194] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.248 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-8927812074023104766, -6381006344183577194] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.248 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-8927812074023104766, -6381006344183577194] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.248 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-1214025998605477641, 8486828589341367712] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.248 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-1214025998605477641, 8486828589341367712] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.248 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-1214025998605477641, 8486828589341367712] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.249 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-2174812826611686353, 8933910478784277266] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.249 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-2174812826611686353, 8933910478784277266] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.249 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-2174812826611686353, 8933910478784277266] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.249 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [2556569793293798762, 1794687774208397323] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.249 o.a.s.d.w.Worker main [INFO] Worker b656245f-ae7f-4d41-bd0b-98ff985a51ed for storm NewsCrawl-1-1702312011 on 81ea410b-6200-4420-8b2d-452a189f65b7-10.142.0.21:6800  has finished loading
2023-12-11 16:28:20.249 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [2556569793293798762, 1794687774208397323] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.249 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [2556569793293798762, 1794687774208397323] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.249 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-5412917930674871460, -2679041900049810362] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.249 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-5412917930674871460, -2679041900049810362] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.250 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-5412917930674871460, -2679041900049810362] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.250 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-8935012321312715528, 1410811458580393384] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.250 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-8935012321312715528, 1410811458580393384] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.250 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-8935012321312715528, 1410811458580393384] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.250 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-5809973328905888172, -8700995318426037421] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.250 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-5809973328905888172, -8700995318426037421] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.250 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-5809973328905888172, -8700995318426037421] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.250 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-5160443130954077648, -7999364024869339203] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.250 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [-5160443130954077648, -7999364024869339203] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.250 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [-5160443130954077648, -7999364024869339203] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.251 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-3461902836084744326, -3805243869296651998] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.251 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-3461902836084744326, -3805243869296651998] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.251 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-3461902836084744326, -3805243869296651998] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.251 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-1214025998605477641, 7938128007580305540] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.251 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-1214025998605477641, 7938128007580305540] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.251 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-1214025998605477641, 7938128007580305540] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.251 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [7741585547196580345, 5967571995592991844] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.251 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: prefilter:79, stream: __ack_ack, id: {}, [7741585547196580345, 5967571995592991844] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.251 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: prefilter:79, stream: __ack_ack, id: {}, [7741585547196580345, 5967571995592991844] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.251 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-2174812826611686353, -4092819621652857975] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:20.252 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-2174812826611686353, -4092819621652857975] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:20.252 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: partitioner:77, stream: __ack_ack, id: {}, [-2174812826611686353, -4092819621652857975] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:20.252 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: partitioner:77, stream: __ack_ack, id: {}, [-5412917930674871460, 5353909444598418875] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 

Finally, I do see Beta's worker start to emit some errors - logging-related? A datagram too big?

protocol.content-type: text/html; charset=UTF-8
] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null]
2023-12-11 16:28:21.470 STDERR Thread-1 [INFO]  at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89)
2023-12-11 16:28:21.470 STDERR Thread-1 [INFO]  at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:154)
2023-12-11 16:28:21.470 STDERR Thread-1 [INFO]  at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:140)
2023-12-11 16:28:21.471 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -1602384185032665566] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:21.471 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -1602384185032665566] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:21.470 STDERR Thread-1 [INFO]  at org.apache.storm.utils.Utils$1.run(Utils.java:398)
2023-12-11 16:28:21.471 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -7745260568013248921] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:21.471 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -7745260568013248921] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:21.471 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -7745260568013248921] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:21.471 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: status:108, stream: __ack_ack, id: {}, [2648580543762087270, 6163409022820091796] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:21.471 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: 0 TUPLE: source: status:108, stream: __ack_ack, id: {}, [2648580543762087270, 6163409022820091796] PROC_START_TIME(sampled): 1702312101471 EXEC_START_TIME(sampled): null
2023-12-11 16:28:21.471 STDERR Thread-1 [INFO]  at java.base/java.lang.Thread.run(Thread.java:829)
2023-12-11 16:28:21.471 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: status:108, stream: __ack_ack, id: {}, [2648580543762087270, 6163409022820091796] PROC_START_TIME(sampled): 1702312101471 EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:21.471 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, 8454087514544735688] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:21.471 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, 8454087514544735688] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:21.471 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, 8454087514544735688] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:21.471 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, 7899198290451746760] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:21.471 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, 7899198290451746760] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:21.471 STDERR Thread-1 [INFO] Caused by: java.io.IOException: Message too long
2023-12-11 16:28:21.472 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, 7899198290451746760] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:21.472 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -6200665771136418110] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:21.472 o.a.s.d.Task FetcherThread #10 [INFO] Emitting Tuple: taskId=72 componentId=fetch stream=status values=[https://www.clickondetroit.com/arcio/sitemap/, url.path: https://www.clickondetroit.com/sitemap.xml
url.path: https://www.clickondetroit.com/arcio/news-sitemap/
depth: 2
isSitemap: true
, DISCOVERED]
2023-12-11 16:28:21.472 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -6200665771136418110] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:21.472 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -6200665771136418110] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
2023-12-11 16:28:21.472 o.a.s.e.ExecutorTransfer FetcherThread #10 [INFO] TRANSFERRING tuple [dest: 104 tuple: source: fetch:72, stream: status, id: {5493392416672285611=-3066904289869453196}, [https://www.clickondetroit.com/arcio/sitemap/, url.path: https://www.clickondetroit.com/sitemap.xml
url.path: https://www.clickondetroit.com/arcio/news-sitemap/
depth: 2
isSitemap: true
, DISCOVERED] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null]
2023-12-11 16:28:21.472 STDERR Thread-1 [INFO]  at java.base/java.net.PlainDatagramSocketImpl.send0(Native Method)
2023-12-11 16:28:21.472 o.a.s.e.Executor Thread-41-__acker-executor[2, 2] [INFO] Processing received TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -670262096599667379] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null for TASK: 2 
2023-12-11 16:28:21.472 o.a.s.e.b.BoltOutputCollectorImpl Thread-41-__acker-executor[2, 2] [INFO] BOLT ack TASK: 2 TIME: -1 TUPLE: source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -670262096599667379] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null
2023-12-11 16:28:21.472 o.a.s.e.b.BoltExecutor Thread-41-__acker-executor[2, 2] [INFO] Execute done TUPLE source: status:108, stream: __ack_ack, id: {}, [6935833859334561755, -670262096599667379] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null TASK: 2 DELTA: -1
jnioche commented 9 months ago

Quick answer - about to log off

If the topology is configured to do the injection, then yes, you need the seed files to be accessible to the workers. That's why you are seeing:

java.lang.RuntimeException: java.io.FileNotFoundException: /home/storm/systems/news-crawl/seeds/20231110.txt (No such file or directory)

If Elastic already contains the seeds, avoid the injection by omitting the arguments on the command line. Typically, with StormCrawler the injection tends to be done as a local job, i.e. it does not involve the workers and is not distributed.

As for the output folders, you will definitely need them.

Maybe set the log level to DEBUG only for the FeedParserBolt. It gets very verbose otherwise.

alextechnology commented 9 months ago

Quick answer - about to log off

If the topology is configured to do the injection, then yes, you need the seed files to be accessible to the workers. That's why you are seeing:

java.lang.RuntimeException: java.io.FileNotFoundException: /home/storm/systems/news-crawl/seeds/20231110.txt (No such file or directory)

If Elastic already contains the seeds, avoid the injection by omitting the arguments on the command line. Typically, with StormCrawler the injection tends to be done as a local job, i.e. it does not involve the workers and is not distributed.

As for the output folders, you will definitely need them.

Maybe set the log level to DEBUG only for the FeedParserBolt. It gets very verbose otherwise.

@jnioche Many thanks for answering my questions. Seeds are already in Elastic, so I've removed those parameters and no longer see that error. As for the output folders, I assumed the "lead/master" supervisor (i.e. the first to start) acted like a sink, writing all the data crawled by all the workers, but your comment clearly says otherwise. So I've simply created an NFS share out of the data output folder of Alpha, mounted on Beta at the same absolute path, and that error is gone too - hooray! Though I'm still not sure how the two supervisors coordinate writing to the same WARC file concurrently - or perhaps they write to independent files to avoid this, but I don't see that happening and I'm not sure whether that's simply due to connectivity issues between the workers.

Unfortunately, neither of the above changes appears to have fixed the main issue, which is that the cluster workers can't consistently communicate with each other when they are on different machines. From what I can tell, this leads to a whole range of issues, including the java.util.ConcurrentModificationException errors among others. This error appears to arise in the worker remote from the nimbus (i.e. Beta). I'm fairly certain that part, if not all, of this is actually in Storm itself and related to Netty, because there are no issues with a single worker or with multiple workers running on a single host - both configurations where Netty is not involved. It seems odd that the Alpha worker starts up and spends over a minute failing to connect to the Beta worker, which actually hasn't even started yet at that point! After ~140 attempts, the Alpha worker finally starts fetching, and only a bit later does the Beta worker start up - yet it doesn't display the same behavior of dozens of attempts to connect to the Alpha worker. Eventually, I suppose, the two stop communicating correctly and a concurrent modification of metadata occurs, which causes the Beta worker to just die and restart, a loop that continues indefinitely.

I'm not sure how else to try and debug this, especially if the issue is in Storm itself. I'm curious why you chose Storm 2.5.0 instead of the latest 2.6.0 - any chance the newer version is compatible with this news-crawl branch? Perhaps I should try that version instead? Or maybe I should try building Storm from source instead of using the pre-built binary?

I really appreciate the work you've done and the assistance and insight you've provided so far. If worst comes to worst, my use case can probably still be fulfilled with a single worker, but that limits the scale to a degree where I may need to find another crawling solution entirely.

jnioche commented 9 months ago

perhaps they write to independent files to avoid this but I don't see that happening and not sure if that's simply due to connectivity issues between the workers.

See WARCNameFormat - the file name contains the task index, so there can't be a collision.
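
For illustration, a minimal sketch of the idea, assuming Storm's org.apache.storm.hdfs.bolt.format.FileNameFormat interface; the class in this repo may differ in its details:

```java
import java.util.Map;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.task.TopologyContext;

// Sketch: each WARC bolt instance embeds its own task index in the file name,
// so two instances (even on different workers) never write to the same file.
public class TaskAwareWarcNameFormat implements FileNameFormat {
    private int taskIndex;
    private final String path = "/data/warc";

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext topologyContext) {
        // the task index is unique per bolt instance within the topology
        this.taskIndex = topologyContext.getThisTaskIndex();
    }

    @Override
    public String getName(long rotation, long timeStamp) {
        // e.g. crawl-1702312011-3-00002.warc.gz - no collision across tasks
        return String.format("crawl-%d-%d-%05d.warc.gz", timeStamp, taskIndex, rotation);
    }

    @Override
    public String getPath() {
        return path;
    }
}
```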

I'm not sure how else to try and debug this, especially if the issue is in storm itself. Curious why you chose storm 2.5.0 instead of the latest 2.6.0?

Simply because 2.6.0 was released since then and currently doesn't have a Docker image. I initiated a vote yesterday so that the Apache Storm project releases its own Docker images and we can control when that happens.

The code in this repo uses Docker to simplify adoption, but since you install Storm directly, you could try 2.6.0. You would probably need to use StormCrawler 2.11-SNAPSHOT to ensure compatibility. I am planning to release SC 2.11 soon, not sure when though.

I am not sure whether the ConcurrentModificationException is due to the connection problem or the other way round. One thing to do would be to try the lock() approach I described yesterday and see what modifies the Metadata object.
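
A minimal sketch of that debugging approach, assuming the lock()/unlock() methods on Metadata that are used later in this thread; the exact placement inside FeedParserBolt is illustrative:

```java
import org.apache.storm.task.OutputCollector;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import com.digitalpebble.stormcrawler.Metadata;

// Inside the bolt's execute(): freeze the Metadata just before emitting, so any
// later mutation throws a ConcurrentModificationException whose stack trace
// points at the offending component.
void emitLocked(OutputCollector collector, Tuple tuple,
                String url, byte[] content, Metadata md) {
    md.lock(); // assumption: marks the object read-only for debugging
    collector.emit(tuple, new Values(url, content, md));
    collector.ack(tuple);
    // md.unlock() would be called once it is safe to mutate the object again
}
```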

You could also try a vanilla WordCount topology on your cluster and check whether it works. Maybe double check the Storm configuration and version of Zookeeper?

BTW it is always interesting to hear about how people use our code or StormCrawler in general. If at all possible, feel free to share your use case with the community.

Thanks!

alextechnology commented 9 months ago

@jnioche I went ahead and made a new FeedParserBolt with a metadata.lock() and metadata.unlock() before and after the collector.emit() + collector.ack() around line 104, and below is the resulting error stack I'm getting (I also added a few logging messages to get a better idea). It seems to indicate the StatusUpdaterBolt for Elastic is somehow involved? I tried disabling status.updater.use.cache and reducing es.status.concurrentRequests, but neither change made a difference. I also tried the following setup:

And this works just fine as well - so there's definitely no issue with the remote supervisor being able to communicate with Nimbus, run a supervisor + workers, interface with Elastic, or write to the configured WARC data folder. Ironically, it's actually a bit faster, as it's a more powerful VM.

Any ideas of what could be the issue or anything else I could try?

2023-12-12 20:02:19.299 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Processing https://qz.com/uk-lawmakers-vote-in-favor-of-the-rwanda-migration-bill-1851093410
2023-12-12 20:02:19.299 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Not a feed https://qz.com/uk-lawmakers-vote-in-favor-of-the-rwanda-migration-bill-1851093410
2023-12-12 20:02:19.301 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] Sitemap (found 1 links, 1 skipped): https://www.cityam.com/sitemap.xml?yyyy=2018&mm=02&dd=17
2023-12-12 20:02:19.301 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] https://www.macworld.com/news-sitemap.xml detected as news sitemap based on content
2023-12-12 20:02:19.301 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Processing https://spaceexplored.com/2023/12/12/blue-origin-plans-to-return-to-new-shepard-flights-next-week/
2023-12-12 20:02:19.302 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Not a feed https://spaceexplored.com/2023/12/12/blue-origin-plans-to-return-to-new-shepard-flights-next-week/
2023-12-12 20:02:19.309 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] Sitemap (found 32 links, 0 skipped): https://www.macworld.com/news-sitemap.xml
2023-12-12 20:02:19.310 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Processing https://www.foxbusiness.com/economy/november-inflation-breakdown-where-are-prices-still-rising-fastest
2023-12-12 20:02:19.302 o.a.s.u.Utils Thread-77-status-executor[105, 112] [ERROR] Async loop died!
java.lang.RuntimeException: java.util.ConcurrentModificationException: Attempt to modify a metadata after it has been sent to the serializer
    at org.apache.storm.executor.Executor.accept(Executor.java:301) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:154) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:140) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:398) [storm-client-2.5.0.jar:2.5.0]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.util.ConcurrentModificationException: Attempt to modify a metadata after it has been sent to the serializer
    at com.digitalpebble.stormcrawler.Metadata.checkLockException(Metadata.java:256) ~[stormjar.jar:?]
    at com.digitalpebble.stormcrawler.Metadata.setValue(Metadata.java:130) ~[stormjar.jar:?]
    at com.digitalpebble.stormcrawler.persistence.AbstractStatusUpdaterBolt.execute(AbstractStatusUpdaterBolt.java:183) ~[stormjar.jar:?]
    at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:212) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.5.0.jar:2.5.0]
    ... 6 more
2023-12-12 20:02:19.310 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Not a feed https://www.foxbusiness.com/economy/november-inflation-breakdown-where-are-prices-still-rising-fastest
2023-12-12 20:02:19.311 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Processing https://www.thedailybeast.com/fbi-arrest-internet-pornography-personality-paul-caloia-for-jan-6-capitol-riot
2023-12-12 20:02:19.310 o.a.s.e.e.ReportError Thread-77-status-executor[105, 112] [ERROR] Error
java.lang.RuntimeException: java.lang.RuntimeException: java.util.ConcurrentModificationException: Attempt to modify a metadata after it has been sent to the serializer
    at org.apache.storm.utils.Utils$1.run(Utils.java:413) ~[storm-client-2.5.0.jar:2.5.0]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.RuntimeException: java.util.ConcurrentModificationException: Attempt to modify a metadata after it has been sent to the serializer
    at org.apache.storm.executor.Executor.accept(Executor.java:301) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:154) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:140) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.5.0.jar:2.5.0]
    ... 1 more
Caused by: java.util.ConcurrentModificationException: Attempt to modify a metadata after it has been sent to the serializer
    at com.digitalpebble.stormcrawler.Metadata.checkLockException(Metadata.java:256) ~[stormjar.jar:?]
    at com.digitalpebble.stormcrawler.Metadata.setValue(Metadata.java:130) ~[stormjar.jar:?]
    at com.digitalpebble.stormcrawler.persistence.AbstractStatusUpdaterBolt.execute(AbstractStatusUpdaterBolt.java:183) ~[stormjar.jar:?]
    at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:212) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:154) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:140) ~[storm-client-2.5.0.jar:2.5.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.5.0.jar:2.5.0]
    ... 1 more
2023-12-12 20:02:19.311 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] https://www.twincities.com/sitemap.xml?yyyy=2011&mm=08&dd=05 detected as news sitemap based on content
2023-12-12 20:02:19.311 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Not a feed https://www.thedailybeast.com/fbi-arrest-internet-pornography-personality-paul-caloia-for-jan-6-capitol-riot
2023-12-12 20:02:19.312 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Processing https://www.pressherald.com/life/arts-entertainment/feed/
2023-12-12 20:02:19.312 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] Found a feed https://www.pressherald.com/life/arts-entertainment/feed/
2023-12-12 20:02:19.312 c.d.s.b.FeedParserBolt Thread-75-feed-executor[69, 70] [INFO] parseFeed for https://www.pressherald.com/life/arts-entertainment/feed/
2023-12-12 20:02:19.328 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] Sitemap (found 159 links, 159 skipped): https://www.twincities.com/sitemap.xml?yyyy=2011&mm=08&dd=05
2023-12-12 20:02:19.329 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] https://www.ocregister.com/sitemap.xml?yyyy=2011&mm=08&dd=01 detected as news sitemap based on content
2023-12-12 20:02:19.346 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] Sitemap (found 196 links, 196 skipped): https://www.ocregister.com/sitemap.xml?yyyy=2011&mm=08&dd=01
2023-12-12 20:02:19.347 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] https://www.nbcsportsphiladelphia.com/sitemap.xml?yyyy=2019&mm=11 detected as news sitemap based on content
2023-12-12 20:02:19.367 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] Sitemap (found 462 links, 447 skipped): https://www.nbcsportsphiladelphia.com/sitemap.xml?yyyy=2019&mm=11
2023-12-12 20:02:19.370 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] https://www.news.com.au/sitemap.xml?yyyy=2018&mm=05&dd=06 detected as news sitemap based on content
2023-12-12 20:02:19.376 o.c.s.n.NewsSiteMapParserBolt Thread-36-sitemap-executor[78, 78] [INFO] Sitemap (found 129 links, 129 skipped): https://www.news.com.au/sitemap.xml?yyyy=2018&mm=05&dd=06
jnioche commented 8 months ago

Thanks @alextechnology, this is very useful. I think this is what is happening.

If you look at the topology class (assuming that this is what you are using, the Flux setup should be the same anyway), you'll see that the output of the FeedParser goes into 3 places:

  1. the dummy indexer on the default stream - its role is just to change the value of the status and pass it to the status updater bolt. This is done only if a page is not a feed, i.e. it is an actual article.
  2. the status updater on the status stream - writes to Elastic. This is if a page is a feed and is used for adding discovered links to the status index. Also used to handle the status in case of an error.
  3. the WARC bolt on the default stream - again only if a page is not a feed

So the problem happens when a page is an article (and not a feed). The metadata is sent simultaneously to both the DummyIndexer and the WARCBolt. The DummyIndexer does not do much and sends the content on to the StatusUpdaterBolt, which adds a key/value to the Metadata. The problem is that the WARC bolt is probably on the other worker, so the tuple needs serializing with Kryo; we get the exception because sooner or later the StatusUpdaterBolt will be changing the metadata while the serialization is happening.
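
In other words (an illustrative sketch, not the repo's actual code), the same Metadata instance ends up referenced by tuples on two streams, one of which may be serialized on a transfer thread while the other is mutated locally:

```java
import org.apache.storm.task.OutputCollector;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import com.digitalpebble.stormcrawler.Constants;
import com.digitalpebble.stormcrawler.Metadata;
import com.digitalpebble.stormcrawler.persistence.Status;

// Both emits share the same Metadata instance. If the consumer on the default
// stream lives in the other worker, Kryo serializes md on a transfer thread
// while the local StatusUpdaterBolt may still be mutating it: a data race.
void emitBoth(OutputCollector collector, Tuple tuple,
              String url, byte[] content, Metadata md) {
    collector.emit(tuple, new Values(url, content, md));  // default stream
    collector.emit(Constants.StatusStreamName, tuple,
            new Values(url, md, Status.FETCHED));         // status stream
}
```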

Now one interesting thing is that if you look at the Storm UI while the topology is running, you should see that the spouts have as many instances as there are shards in the status index. The number of instances for all the bolts is equal to the number of workers - apart from the WARC bolt, which should have a single instance. This means we are bound to need serialization for 50% of the tuples, as they have no chance of running locally.

What we should do is set the number of instances of the WARC bolt to be the same as the number of workers; this way the tuple would remain data-local as much as possible and we would avoid serializing and sending the data across to another node, especially given that in this particular case what we send is the entire binary content of a page, which can be pretty heavy. I will open an issue to that effect.
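
Roughly what that change amounts to when wiring the topology (a sketch; the component ids and the warcBolt instance are illustrative, not this repo's exact code):

```java
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.TopologyBuilder;

// Sketch: one WARC bolt instance per worker, with local-preferred delivery so
// the heavy page content rarely needs to cross worker boundaries at all.
static void wireWarcBolt(TopologyBuilder builder, IRichBolt warcBolt, int numWorkers) {
    builder.setBolt("warc", warcBolt, numWorkers)
           .localOrShuffleGrouping("fetch");
}
```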

This is a partial solution - we still need a way of copying or cloning the metadata when the topology sends the same data to two different outputs; I need to think about the right way to do this. However, having an instance of the WARC bolt locally should alleviate the problem.

Looking at the code, I can also see that the topology is configured incorrectly for the status updater bolts. They should receive the tuples on the status stream hashed on the value of the URL field, as done in the archetype for Elastic. This is also incorrect in the Flux file. It went unnoticed because the number of workers is set to 1 by default. It means, however, that with more than 1 worker the caching in the status bolts would be missed, resulting in more URLs getting sent to Elastic than necessary.

I will open another issue to fix this.
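
For comparison, the grouping used in the Elasticsearch archetype looks roughly like this (a sketch; component ids are illustrative):

```java
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import com.digitalpebble.stormcrawler.Constants;

// Hash status tuples on the URL so a given URL always reaches the same
// StatusUpdaterBolt instance, keeping its cache effective across workers.
static void wireStatusUpdater(TopologyBuilder builder, IRichBolt statusUpdater, int parallelism) {
    builder.setBolt("status", statusUpdater, parallelism)
           .fieldsGrouping("fetch", Constants.StatusStreamName, new Fields("url"))
           .fieldsGrouping("sitemap", Constants.StatusStreamName, new Fields("url"))
           .fieldsGrouping("feed", Constants.StatusStreamName, new Fields("url"));
}
```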

Your reporting of this problem has been very useful - thank you very much @alextechnology, both for raising it and for the detective work you put in.

I will fix the number of instances of the WARC bolt first so that you can give it a try.

alextechnology commented 8 months ago

@jnioche Thanks so much, not only for the fix but also for the detailed explanation, which helped fill in some gaps in my understanding of the codebase. Your explanation also aligns with what I was observing in the second worker, i.e. that it would work for a bit before crashing on the concurrent modification error - in other words, only once it finally hit a URL that needed serialization. Of course, in my case, where the remote worker writes to an NFS share hosted by the master, the overhead of serializing the content back to the master vs. writing to the local NFS mount is about the same in the end, but that's specific to my setup and acceptable as far as I'm concerned.

I've merged in your latest commits, rebuilt the jar and deployed with two workers - and I can confirm it works! I need to let it run for a bit to get an idea of how the rate of content acquisition increases with 2 workers compared to 1 worker, but it does appear to be faster.

For our needs, your solution is fine, but as you mentioned, it sort of skirts the actual problem of distributed concurrent updates to metadata. Perhaps you need to centralize this in something with locks or transactions (e.g. Redis or Zookeeper) and/or make updates via a message queue (e.g. Kafka)? Either way, given your generous assistance here, I'm happy to return the favor and test any solution. We already have Zookeeper, Redis, Cassandra, Solr, and Kafka clusters running (I'm already working on a Kafka bolt add-on so the WARC data, or at least the site content, gets published into our analytics pipeline), so it's an easy test for us. Feel free to tag me if you want me to test.

Thanks again!

jnioche commented 8 months ago

Glad to hear that this is fixed

Perhaps you need to centralize this in something with locks or transactions (ie. Redis or Zookeeper) and/or make updates via a message queue (ie. Kafka)?

Usually, SC topologies are more linear and a tuple gets to the StatusUpdaterBolt only when it has gone through everything else. Ideally we would have the WARCBolt emit to the status stream. The potential problem with that is that the content is buffered before being written to the output, which introduces a delay; this compounds with the buffering in the status updater and could lead to timeouts. Also, a lot of the logic is in the HDFS objects that are part of the Storm project and outside our control.

A simpler approach would be to clone the Metadata before it goes to the status updater bolt. Adding another external component would make things unnecessarily cumbersome for most users.
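
A minimal sketch of that idea; the accessor names (keySet, getValues, setValues) are assumptions about the Metadata API, and where in the topology the copy would happen is left open:

```java
import com.digitalpebble.stormcrawler.Metadata;

// Sketch: emit an independent copy on the status stream so the original can be
// serialized for the WARC bolt without racing against later mutations.
static Metadata deepCopy(Metadata original) {
    Metadata copy = new Metadata();
    for (String key : original.keySet()) {
        // copy the value array itself, not just the reference to it
        copy.setValues(key, original.getValues(key).clone());
    }
    return copy;
}
```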

I'll close this issue for now. Thanks again for reporting the problem and spending time investigating it.