Open CygnusAlpha opened 7 years ago
Can you attach the full run log from start to finish, and a copy of your config with any credentials blanked out?
…because whatever generated that is hateful
also are you using etcd as suggested in that note, or are you just configuring it statically? because the latter would be much easier, I'd imagine. we're not using etcd in production at all.
Steps:
git clone git@repo.ch.internal:oss/acropolis.git --recursive
cd acropolis
git checkout 6d88b7d
docker-compose up
docker-compose scale anansi=2
docker-compose run anansi crawler-add http://sws.geonames.org/2643743/
It's possible (quite likely) that I misconfigured it or am running it wrongly of course.
"also are you using etcd as suggested in that note, or are you just configuring it statically? because the latter would be much easier, I'd imagine. we're not using etcd in production at all."
I'm second guessing how to configure it. I found limited documentation about it and this was based on what was in anansi/crawler/crawl.conf.
A proper config example mirroring what's on live would be good.
it's configurable precisely because what's useful on live and what's useful for day-to-day development are generally not the same
what live does though, is to give it a cluster registry URI that matches the queue database URI (for the moment, it will change again on AWS to use a different database).
Oh, actually https://github.com/bbcarchdev/anansi/wiki
I have configured crawld like so (getting rid of etcd):
[crawler]
detach=no
verbose=no
threads=1

[cluster]
name=anansi
registry=pgsql://postgres:postgres@postgres/anansi
environment=development

[processor]
And run a second crawld (docker-compose scale anansi=2).
Here is the log which eventually aborts with:
anansi_1 | crawld[1]: processor_handler: following 303 redirect to <http://dbpedia.org/data/Coopers_School.xml>
anansi_1 | crawld[1]: Adding URI <http://dbpedia.org/data/Coopers_School.xml> to crawler queue
anansi_1 | crawld[1]: libcluster: SQL: this instance is no longer a member of anansi/development
anansi_1 | crawld[1]: libcluster: re-balanced; this instance has base index -1 (1 workers) from a total of 0
anansi_1 | crawld[1]: %ANANSI-N-2011: cluster has re-balanced: instance faac917d46c047d294d095da703ce3b7 has left cluster anansi/development
anansi_1 | crawld[1]: %ANANSI-N-2030: crawl thread suspended due to re-balancing [development] crawler 2/2 (thread 1/1)
anansi_1 | crawld[1]: %ANSNSI-E-5005: SQL error [22012]: ERROR: division by zero
anansi_1 |
aha
the
anansi_1 | crawld[1]: %ANANSI-N-2030: crawl thread suspended due to re-balancing [development] crawler 2/2 (thread 1/1)
line looks key: it would seem there's a race condition there, where the instance ID is being changed in the crawl context itself before the thread has a chance to suspend and stop using it.
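To make the suspected race concrete, here is a minimal sketch in Python rather than crawld's own code. Every name in it (`Membership`, `CrawlContext`, `rebalance`, `snapshot`) is hypothetical. It assumes the re-balance callback and the crawl thread share one context object, and shows one way to stop the crawl thread from seeing a half-updated or invalid instance index: swap the values under a lock and have the thread take a snapshot before each dequeue.

```python
import threading
from typing import NamedTuple, Optional


class Membership(NamedTuple):
    # Hypothetical view of what the cluster library reports after a re-balance.
    base_index: int  # -1 means "not a member"
    total: int       # total workers in the cluster, 0 when we have left it


class CrawlContext:
    """Hypothetical crawl context shared by the cluster callback and a crawl thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._membership = Membership(base_index=-1, total=0)

    def rebalance(self, base_index: int, total: int) -> None:
        # Called from the re-balance callback: replace both values in one step
        # so a reader never sees a new index paired with an old total.
        with self._lock:
            self._membership = Membership(base_index, total)

    def snapshot(self) -> Optional[Membership]:
        # Called by the crawl thread before each dequeue. Returns None when the
        # instance is not a valid member, which the caller treats as "suspend".
        with self._lock:
            current = self._membership
        if current.total <= 0 or current.base_index < 0:
            return None
        return current
```

With this shape, the crawl thread builds its dequeue query only from the snapshot it was handed, so a re-balance that arrives mid-iteration cannot change the values it is already using.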
that said, unless a signal's been received, this shouldn't happen:
anansi_1 | crawld[1]: libcluster: SQL: this instance is no longer a member of anansi/development
it'd be good to see the SQL query which triggered this; given we're trying to diagnose wtf is going on, I'd set verbose=yes in both the [crawler] and [cluster] sections
[removed messed up log... It's uploaded below]
It looks like this is the culprit:
"res"."tinyhash" % 0 =
So, probably the cluster thinks there are 0 nodes.
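A small sketch of that failure mode, in Python instead of the real SQL. It assumes the dequeue filter partitions rows as `tinyhash % total` and that an instance owns a contiguous range of slots starting at a zero-based base index. The function name `owns_row` and the guard are hypothetical, not anansi's code.

```python
def owns_row(tinyhash: int, base_index: int, workers: int, total: int) -> bool:
    """Return True if this instance should process a row with the given hash.

    Assumption: rows are assigned to slot tinyhash % total, and this instance
    owns slots base_index .. base_index + workers - 1 (zero-based).
    """
    if total <= 0 or base_index < 0:
        # Not a cluster member (e.g. "base index -1 ... from a total of 0"):
        # own nothing rather than evaluating tinyhash % 0.
        return False
    slot = tinyhash % total
    return base_index <= slot < base_index + workers
```

Without the guard, a total of 0 reaches the modulo and fails, which matches the `division by zero` in the log. Skipping the dequeue entirely while the instance is outside the cluster would avoid sending that query at all.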
Commit e8331cb, probably a result of: e13639d - Perform dequeue operation within a transaction.
Internal tracking: RESDATA-1129