Hi @krishh-konar, a few questions to try and pinpoint where the issue lies.
It looks like Horizon is NOT receiving metrics. I do not think this is a Redis issue; see my next comments below after this one.
Is Redis running with its socket at /var/run/redis/redis.sock? redis.conf should have:

unixsocket /var/run/redis/redis.sock
unixsocketperm 777

and Skyline's settings.py should have:

REDIS_SOCKET_PATH = '/var/run/redis/redis.sock'
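A quick way to verify the socket is reachable (a minimal sketch with redis-py, assuming the socket path above; this is not Skyline code):

# verify the Redis unix socket is reachable and has usable permissions
import redis

r = redis.StrictRedis(unix_socket_path='/var/run/redis/redis.sock')
print(r.ping())  # prints True if the connection works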
If none of the above steer you in the right direction, could you please paste more of your Horizon log? You can also jump on to https://gitter.im/earthgecko-skyline if that is quicker.
Hi @krishh-konar
The error suggests that there are no metrics processed by the worker: the metrics_received list is empty, which is why Redis throws "wrong number of arguments for 'sadd' command", as sadd needs members to add. How many metrics are you sending from Graphite? The log line "average queue size for the last 60 seconds - 0" also suggests that none are being received.
Could you stop Horizon, move the Horizon log file, start Horizon again, and then paste the first 1 or 2 full minutes of the log?
When you do that, have a look at your Graphite carbon-relay log as well: it should tell you that the connection was dropped, and when Horizon is started again it should report that the connection was re-established.
Also, if you look in your Graphite browser you should see metrics for the Skyline Horizon destination, e.g. carbon.relays.YOUR_GRAPHITE_HOST-a.destinations.SKYLINE_IP:2014:None.sent and fullQueueDrops. If Graphite is sending metrics to Horizon you will see these metrics in Graphite, and they will confirm whether Graphite is sending or dropping.
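To illustrate the error (a sketch using redis-py, assuming the socket path from this thread; this is not Horizon's actual code):

import redis

r = redis.StrictRedis(unix_socket_path='/var/run/redis/redis.sock')
metrics_received = []  # an idle worker has received nothing
try:
    # SADD with zero members is rejected by the Redis server
    r.sadd('example.set', *metrics_received)
except redis.exceptions.ResponseError as e:
    print(e)  # wrong number of arguments for 'sadd' command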
Thanks for the reply. This is the redis version I'm using:
redis-server:
  Installed: 5:5.0.14
I checked the Redis confs, and the params are set correctly in both the Redis conf and settings.py in Skyline. I'm also seeing Redis getting populated, so I think the connection/access is fine.
redis /var/run/redis/redis.sock> keys *
1) "analyzer.metrics_manager.resolutions"
2) "skyline.external_settings"
3) "aet.analyzer.unique_base_names"
4) "aet.analyzer.too_short"
5) "analyzer.boring"
6) "mirage"
7) "thunder.rolling"
8) "analyzer.unique_base_names"
9) "analyzer.metrics_manager.metrics_sparsity.avg_sparsity"
10) "analyzer.metrics_manager.metrics_sparsity.metrics_sparsity_increasing"
11) "thunder.last.luminosity"
12) "analyzer.too_short"
13) "analyzer.metrics_manager.metrics_sparsity.metrics_sparsity_decreasing"
14) "redis"
15) "thunder.last.redis"
16) "thunder.last.panorama"
17) "new_non_derivative_metrics"
18) "thunder"
19) "panorama"
20) "aet.analyzer.metrics_manager.metrics_sparsity_increasing"
21) "analyzer.metrics_manager.hash_key.metrics_data_sparsity"
22) "luminosity.cloudbursts"
23) "aet.analyzer.stale"
24) "analyzer.metrics_manager.metrics_sparsity_decreasing"
.......
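(Aside: with Skyline's default FULL_NAMESPACE of 'metrics.', the set of metric names Horizon has populated can also be inspected directly; the key name below is inferred from that default, not confirmed in this thread:)

redis-cli -s /var/run/redis/redis.sock smembers metrics.unique_metrics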
Thanks, I'll try all these and hopefully it works 🤞 🤞
I checked the relay logs and the connection is indeed being made.
22/12/2021 09:04:57 :: [console] Stopping factory CarbonClientFactory(10.21.0.81:2024:None)
22/12/2021 09:05:03 :: [console] Starting factory CarbonClientFactory(10.21.0.81:2024:None)
22/12/2021 09:05:03 :: [clients] CarbonClientFactory(10.21.0.81:2024:None)::startedConnecting (10.21.0.81:2024)
22/12/2021 09:05:03 :: [clients] CarbonClientProtocol(10.21.0.81:2024:None)::connectionMade
22/12/2021 09:05:03 :: [clients] CarbonClientFactory(10.21.0.81:2024:None)::connectionMade (CarbonClientProtocol(10.21.0.81:2024:None))
22/12/2021 09:05:03 :: [clients] Destination is up: 10.21.0.81:2024:None
I checked the Graphite data as well to make sure there is no issue with the metric itself, and I can see the graphs created using these metrics in Grafana. I checked tcpdump to see if a connection is actually being made with Horizon, and I did see data there:
krishanu@node1:~$ sudo tcpdump -i any port 2024 and inbound
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
09:23:51.620031 IP node2.internal.47350 > node1.internal.2024: Flags [P.], seq 3260479491:3260480358, ack 1518300410, win 511, options [nop,nop,TS val 1149844013 ecr 2375717253], length 867
09:24:21.228514 IP node2.internal.47350 > node1.internal.2024: Flags [.], ack 1, win 511, options [nop,nop,TS val 1149873618 ecr 2375718365], length 0
09:24:51.657930 IP node2.internal.47350 > node1.internal.2024: Flags [P.], seq 867:1734, ack 1, win 511, options [nop,nop,TS val 1149904046 ecr 2375747970], length 867
09:25:01.676292 IP node2.internal.47350 > node1.internal.2024: Flags [.], ack 1, win 511, options [nop,nop,TS val 1149914064 ecr 2375778397], length 0
09:25:32.908335 IP node2.internal.47350 > node1.internal.2024: Flags [.], ack 1, win 511, options [nop,nop,TS val 1149945294 ecr 2375788414], length 0
I made the API call to see what metrics Horizon has, and I got the following response:
$ curl http://node1.internal/api?unique_metrics
{
    "data": {
        "metrics": [
            "metrics.carbon.relays.node1-a.destinations.10_21_0_81:2024:None.relayMaxQueueLength",
            "metrics.carbon.relays.node1-a.cpuUsage",
            "metrics.carbon.relays.node1-a.memUsage",
            "metrics.carbon.relays.node1-a.destinations.10_21_0_81:2024:None.attemptedRelays",
            "metrics.carbon.relays.node1-a.blacklistMatches",
            "metrics.carbon.relays.node1-a.metricsReceived",
            "metrics.carbon.relays.node1-a.activeConnections",
            "metrics.carbon.relays.node1-a.destinations.10_21_0_81:2024:None.queuedUntilConnected",
            "metrics.carbon.relays.node1-a.destinations.10_21_0_81:2024:None.sent",
            "metrics.carbon.relays.node1-a.destinations.10_21_0_81:2024:None.batchesSent",
            "metrics.carbon.relays.node1-a.whitelistRejects"
        ]
    },
    "status": {
        "cluster_data": false,
        "filter_namespace": null,
        "filtered": false,
        "remove_full_namespace_prefix": false,
        "request_time": 0.0006222724914550781,
        "response": 200
    }
}
I am probably making a basic mistake. Is there a prefix that our metric names must have, or some other config change that needs to be made? I am currently pushing data to Graphite once per minute; I also tried decreasing the interval at which I push the data. Do I need to make some other configuration changes?
I haven't made any changes to storage-schemas.conf in carbon.
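(Aside: a typical storage-schemas.conf stanza for once-per-minute data looks like the sketch below; the section name, pattern, and retentions are illustrative assumptions, not taken from this thread:)

[one_minute_default]
pattern = .*
retentions = 60s:7d,5m:30d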
Thanks!
Hi @krishh-konar
That looks OK, so Horizon has received data for those metrics. How many metrics are you pushing to Graphite (not Horizon)? Meaning, how many metrics does Graphite have? Are you pushing metrics to Graphite for other sources/machines? The metrics reported above are Graphite's own metrics, which it is pushing to Horizon; if it had more metrics that it was getting data for, it would push those too. One exception is the Graphite carbon.cache metrics: carbon-relay does NOT push those, because carbon-cache records them directly itself and they are not forwarded via carbon-relay.
Could you paste or attach your Graphite relay-rules.conf and carbon.conf (anonymised)?
Your relay-rules.conf should look something like:

[all]
pattern = .*
destinations = 0.0.0.0:2004, 10.21.0.81:2024

[default]
default = true
destinations = 0.0.0.0:2004:a, 10.21.0.81:2024:a

# WHERE 0.0.0.0:2004 is the local carbon-cache
Given your metrics I think that IP should be correct, and I am assuming that carbon-cache is running on the same IP as carbon-relay. However, given the list of metrics you pasted above, it does not look like you are sending metrics to the local carbon-cache. We would expect to see metrics for the local carbon-cache too, something like carbon.relays.node1-a.destinations.0_0_0_0:2004:None.sent
carbon-relay should forward metrics to:

carbon-relay |-> carbon-cache
             |-> Horizon
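(For completeness, a sketch of the matching [relay] section in carbon.conf; these are standard carbon settings, but the values are assumptions based on this thread, not the poster's actual config:)

[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
RELAY_METHOD = rules
DESTINATIONS = 0.0.0.0:2004:a, 10.21.0.81:2024:a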
Figured out the issue in the discussion on Gitter: I was sending metrics directly to carbon-cache and not carbon-relay. Sending metrics to :2013 instead of :2003 solved the issue.
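(To make the fix concrete, a minimal sketch of pushing one datapoint to carbon-relay's plaintext listener; the host and metric name are hypothetical:)

import socket
import time

GRAPHITE_HOST = 'graphite.internal'  # hypothetical; wherever carbon-relay runs
RELAY_PORT = 2013  # carbon-relay line receiver; 2003 is carbon-cache

# Graphite plaintext protocol: "<metric.path> <value> <unix_timestamp>\n"
line = 'my.app.some_metric 42 %d\n' % int(time.time())
with socket.create_connection((GRAPHITE_HOST, RELAY_PORT)) as s:
    s.sendall(line.encode('utf-8'))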
Many thanks :)
While trying to set up Skyline, I am getting a Redis exception in the Horizon logs. The Redis server restarted once due to a server reboot.
I tried checking Redis to see whether those keys existed or not.
I have configured Graphite to relay metrics to Horizon, and I can see I'm receiving data using tcpdump, but I can't get any data into Skyline.
Can someone help me with the Redis issue?