Open e3rd opened 6 years ago
See also #361. There was the idea to create the statistical data in/after the process() call, to differentiate between success and failure. Another (additional) idea is to just save timestamps there to keep the processing overhead in the bot as low as possible. The rate is then calculated "on demand" (which allows rates for different timespans, graphs, etc.).
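A minimal sketch of the "save timestamps, compute the rate on demand" idea. It assumes the bot appends one Unix timestamp per processed message to some store; a plain sorted Python list stands in for that store here, and the names `rate_per_second` and `timestamps` are illustrative, not part of IntelMQ.

```python
from bisect import bisect_right


def rate_per_second(timestamps, window, now):
    """Return messages per second over the interval (now - window, now].

    `timestamps` must be sorted ascending, which append-only logging
    of the current time gives for free.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    start = bisect_right(timestamps, now - window)  # first index after the window start
    end = bisect_right(timestamps, now)             # one past the last index <= now
    return (end - start) / window


# Hypothetical data: one timestamp (in seconds) per processed message.
timestamps = [0.0, 1.0, 2.0, 50.0, 55.0, 58.0, 59.0, 60.0]
print(rate_per_second(timestamps, window=10, now=60.0))
print(rate_per_second(timestamps, window=60, now=60.0))
```

Because only timestamps are stored, the same data can serve any timespan (last 10 seconds, last hour) without the bot doing any extra work per message.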
We merged the first draft of queue traffic into develop. You wrote:
However I am not sure if it is worth now to implement it in the Manager, maybe we can find some optimizations in the way we collect or store the data, which would need some changes.
When you say it seems ready to implement in the Manager, I propose to implement it this way.
Model:
## Model
bot-id
-> _default -> another-queue -> another-bot
-> alter-path -> alternative-queue -> alternative-bot
## Queues
bot-id-queue: 103
another-bot-queue: 104
alternative-bot-queue: 105
## Piping
bot-id is piping 3 messages via alter-path, in the past it piped 12 messages that way
## Status
bot-id had 50 successes and 13 failures in message processing.
Current output of `intelmqctl --type json list queues`:
# bot statuses
{"bot-id": "running"}
# bot information
{"bot-id": {"source_queue": ["bot-id-queue", 103],
"internal_queue": 0,
"destination_queues": [
["another-queue", 104],
["alternative-queue", 105]
]
}
},
Proposed output of `intelmqctl --type json list queues`:
# bots
{"bot-id":
{"source_queue": 103, // there is a single source queue, no need to list the name "bot-id-queue"
"internal_queue": 0,
"status": "running", // merged from bot statuses
"destination_queues": { // ignore queue values – other bots have them in source queues
"_default": ["another-bot-queue"],
"alter-path": ["alternative-bot-queue"]
},
"success": 50, // total number since the beginning
"failure": 13, // total number since the beginning
"stats": {
"_default": [0, 0], // temporary, total
"alter-path": [3, 12]
}
}
}
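To make the proposal concrete, here is a rough sketch of how the two current structures could be merged into the proposed one. Everything beyond the two existing outputs is an assumption: the function name `build_proposed`, the `paths`, `totals` and `path_stats` inputs, and the `"unknown"` fallback status are hypothetical and only illustrate the shape of the data.

```python
def build_proposed(statuses, queues, paths, totals, path_stats):
    """Merge bot statuses and queue info into one dict per bot.

    statuses:   {"bot-id": "running"}                      (current output)
    queues:     {"bot-id": {"source_queue": [name, size],  (current output)
                            "internal_queue": n, ...}}
    paths:      {"bot-id": {path: [queue names]}}          (assumed input)
    totals:     {"bot-id": (success, failure)}             (assumed input)
    path_stats: {"bot-id": {path: [temporary, total]}}     (assumed input)
    """
    proposed = {}
    for bot_id, info in queues.items():
        success, failure = totals.get(bot_id, (0, 0))
        proposed[bot_id] = {
            # Keep only the size; the source queue name is implied by the bot id.
            "source_queue": info["source_queue"][1],
            "internal_queue": info["internal_queue"],
            # "unknown" is a placeholder chosen for this sketch.
            "status": statuses.get(bot_id, "unknown"),
            "destination_queues": paths.get(bot_id, {}),
            "success": success,
            "failure": failure,
            "stats": path_stats.get(bot_id, {}),
        }
    return proposed


example = build_proposed(
    statuses={"bot-id": "running"},
    queues={"bot-id": {"source_queue": ["bot-id-queue", 103], "internal_queue": 0}},
    paths={"bot-id": {"_default": ["another-bot-queue"],
                      "alter-path": ["alternative-bot-queue"]}},
    totals={"bot-id": (50, 13)},
    path_stats={"bot-id": {"_default": [0, 0], "alter-path": [3, 12]}},
)
```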
Concerning the usefulness: What the data currently lacks is any time information (in redis), so we can't get any rate, neither since the bot start nor for the last 5 minutes, 1 hour, etc. That was actually the idea in #361.
This is a screenshot from our monitoring system, which I used to plot the values. We only ever know the absolute number, which decreases, and it is reset on restart. One could of course compute the rate with external tools, but I think we are aiming for a standalone solution.
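For illustration, this is roughly what such an external tool would do with periodic samples of an absolute number. The sketch assumes the sampled value is a cumulative counter that starts again from zero after a restart; `counter_rate` and the sample data are made up for this example.

```python
def counter_rate(samples):
    """Average rate (events per second) from (time, counter) samples.

    A drop in the counter is treated as a restart: the counter is assumed
    to have started again from zero, so the new value itself is the increase.
    Events counted between the last sample and the restart are lost.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical samples taken every 60 seconds; the bot restarts before t=120.
samples = [(0, 10), (60, 70), (120, 5), (180, 35)]
print(counter_rate(samples))
```

This works only as well as the sampling interval allows, which is one argument for keeping time information next to the counters themselves.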
I think that the rate would be interesting, so you can see the spikes in bots, i.e. when a bot is processing slower than the previous one(s). What do you think?
What's the reason statistics are in the DB=3? We'll have to connect to both DB=2 and DB=3 in intelmqctl.
To not mix the data, which makes it a lot easier if you work on the redis database itself. For example, you can list all non-empty queues with `keys *`. Actually, having both kinds of data in the same database broke `intelmqctl check`, which did exactly this. Further, you can get rid of all stats/queues by just flushing the (one) database.
Also, the pipeline can be non-redis, so we can't rely on the pipeline being redis. It currently works if the pipeline is AMQP, but the support of this use-case can still be improved in this regard (e.g. setting an explicit host for the statistics).
Are there some performance advantages or so?
Don't think so, as both connections are open anyway the whole time. Anyway, the (non-performance) advantages are big enough to ignore potential small performance decreases.
I would put this into the `statusd` component in intelmq 3.0.
I'd like to discuss displaying queue traffic in the Manager. It would be cool if we could see which pipes had recently transferred some messages on the visjs edges in the Manager. There are plenty of ways this can be done. There is one solution I made up last night, but I need to discuss the intelmq background first.
- `defaults.conf` value: "count messages in queues" on/off
- `pipeline.py`: For every message passed, write to the registry that there was a connection right now between the bot and a destination queue.
- `intelmqctl --type json list queues-and-statuses --last-checked TIMESTAMP` would also return the list of connections between a bot and a queue that happened since TIMESTAMP. With this info provided, the Manager / Configuration tab could nicely pulsate the active edges.

Downside: Isn't it a performance issue to perform `lset` for every parsed message (`lpush`)? What do you think?
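A small in-memory sketch of the registry idea, to show how little has to be written per message. It is not the redis implementation: the class `ConnectionRegistry` and its methods are hypothetical, and a dict stands in for whatever the pipeline would actually write to.

```python
import time


class ConnectionRegistry:
    """Remembers when each (bot, path) edge last carried a message."""

    def __init__(self):
        self._last_seen = {}

    def record(self, bot_id, path, now=None):
        # One dictionary write per message; overwriting the previous
        # timestamp keeps the registry size bounded by the number of edges.
        self._last_seen[(bot_id, path)] = time.time() if now is None else now

    def active_since(self, timestamp):
        """Return the edges that carried a message at or after `timestamp`."""
        return sorted(
            edge for edge, seen in self._last_seen.items() if seen >= timestamp
        )


registry = ConnectionRegistry()
registry.record("bot-id", "_default", now=100)
registry.record("bot-id", "alter-path", now=200)
print(registry.active_since(150))
```

A `--last-checked TIMESTAMP` query then reduces to filtering a handful of entries, so the cost question is mostly about the one extra write per message, not about the read side.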