Please post your poller settings, your default timeouts (which I assume you have set for each device), and any relevant logs that show a poller cycle for the issue. If these are fairly big, you can email them to developers@cacti.net
Date | Mon, 25 May 2020 11:21:40 +0000
Cacti Version | 1.2.3
Cacti OS | unix
RSA Fingerprint | 23:f6:f4:a4:c3:d4:80:7f:ef:2f:b4:43:9d:93:f3:f5
NET-SNMP Version | 5.7.3
RRDtool Version Configured | 1.7.0+
RRDtool Version Found | 1.7.0
Devices | 225
Graphs | 1789
Data Sources | Script/Command: 10, SNMP Get: 772, SNMP Query: 1257, Script Server: 22, Total: 2061
Interval | 300
Type | cmd.php
Items | Action[0]: 3271, Action[1]: 10, Action[2]: 22, Total: 3303
Concurrent Processes | 50
Max Threads | 50
PHP Servers | 50
Script Timeout | 25
Max OID | 10
Last Run Statistics | Time:298.5692 Method:spine Processes:1 Threads:1 Hosts:224 HostsPerProcess:224 DataSources:3303 RRDsProcessed:0
SNMP Defaults
Version |
Community | IwDfua69R
Port Number | 161
Timeout | 500
Retries | 3
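As a quick sanity check of those defaults against a problem device, a manual query from the Cacti server can help. This is only a sketch: the device address is a placeholder, SNMP v2c is assumed (the version value isn't shown above), and Cacti's 500 timeout is in milliseconds while net-snmp's -t flag takes seconds:

$ snmpget -v 2c -c 'IwDfua69R' -r 3 -t 1 <device-ip>:161 1.3.6.1.2.1.1.3.0    # sysUpTime.0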
Just found out that the Data Collector information shows only 1 process, and the log shows the runtime being exceeded.
Should I increase to 5 or more processes?
"Maximum runtime of 298 seconds exceeded. Exiting."
If you're using spine, there's a recommendation for this. Take a look at this page: https://github.com/Cacti/documentation/blob/develop/Spine.md. If you're using the php poller, then increasing the number of processes can help.
Hi,
It looks like increasing the php poller processes helps. Is there a recommendation for that? Currently I am on 5 processes.
Also, the Cacti installation was done via apt with default values; when I did another spine installation it almost screwed up the configuration.
I will try installing spine per https://github.com/Cacti/documentation/blob/develop/Spine.md later, after monitoring the change in processes; things seem to be getting back to normal.
Will monitor for a day or two.
best regards,
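For reference, the build the linked Spine.md walks through boils down to roughly the following; treat this as a sketch, since the dependency package names vary by distro and the paths here are assumptions:

$ sudo apt-get install build-essential autoconf automake libtool help2man dos2unix \
      libssl-dev libmysqlclient-dev libsnmp-dev
$ git clone https://github.com/Cacti/spine.git && cd spine
$ ./bootstrap && ./configure && make
$ sudo make install
# afterwards: set the DB credentials in spine.conf and point Cacti's spine binary path setting at the new binary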
Looking over your logs, it looks like you've also got some MacTrack errors.
Yes,
Lots of them. I've been trying the new plugin, but I've only just started with a few devices, 5-6.
regards,
I tend never to use cmd.php for a real world deployment; I only use spine. I'm frankly surprised it's taking almost 5 minutes with less than 300 devices, though. Once you get spine up, the number of threads and processes really depends on the latency between the server and the systems, and the number of cores and memory you have on the Cacti server.
If you have a tiny two core VM, you are not going to be able to have 30-40 threads collecting, regardless of latency. You just have to watch top as the poller runs and see. There are a number of larger Cacti systems that I have witnessed with in excess of 10k hosts, and 2-4 processes with 20-40 threads each is more than enough for a one minute collection frequency. Now, the systems doing the collection are big boxes, and the data is real close, so it's like cheating the laws of physics.
But back in 2006, I had a big distributed classic NOC type setup monitoring about 4k switches and routers with only 8 cores and 32GB of memory using spine, and a big MacTrack setup that spun up 50 concurrent processes. You'll figure it out.
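A simple way to check the effect of each tuning change is the SYSTEM STATS line Cacti writes at the end of every poller cycle, alongside the core count top is showing. A sketch; the log path is an assumption and may differ on an apt-based install:

$ grep 'SYSTEM STATS' /opt/cacti/log/cacti.log | tail -n 5
$ nproc    # cores available, to sanity-check processes x threads against

The Time: value in those lines should stay comfortably under the 300 second polling interval once the process/thread counts are right.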
Good advice. I have always had both installed; can't recall why spine wasn't installed, or was installed incorrectly, this time round.
Will do a DB backup before installing spine.
thanks!
Great. Closing now.
Describe the bug
After provisioning more than 200 devices the problems started to show. Initially I thought it was an SNMP issue on the devices themselves, but the drops were intermittent, and a debug on the graphed interfaces showed: ERROR: opening '/opt/cacti/rra/XXXX.rrd': No such file or directory.
Now I have 30 out of 206 devices showing unknown, and I can't seem to add more graphs; data sources are not being created.
thomas@cactiserver:/opt/cacti/rra$ ls | wc -l
1728
1728 RRD files actually exist in the directory, while Cacti shows 2061 data sources, so some were never created.
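To pin down exactly which data sources never had their RRD file created, the expected paths can be compared against the filesystem; a sketch, assuming the default database name cacti and the standard poller_item table:

$ mysql -N -u cacti -p -e "SELECT DISTINCT rrd_path FROM cacti.poller_item;" |
      while read f; do [ -e "$f" ] || echo "missing: $f"; done

That should list the paths Cacti expects but which are missing from /opt/cacti/rra.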
To Reproduce
RRDtool Command:
/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start='-86400' \
--end='-300' \
--pango-markup \
--title='SINT01-CM0101 - Traffic - Gi0/28' \
--vertical-label='bits per second' \
--slope-mode \
--base=1000 \
--height=120 \
--width=500 \
--rigid \
--alt-autoscale-max \
--lower-limit='0' \
COMMENT:"From 2020/05/24 07\:21\:35 To 2020/05/25 07\:16\:35\c" \
COMMENT:" \n" \
--color BACK#F3F3F3 --color CANVAS#FDFDFD --color SHADEA#CBCBCB --color SHADEB#999999 \
--color FONT#000000 --color AXIS#2C4D43 --color ARROW#2C4D43 --color FRAME#2C4D43 \
--border 1 \
--font TITLE:11:'Arial' --font AXIS:8:'Arial' --font LEGEND:8:'Courier' \
--font UNIT:8:'Arial' --font WATERMARK:6:'Arial' \
--slope-mode \
--watermark 'Generated by Cacti®' \
DEF:a='/opt/cacti/rra/sint01-cm0101_traffic_in_1692.rrd':'traffic_in':AVERAGE \
DEF:b='/opt/cacti/rra/sint01-cm0101_traffic_in_1692.rrd':'traffic_out':AVERAGE \
CDEF:cdefa='a,8,*' \
CDEF:cdefbb='b,8,*' \
CDEF:cdefbc='b,-8,*' \
CDEF:cdefcd='a,UN,INF,UNKN,IF' \
AREA:cdefa#00FF0019: \
AREA:cdefa#0000FF19: \
LINE1:cdefa#00CF00FF: \
AREA:cdefa#00CF00FF:'Inbound ' \
LINE1:cdefa#0000FFFF:'Inbound\: ' \
GPRINT:cdefa:LAST:'Current\:%8.2lf %s' \
GPRINT:a:LAST:'Current\:%8.2lf %s' \
GPRINT:cdefa:AVERAGE:'Average\:%8.2lf %s' \
GPRINT:a:AVERAGE:'Average\:%8.2lf %s' \
GPRINT:cdefa:MAX:'Maximum\:%8.2lf %s\n' \
GPRINT:a:MAX:'Maximum\:%8.2lf %s' \
LINE1:cdefbb#002A97FF:'Outbound ' \
AREA:cdefbc#4444FF19: \
COMMENT:'Total IN\: 0.00 B\n' \
LINE1:cdefbb#002A97FF: \
GPRINT:cdefbb:LAST:'Current\:%8.2lf %s' \
AREA:cdefbb#00BD2719: \
GPRINT:cdefbb:AVERAGE:'Average\:%8.2lf %s' \
LINE1:cdefbb#00BD27FF:'Outbound\:' \
GPRINT:cdefbb:MAX:'Maximum\:%8.2lf %s' \
GPRINT:b:LAST:'Current\:%8.2lf %s' \
VRULE:00#9FA4EEFF: \
GPRINT:b:AVERAGE:'Average\:%8.2lf %s' \
AREA:cdefcd#8F9286FF: \
GPRINT:b:MAX:'Maximum\:%8.2lf %s' \
COMMENT:'Total OUT\: 0.00 B'

RRDtool Says: ERROR: opening '/opt/cacti/rra/host_traffic_in_1692.rrd': No such file or directory
I suspect it may be load, or not enough poller resources; I'm trying to find the right settings.