archiver-appliance / epicsarchiverap

This is an implementation of an archiver for EPICS control systems that aims to archive millions of PVs.

Broadcast storm during first DisconnectChecker run #133

Closed: nikitakuklev closed this issue 2 years ago

nikitakuklev commented 2 years ago

Our archiver instance (10k active/60k total PVs) started triggering broadcast storms after a server upgrade. This happens once, 20 minutes after startup. I traced it to all of the meta channels being connected at once on the first run of DisconnectChecker. Relevant log fragment:

2022-02-07 22:51:10,086 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EngineContext  - L5:P4:BPM.BK3 is connected. Seeing if we need to start up the meta channels for the fields.
2022-02-07 22:51:10,086 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.model.ArchiveChannel  - L5:P4:BPM.DESC connected is false
2022-02-07 22:51:10,086 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EngineContext  - S35I:DG1:trigInputAmpSetAO is connected. Seeing if we need to start up the meta channels for the fields.
2022-02-07 22:51:10,086 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.model.ArchiveChannel  - S35I:DG1:trigInputAmpSetAO.DRVH connected is false
2022-02-07 22:51:10,086 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EngineContext  - L5:P4:BPM.BK0 is connected. Seeing if we need to start up the meta channels for the fields.
2022-02-07 22:51:10,086 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.model.ArchiveChannel  - L5:P4:BPM.DESC connected is false
2022-02-07 22:51:10,090 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EngineContext  - Starting meta channels for PTB:PV1:BPM.NSAM
2022-02-07 22:51:10,090 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.model.ArchiveChannel  - Starting up monitors on the fields for pv PTB:PV1:BPM.NSAM
2022-02-07 22:51:10,092 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EPICS_V3_PV  - pv ofPTB:PV1:BPM.DESC connectting
2022-02-07 22:51:10,092 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.model.ArchiveChannel  - Done starting up monitors on the fields for pv PTB:PV1:BPM.NSAM
2022-02-07 22:51:10,093 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EngineContext  - Starting meta channels for LI:VD1:y:fit:cal:sigmaM
2022-02-07 22:51:10,093 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.model.ArchiveChannel  - Starting up monitors on the fields for pv LI:VD1:y:fit:cal:sigmaM
2022-02-07 22:51:10,093 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EPICS_V3_PV  - pv ofLI:VD1:y:fit:cal:sigmaM.HIHI connectting
2022-02-07 22:51:10,094 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EPICS_V3_PV  - pv ofLI:VD1:y:fit:cal:sigmaM.HIGH connectting
2022-02-07 22:51:10,094 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EPICS_V3_PV  - pv ofLI:VD1:y:fit:cal:sigmaM.LOW connectting
2022-02-07 22:51:10,094 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EPICS_V3_PV  - pv ofLI:VD1:y:fit:cal:sigmaM.LOLO connectting
2022-02-07 22:51:10,094 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EPICS_V3_PV  - pv ofLI:VD1:y:fit:cal:sigmaM.LOPR connectting
2022-02-07 22:51:10,094 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EPICS_V3_PV  - pv ofLI:VD1:y:fit:cal:sigmaM.HOPR connectting
2022-02-07 22:51:10,094 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.pv.EPICS_V3_PV  - pv ofLI:VD1:y:fit:cal:sigmaM.DESC connectting
2022-02-07 22:51:10,094 [Engine scheduler for misc tasks.] DEBUG org.epics.archiverappliance.engine.model.ArchiveChannel  - Done starting up monitors on the fields for pv LI:VD1:y:fit:cal:sigmaM
[... many thousands of connections]
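A rough estimate of the burst size, assuming all ~10k active PVs are connected when DisconnectChecker first runs. The per-PV field counts are read off the log fragment above, not taken from any documented limit: PTB:PV1:BPM.NSAM starts one meta channel (DESC), while LI:VD1:y:fit:cal:sigmaM starts seven (HIHI, HIGH, LOW, LOLO, LOPR, HOPR, DESC). Writing f_i for the number of meta channels started for PV i:

```latex
N_{\text{search}} = \sum_{i=1}^{N_{\text{PV}}} f_i,
\qquad 1 \le f_i \le 7,
\qquad N_{\text{PV}} \approx 10^4
\;\Rightarrow\;
10^4 \lesssim N_{\text{search}} \lesssim 7 \times 10^4
```

All of these searches are issued within a single scheduler pass, and Channel Access resends searches that go unanswered, so the number of packets on the wire can be higher than this count.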

I believe METACHANNELS_TO_START_AT_A_TIME can be used to throttle this process. Can it be made into a configurable setting, or could some other throttling mechanism be added? Another potential optimization would be to skip the broadcast searches and connect directly to the IOC IP address of the main channel.
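A minimal sketch of the kind of throttling being asked for. It assumes the engine can hand over the names of PVs that need their meta channels started. Instead of starting them all in one pass, it queues them and starts at most a fixed number per scheduler tick. The class MetaChannelThrottler, its methods, and the Consumer standing in for the engine's real start-up call are all hypothetical; this is not the archiver appliance's actual code.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical throttler: queues PVs whose meta channels (DESC, HIHI, LOPR, ...)
 * need starting and starts at most maxPerRun of them per scheduler tick.
 */
class MetaChannelThrottler {
    private final Queue<String> pending = new ArrayDeque<>();
    private final int maxPerRun;
    // Stand-in for whatever call actually starts the meta channels for one PV.
    private final Consumer<String> startMetaChannels;

    MetaChannelThrottler(int maxPerRun, Consumer<String> startMetaChannels) {
        if (maxPerRun <= 0) {
            throw new IllegalArgumentException("maxPerRun must be positive");
        }
        this.maxPerRun = maxPerRun;
        this.startMetaChannels = startMetaChannels;
    }

    /** Records that a connected PV still needs its meta channels started. */
    synchronized void enqueue(String pvName) {
        pending.add(pvName);
    }

    /** Starts at most maxPerRun queued PVs, oldest first; returns how many were started. */
    synchronized int drainOnce() {
        int started = 0;
        while (started < maxPerRun && !pending.isEmpty()) {
            startMetaChannels.accept(pending.poll());
            started++;
        }
        return started;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    /** Runs one bounded batch immediately, then another periodSeconds after each batch finishes. */
    ScheduledExecutorService schedule(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::drainOnce, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

With a limit of 100 PVs per tick and a one-second delay, 10,000 pending PVs drain in 100 batches, or roughly 100 seconds, instead of a single burst. The batch size trades how quickly metadata becomes available against peak search traffic, which is why making it configurable is attractive.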

slacmshankar commented 2 years ago

Good point. I will expose this in archappl.properties.
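One way such a setting could be read, sketched under assumptions. The reply confirms only that the value will go into archappl.properties. The property key and the default of 100 below are hypothetical placeholders, not the names or values the project actually chose. The helper falls back to the default when the key is missing, blank, non-numeric, or not positive.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for reading a meta-channel batch size from archappl.properties. */
class MetaChannelSettings {
    // Hypothetical key name; the real key (if any) is whatever the project defines.
    static final String KEY = "org.epics.archiverappliance.engine.metaChannelsToStartAtATime";
    // Hypothetical default used when the property is absent or invalid.
    static final int DEFAULT_BATCH = 100;

    static int batchSize(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.isBlank()) {
            return DEFAULT_BATCH;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_BATCH; // zero or negative would stall or be meaningless
        } catch (NumberFormatException e) {
            return DEFAULT_BATCH; // fall back instead of failing engine start-up
        }
    }

    /** Parses properties-file text, e.g. the contents of archappl.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

Falling back to a default keeps a typo in the properties file from blocking startup. A stricter design would fail fast and log the bad value instead.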