ISISComputingGroup / IBEX

Top level repository for IBEX stories

TOSCA: trim database #3867

Closed Tom-Willemsen closed 5 years ago

Tom-Willemsen commented 5 years ago

The MySQL database is 8GB on Tosca:

Tom-Willemsen commented 5 years ago

Archiving rates:

IN:TOSCA:LKSH218_01:SENSOR2.VAL    count=30  rate=1.00 Hz
IN:TOSCA:LKSH218_04:TEMP1.VAL      count=30  rate=1.00 Hz
IN:TOSCA:LKSH218_01:TEMP1.VAL      count=30  rate=1.00 Hz
IN:TOSCA:LKSH218_01:SENSOR1.VAL    count=30  rate=1.00 Hz
IN:TOSCA:LKSH218_01:TEMP2.VAL      count=30  rate=1.00 Hz
IN:TOSCA:LKSH218_04:TEMP3.VAL      count=30  rate=1.00 Hz
IN:TOSCA:LKSH218_04:TEMP6.VAL      count=29  rate=0.97 Hz
IN:TOSCA:LKSH218_04:TEMP4.VAL      count=28  rate=0.93 Hz
IN:TOSCA:LKSH218_04:TEMP8.VAL      count=27  rate=0.90 Hz
IN:TOSCA:LKSH218_01:SENSOR3.VAL    count=27  rate=0.90 Hz
IN:TOSCA:LKSH218_02:TEMP4.VAL      count=27  rate=0.90 Hz
IN:TOSCA:LKSH218_01:TEMP5.VAL      count=26  rate=0.87 Hz
IN:TOSCA:LKSH218_02:TEMP3.VAL      count=26  rate=0.87 Hz
IN:TOSCA:LKSH218_03:TEMP3.VAL      count=26  rate=0.87 Hz
IN:TOSCA:LKSH218_04:TEMP2.VAL      count=26  rate=0.87 Hz
IN:TOSCA:LKSH218_04:TEMP7.VAL      count=25  rate=0.83 Hz
IN:TOSCA:LKSH218_03:SENSOR3.VAL    count=22  rate=0.73 Hz
IN:TOSCA:LKSH218_01:TEMP6.VAL      count=22  rate=0.73 Hz
IN:TOSCA:LKSH218_03:TEMP5.VAL      count=21  rate=0.70 Hz
IN:TOSCA:LKSH218_03:TEMP6.VAL      count=21  rate=0.70 Hz
IN:TOSCA:LKSH218_02:TEMP2.VAL      count=20  rate=0.67 Hz
IN:TOSCA:LKSH218_04:SENSOR4.VAL    count=20  rate=0.67 Hz
IN:TOSCA:LKSH218_03:TEMP8.VAL      count=18  rate=0.60 Hz
IN:TOSCA:LKSH218_02:TEMP1.VAL      count=17  rate=0.57 Hz
IN:TOSCA:LKSH218_03:TEMP1.VAL      count=17  rate=0.57 Hz
IN:TOSCA:LKSH218_03:TEMP4.VAL      count=16  rate=0.53 Hz
IN:TOSCA:LKSH218_04:SENSOR1.VAL    count=16  rate=0.53 Hz
IN:TOSCA:LKSH218_03:TEMP2.VAL      count=14  rate=0.47 Hz
IN:TOSCA:LKSH218_04:SENSOR8.VAL    count=10  rate=0.33 Hz
IN:TOSCA:LKSH218_04:SENSOR3.VAL    count=9   rate=0.30 Hz
IN:TOSCA:LKSH218_02:SENSOR1.VAL    count=9   rate=0.30 Hz
IN:TOSCA:EUROTHRM_01:A02:TEMP.VAL  count=8   rate=0.27 Hz
IN:TOSCA:LKSH218_03:SENSOR7.VAL    count=8   rate=0.27 Hz
IN:TOSCA:EUROTHRM_01:A02:RBV.VAL   count=8   rate=0.27 Hz
IN:TOSCA:LKSH218_03:TEMP7.VAL      count=8   rate=0.27 Hz
IN:TOSCA:LKSH218_04:SENSOR7.VAL    count=7   rate=0.23 Hz
IN:TOSCA:DAE:GOODUAH.VAL           count=6   rate=0.20 Hz
IN:TOSCA:DAE:GOODFRAMES.VAL        count=6   rate=0.20 Hz
IN:TOSCA:DAE:RAWFRAMES.VAL         count=6   rate=0.20 Hz
IN:TOSCA:DAE:BEAMCURRENT.VAL       count=4   rate=0.13 Hz
IN:TOSCA:DAE:RAWFRAMES_PD.VAL      count=4   rate=0.13 Hz
IN:TOSCA:LKSH218_02:SENSOR4.VAL    count=4   rate=0.13 Hz
IN:TOSCA:EUROTHRM_01:A01:TEMP.VAL  count=4   rate=0.13 Hz
IN:TOSCA:DAE:MONITORCOUNTS.VAL     count=4   rate=0.13 Hz
IN:TOSCA:DAE:GOODFRAMES_PD.VAL     count=4   rate=0.13 Hz
IN:TOSCA:EUROTHRM_01:A01:RBV.VAL   count=4   rate=0.13 Hz
IN:TOSCA:DAE:TOTALUAMPS.VAL        count=4   rate=0.13 Hz
IN:TOSCA:LKSH218_03:SENSOR1.VAL    count=4   rate=0.13 Hz
IN:TOSCA:LKSH218_01:SENSOR5.VAL    count=2   rate=0.07 Hz
IN:TOSCA:LKSH218_03:SENSOR4.VAL    count=2   rate=0.07 Hz
IN:TOSCA:CS:SB:Temp1               count=2   rate=0.07 Hz
IN:TOSCA:LKSH218_03:SENSOR2.VAL    count=2   rate=0.07 Hz
IN:TOSCA:DAE:GOODUAH_PD.VAL        count=1   rate=0.03 Hz
IN:TOSCA:DAE:TOTALCOUNTS.VAL       count=1   rate=0.03 Hz
IN:TOSCA:CS:SB:T6                  count=1   rate=0.03 Hz
IN:TOSCA:CS:SB:Temp2               count=1   rate=0.03 Hz
IN:TOSCA:DAE:COUNTRATE.VAL         count=1   rate=0.03 Hz
IN:TOSCA:DAE:RUNDURATION_PD.VAL    count=1   rate=0.03 Hz
IN:TOSCA:DAE:COUNTRATEFRAME.VAL    count=1   rate=0.03 Hz
IN:TOSCA:CS:SB:T1                  count=1   rate=0.03 Hz
IN:TOSCA:DAE:TOTALDAECOUNTS.VAL    count=1   rate=0.03 Hz
IN:TOSCA:CS:SB:T5                  count=1   rate=0.03 Hz
IN:TOSCA:DAE:RUNDURATION.VAL       count=1   rate=0.03 Hz

Nothing seems to be hugely abnormal here - temperatures logging at 1Hz.
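
For anyone wanting to total a listing like the one above, here is a minimal Python sketch that parses the `count=... rate=... Hz` lines and sums the rates overall and per device. The function names (`parse_rates`, `total_rate`, `rate_by_device`) are made up for illustration and are not part of IBEX, and taking the third colon-separated field of the PV name as the "device" is an assumption. The counts look consistent with a 30 second sampling window (30 counts giving 1.00 Hz), but that is inferred from the numbers rather than stated in the thread.

```python
import re

# Matches lines such as "IN:TOSCA:DAE:GOODUAH.VAL  count=6  rate=0.20 Hz".
LINE_RE = re.compile(
    r"^(?P<pv>\S+)\s+count=(?P<count>\d+)\s+rate=(?P<rate>[\d.]+)\s+Hz$"
)


def parse_rates(text):
    """Return a dict mapping PV name to archive rate in Hz."""
    rates = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rates[match.group("pv")] = float(match.group("rate"))
    return rates


def total_rate(rates):
    """Sum of all per-PV rates in Hz."""
    return sum(rates.values())


def rate_by_device(rates):
    """Group rates by the third PV field (assumed to be the device name)."""
    totals = {}
    for pv, rate in rates.items():
        parts = pv.split(":")
        device = parts[2] if len(parts) > 2 else pv
        totals[device] = totals.get(device, 0.0) + rate
    return totals


# Three lines copied from the listing above, used as sample input.
sample = """
IN:TOSCA:LKSH218_01:SENSOR2.VAL count=30 rate=1.00 Hz
IN:TOSCA:DAE:GOODUAH.VAL count=6 rate=0.20 Hz
IN:TOSCA:CS:SB:T6 count=1 rate=0.03 Hz
"""

rates = parse_rates(sample)
print(f"total: {total_rate(rates):.2f} Hz")
for device, rate in sorted(rate_by_device(rates).items()):
    print(f"{device}: {rate:.2f} Hz")
```

Run against the full listing, this gives a quick view of which devices dominate the archive rate.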

ChrisM-S commented 5 years ago

More of an aside, but historically we used a default 30s temperature logging interval, so 1Hz seems a bit excessive for many, if not most, of the physical systems we log. Cryogenics and water baths, for example, don't change fast enough for anything real to be lost at the slower rate. Heavily driven systems like furnaces, and some others, are excepted obviously. If we defaulted to 30s (about 0.03Hz) we would cut logging and traffic by a factor of 30 for no cost (30 times longer to fill the database to this level...).

KathrynBaker commented 5 years ago

Chris, are you talking about blocks or the device? Devices under LabVIEW are nearly always on a 100ms timeout, or a similar delay. Blocks are usually slower. The items listed are the PVs, not the blocks, so this is where we potentially want to keep track of any random spikes, and waiting 30 seconds to report an out-of-range error might be too long.

ChrisM-S commented 5 years ago

I'm just challenging the assumption that the science/control system needs to know anything at this rate for slow temperature control (motion and furnaces being a different case). There is no reason that the IOC can't raise an alarm 1s after a limit is reached, but even in this case it could probably wait if it was cryogenics, as the system could have been warming up for the half hour before this anyway. Polling for information which is only going to change very slowly is ultimately pointless!

KathrynBaker commented 5 years ago

The problem here is that these values should be logging on change (or so I believe) – so to limit the logging we have to limit the polling the way the system currently runs. So the IOC couldn’t raise an alarm any faster than it is logging the data. It may be that we can introduce tolerances to limit the logging, but I don’t want to have to keep track of when the Eurotherm IOC is being used for a furnace or not – whilst there are mechanisms to allow this, it provides complexity for the user, for limited gain on our side compared to managing the database differently.

FreddieAkeroyd commented 5 years ago

TOSCA has an overall archive rate of 27Hz according to nagios; other instruments only have a few Hz. It is a lot of small things adding up. Do we need to change archive deadbands?
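
To illustrate what a deadband does to the archive rate, here is a minimal Python sketch, assuming a simple absolute deadband: a sample is archived only when it differs from the last archived value by more than the threshold. This is similar in spirit to the archive deadband (ADEL) field on EPICS records, but it is not the archiver's actual code, and `apply_deadband` and the readings below are invented for illustration.

```python
def apply_deadband(samples, deadband):
    """Return the (timestamp, value) samples that would be archived.

    The first sample is always kept. A later sample is kept only when it
    differs from the last archived value by more than `deadband`.
    """
    archived = []
    last_value = None
    for timestamp, value in samples:
        if last_value is None or abs(value - last_value) > deadband:
            archived.append((timestamp, value))
            last_value = value
    return archived


# Hypothetical 1 Hz temperature readings: noise around 4.2 K, then a step.
readings = [(0, 4.20), (1, 4.21), (2, 4.19), (3, 4.20), (4, 4.35), (5, 4.36)]

kept = apply_deadband(readings, deadband=0.05)
print(f"archived {len(kept)} of {len(readings)} samples: {kept}")
```

With a 0.05 K deadband only the first reading and the step at t=4 are archived, while the polling rate (and so how quickly a change is noticed) stays at 1 Hz.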

https://varanus.nd.rl.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=epics_archive&style=detail

Tom-Willemsen commented 5 years ago

Keeping temperatures at this rate can be useful for diagnostics later (e.g. to spot an oscillating PID controller). Is it time to increase the limit of what we consider "too much" logging?
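
The diagnostics point can be made concrete with the sampling (Nyquist) limit: an oscillation is only recoverable from logged data if there are more than two samples per period. The Python sketch below is a back-of-envelope check, not anything from IBEX; the function names and the 20 second oscillation period are hypothetical.

```python
def shortest_resolvable_period(sample_interval_s):
    """Nyquist limit: periods at or below twice the sample interval alias."""
    return 2.0 * sample_interval_s


def is_resolvable(oscillation_period_s, sample_interval_s):
    """True if the oscillation period is longer than the Nyquist limit."""
    return oscillation_period_s > shortest_resolvable_period(sample_interval_s)


# Hypothetical PID loop oscillating with a 20 s period.
period_s = 20.0
for interval_s in (1.0, 30.0):
    verdict = "resolvable" if is_resolvable(period_s, interval_s) else "aliased"
    print(f"{interval_s:>4.0f} s logging: {verdict}")
```

So a 30s logging interval can only show oscillations with periods longer than a minute, whereas 1Hz logging resolves anything slower than a 2 second period.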

FreddieAkeroyd commented 5 years ago

I guess we really want to look at logging rate: 8GB after a whole cycle is not a problem, 8GB after the first day is!
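
As a rough way to turn a database size into a logging rate, here is a small Python sketch. The 40 day cycle length and the 50GB capacity are placeholder numbers chosen for illustration, not figures from this thread, and the function names are made up.

```python
def growth_rate_gb_per_day(size_gb, elapsed_days):
    """Average growth rate, assuming the database started empty."""
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    return size_gb / elapsed_days


def days_until_full(current_gb, rate_gb_per_day, capacity_gb):
    """Days left before reaching capacity at a constant growth rate."""
    remaining_gb = capacity_gb - current_gb
    if remaining_gb <= 0:
        return 0.0
    if rate_gb_per_day <= 0:
        return float("inf")
    return remaining_gb / rate_gb_per_day


CAPACITY_GB = 50.0  # hypothetical disk budget for the database

# The same 8GB reached after one day versus after a (hypothetical) 40 day cycle.
for elapsed_days in (1.0, 40.0):
    rate = growth_rate_gb_per_day(8.0, elapsed_days)
    left = days_until_full(8.0, rate, CAPACITY_GB)
    print(f"8GB in {elapsed_days:.0f} days: {rate:.2f} GB/day, {left:.1f} days left")
```

The same 8GB figure is either alarming or harmless depending on how long it took to accumulate, which is why the rate is the more useful number to monitor.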

ChrisM-S commented 5 years ago

You got there first, @FreddieAkeroyd!

Probably good to unwind a level here: TOSCA has only archived about 62MB of (text) log files in its scientific data in total (since its IBEX existence began!). Of significance are one of 25MB on 5th Nov and one of 8MB on the 11th, plus two others of around 1MB. So I guess the question should shift to: where did the 8.5GB database come from, given the logging recorded?

ChrisM-S commented 5 years ago

Rather than leave a loose end, I agree with @Tom-Willemsen but would phrase the question more as: how do we define "sufficient" logging? We should definitely do this completely and make storage available for it. How much we log beyond what is sufficient is a moot point.

John-Holt-Tessella commented 5 years ago

I think we either need to:

  1. Swap to the archive appliance and see if it is less memory hungry
  2. Start dumping the table when it gets to 1GB, exporting it to a central area and reconstructing the data

Tom-Willemsen commented 5 years ago

We decided not to do this ticket - Tosca will survive until the end of cycle as-is. The discussions above are better placed in a different ticket, so I will close this one.