Open John-Holt-Tessella opened 5 years ago
From: ICP Sent: 16 November 2018 00:20 To: Instrument Control Project List Subject: [ICP] What is going on with those odd periods in the data
Hi all,
This was a little long for a slack post, so I’ve opted for an email. There was a bit more of hunting through the data relating to the ‘sawtooth’ we see on archive data, and the answer is it looks like there is something either off in how the data is written, or in the query.
I ran a query along the lines of select from archive.sample where channel_id is the pv we are interested in, and sample_id > 875966, and I ended up with the same periodic artefact, whilst an earlier query that I had done was missing it, the difference there was that I was trying to be quick, and to get the data I was interested in I’d ordered by the sample number.
Plotting these (via SQL workbench to a csv file to LabVIEW) I could plot the following, the white is the unordered and the red is ordered:
The data in the simple select has these jumps in the data, so Freddie and I got to the educated guess of the log plotter not ordering, or using the timestamps when plotting the data.
The fix for this might be to look at why it is recording them in broken out of sequence blocks, or look at why the display doesn’t use the timestamps…
Regards, Kathryn
OK so the query you used looked something like:
SELECT c.name, severity_id, status_id, num_val, float_val, str_val, array_val, smpl_time, s.nanosecs
FROM archive.sample s
JOIN archive.channel c ON c.channel_id = s.channel_id
WHERE s.channel_id in (
SELECT channel_id
FROM archive.channel c
WHERE name in ("IN:EMMA-A:MOT:MTR0104.RBV"))
AND s.smpl_time > "2018-11-15T16:01:28"
AND s.smpl_time <="2018-11-15T16:03:00"
ORDER BY s.smpl_time, s.nanosecs
It turns out the the script above that produces smooth data does the query
SELECT c.name, severity_id, status_id, num_val, float_val, str_val, array_val, smpl_time, s.nanosecs
FROM archive.sample s
JOIN archive.channel c ON c.channel_id = s.channel_id
WHERE s.channel_id in (
SELECT channel_id
FROM archive.channel c
WHERE name in ("IN:EMMA-A:MOT:MTR0104.RBV"))
AND s.smpl_time > "2018-11-15T16:01:28"
AND s.smpl_time <="2018-11-15T16:03:00"
ORDER BY s.smpl_time
So then we can probably conclude that it is the nanosecs column which is not getting written to correctly in the database as the data is being archived.
If you look at sample id vs sample time and nanosecs you see that it looks like sample time rounded to the nearest second which means that the nanosecs for any particular time goes from 5e9 -> 10e10 -> 0 > 5e9.
The right fix might be:
ALTER TABLE sample MODIFY COLUMN smpl_time timestamp(2);
But this could double count the nano-seconds.
So we may need this instead (https://dev.mysql.com/doc/refman/8.0/en/fractional-seconds.html):
SET @@sql_mode = sys.list_add(@@sql_mode, 'TIME_TRUNCATE_FRACTIONAL');
The best fix is probably to truncate it as we put the data into the archive.
Prior to MySQL 5.6 fractional seconds were not supported in DB time fields and the fractional part was truncated on insert, so all would have worked fine then. Since 5.6 MySQL has supported up to 6 digits in fractional seconds (but defaults to 0 for backward compatibility), but now rounds rather than truncates (not for backward compatibility!) which causes shifts in the data as we have seen. 6 digits doesn't cover nanosecond resolution, so the nanosecond column will still be needed. As mentioned above, fixing the archive engine to truncate is the correct solution - sql_mode can bet set on a per session basis with SET SESSION sql_mode = 'modes'; but probably better to change the INSERT statement in the archive engine which would cover all databases.
Just for completeness, my queries were incredibly simple:
The query for the data with the periodicity was actually as simple as: SELECT float_val FROM archive.sample where channel_id = 1258 and sample_id > 875966
The smooth data I originally got was limited to 10000 rows (which is what got me the sample_id above) and was as follows: SELECT float_val FROM archive.sample where channel_id = 1258 order by sample_id desc
Data from the archive plotted in log plotter is not smooth for motors (we have not seen it with any other data). The data has steps in which are not in the database.
data plotted after getting it from the archive is:
%PYTHON% log_file_generator.py --start_time="2018-11-15T16:01:20" --delta_time=0.01 --point_count=10000 --host NDXEMMA-A "MOT1" "IN:EMMA-A:MOT:MTR0104.RBV"