fermi-ad / controls

Central repo for reporting bugs, making feature requests, managing RFCs, and requesting seminar topics.
https://www-bd.fnal.gov/controls/

Front end clock drift #20

Open awattsFNAL opened 9 months ago

awattsFNAL commented 9 months ago

https://www-bd.fnal.gov/Elog/?orEntryId=248251

awattsFNAL commented 9 months ago

Slack discussion

beauremus commented 9 months ago

I want to reiterate explicitly here that Ops, in the Slack thread, suggested that data logging the J: values would be useful in debugging. This is something that Controls should look into since it will likely require instantiating a new logger. Let me know if I should turn this into a feature request.

beauremus commented 9 months ago

Doing some searches across front-end config files, I find that these are the clock modules for the FEs mentioned in Slack.

beauremus commented 9 months ago

For posterity, here's the command I ran to parse the FE config files:

```shell
grep -lr "sld\|ucd" /fecode-bd/vxworks_boot/fe | grep -E "\.(startup|cmd|vx|login)$" | xargs grep -H "sld\|ucd" > ~/fe_clk_module.txt
```

From @rneswold, here are the available types:

- SLD -> "sld-*.out"
- IP-UCD -> "libiptrig-*.out"
- PMCUCD -> "libpmctrig-*.out"
- VUCD -> "libvucdtrig-*.o*"
- Multicast TCLK -> "libmctrig-*.o*"

Unfortunately, not all FE maintainers follow this convention, so doing a similar search in the relevant directory in /fecode-bd/vxworks_boot/fe can get you what you need.
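As an illustration of that search, here's a self-contained sketch that runs the same pipeline against a scratch directory. The real config files live under /fecode-bd/vxworks_boot/fe, which is only reachable on-site, so the directory names and file contents below are invented for the example:

```shell
# Build a scratch tree that mimics the FE boot layout (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/mcr01" "$tmp/cmtil1"
echo 'ld < sld-2.1.out'       > "$tmp/mcr01/mcr01.startup"
echo 'ld < libvucdtrig-1.0.o' > "$tmp/cmtil1/cmtil1.startup"

# Same pipeline as above, pointed at the scratch tree:
# files mentioning sld/ucd -> keep only startup-style files -> show matches.
grep -lr "sld\|ucd" "$tmp" | grep -E "\.(startup|cmd|vx|login)$" \
  | xargs grep -H "sld\|ucd"

rm -rf "$tmp"
```

Each output line pairs a config file with the clock-decoder library it loads, which is enough to classify the FE against the table above.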

Thanks a ton to @rneswold for getting me 98% of the way there. 🦾

awattsFNAL commented 8 months ago

From @kengell:

I’m working on a clock drift problem where the VME-based front-ends (e.g. MCR01) exhibit a once-per-hour clock reset of about 30 milliseconds (see DATALOGGER plot). I received a call from the MCR, and they reported a ‘Network Clock Storm’. By that, the MCR means that a whole lot of J: devices (e.g. J:MCR01, J:CLX44E) alert them to slow FE response times. My understanding is that the MONITR FE (Java-based) pings the VME and ACSys FEs periodically and reports latency in the J: devices. What I find odd is that the VME-based FEs appear to have a 1-hour clock drift/reset (see plot). I do not observe that behavior on the ACSys FEs (J:CLX44E). Does anyone know why the VME-based FEs exhibit this one-hour rise/fall of ping latency? Thanks.

https://files.slack.com/files-pri/THF7S17RV-F063TUA6YQ2/screenshot_2023-11-01_at_11.01.41_am.png

awattsFNAL commented 8 months ago

Rich: It’s hard to generalize. The ACSys front-ends are on Linux, which uses NTP to keep the clocks in sync. NTP is only supposed to speed up or slow down the clock interrupt, so time is always increasing and eventually stays in sync with the time server. In VxWorks, we use the $8F (GPS) event to sync our system’s one-second boundary. This means time can briefly go backwards if the clock was ahead. Each FE syncs its time differently, based on which TCLK decoder library it’s using. In addition, VxWorks startup scripts may start a periodic background task that syncs with another system. The 6.x kernels have a simple NTP client function that simply sets the system time, so it can jump forwards or backwards. Your plot looks as though the system time drifts and you have a once-an-hour task that resets it.
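To visualize the step-style sync described above, here's a toy sketch (all numbers assumed for illustration, not measured) of a clock that accumulates ~30 ms of drift over the hour and is stepped back to zero by an hourly sync task. This produces the sawtooth shape seen in the DATALOGGER plot, whereas an NTP-slewed Linux clock would hold near zero:

```shell
# Hypothetical model: 30 ms of drift accumulated per hour, reset on the hour.
awk 'BEGIN {
  drift_per_min = 30.0 / 60;            # assumed drift rate, ms per minute
  for (m = 0; m < 180; m++) {
    offset = (m % 60) * drift_per_min;  # steps back to 0 each hour (no slew)
    if (m % 30 == 0)
      printf "t=%3d min  offset=%5.1f ms\n", m, offset;
  }
}'
```

The offset climbs to ~30 ms and snaps back each hour, which is consistent with a periodic task that sets (rather than slews) the system time.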

Dennis Nicklaus: It isn't VME in general; it is more about MCR01 in particular. Here's a similar plot with a different VME front end, CMTIL1, with a much more stable clock. I don't know off the top of my head if it is a 5.4 thing, a 6040/162 thing, or just how MCR01 is configured. https://files.slack.com/files-pri/THF7S17RV-F063NF85UDC/screenshot_2023-11-01_at_2.31.58_pm.png