archiver-appliance / epicsarchiverap

This is an implementation of an archiver for EPICS control systems that aims to archive millions of PVs.

Sometimes epicsAA crashes #137

Open RuizheIsaf opened 2 years ago

RuizheIsaf commented 2 years ago

@slacmshankar Hello everyone, I have run into a problem. I set up AA about 8 months ago and it has worked well the whole time. Recently I found that the logs had grown very large (about 200 GB), so I added a cron job, shown in the screenshot, to delete the logs every week. Since then my AA often crashes. When it crashes I cannot perform any operation; only the web page remains. What's more, all the data from that period is lost. Some details of my AA server are shown in the screenshots. Thanks in advance for your help!

[4 screenshots attached]

aawdls commented 2 years ago

I think the problem may be because your cron job is deleting all log files, including the one which is currently being written. I can imagine that deleting the file which is currently being written will cause problems for the process.
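As a rough sketch of a gentler cleanup, assuming the logs live in a single directory and that a one-week retention is acceptable, a cron script could remove only files that have not been modified recently, which leaves the file currently being written alone. The function name `prune_old_logs`, the `*.log*` name pattern, and the 7-day default are illustrative assumptions, not part of the archiver appliance.

```shell
#!/bin/sh
# Hypothetical helper: delete only log files that have not been modified
# for more than a given number of days. The name pattern and the default
# retention are assumptions; adjust them to your installation.
prune_old_logs() {
    log_dir="$1"
    max_age_days="${2:-7}"
    # -mtime +N matches files last modified more than N full days ago,
    # so a log that is still being written (recent mtime) is skipped.
    find "$log_dir" -type f -name '*.log*' -mtime +"$max_age_days" -exec rm -f {} +
}

# Example call (the path is a placeholder):
# prune_old_logs /path/to/tomcat/logs 7
```

Running this from cron once a week would keep recent logs available for debugging while still bounding disk usage.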

The archiver appliance uses log4j for logging. You can provide your own log4j.properties file and configure a RollingFileAppender or DailyRollingFileAppender.

We have the following log4j.properties. It can probably be improved but it works okay. I've removed some site-specific details of additional loggers.

log4j.rootLogger=INFO, FILE, STDOUT

# log to STDOUT (standard tomcat log)
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
log4j.appender.STDOUT.Threshold=ERROR
log4j.appender.STDOUT.layout=org.apache.log4j.PatternLayout
log4j.appender.STDOUT.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

# write a log file, as well
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=${catalina.home}/logs/THISAPP.log
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

RuizheIsaf commented 2 years ago

@aawdls Thanks for your reply. Unfortunately, AA crashed again today at 8:00 am, so I think the logs may not be the problem. I tried to find the cause by checking the logs of the retrieval webapp, but there are no ERROR entries or other useful messages. In the past, restarting the service with "systemctl restart epicsarchiverap" solved the issue, but recently it crashes almost every day. I think the disk and network are normal. Do you know how I can find the cause? Thanks very much in advance.

[Screenshot, 2022-07-14 10:18 AM]