dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

Alarm email service #2578

Closed: Xinliliu closed this 8 years ago.

Xinliliu commented 8 years ago

On dCache 2.13.25, SL6.7, Oracle Java 1.8.0_25-b17.

Without the email function enabled, alarms work fine, but they fail once the email service is enabled with these settings:

alarms.enable.email=true
alarms.email.threshold=critical
alarms.email.smtp-host=localhost
alarms.email.to=test@localhost
alarms.email.from=test@localhost

I restarted the alarm domain and tried:

dcache alarm send "this is an alarm mail test"

which printed:

Sent alarm to localhost:9867.

No email was sent and no alarm was logged in logentry; alarm.log shows this error:

2016-06-21 14:54:53 Launching /usr/bin/java -server -Xmx1024m -XX:MaxDirectMemorySize=512m -Dsun.net.inetaddr.ttl=1800 -Dorg.globus.tcp.port.range=20000,25000 -Dorg.dcache.dcap.port=0 -Dorg.dcache.net.tcp.portrange=33115:33145 -Dorg.globus.jglobus.delegation.cache.lifetime=30000 -Dorg.globus.jglobus.crl.cache.lifetime=60000 -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/etc/dcache/jgss.conf -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dcache/alarm-oom.hprof -XX:+UseCompressedOops -javaagent:/usr/share/dcache/classes/aspectjweaver-1.8.7.jar -Djava.awt.headless=true -DwantLog4jSetup=n -Djdk.tls.ephemeralDHKeySize=2048 -Ddcache.home=/usr/share/dcache -Ddcache.paths.defaults=/usr/share/dcache/defaults org.dcache.boot.BootLoader start alarm
21 Jun 2016 14:55:05 (alarms) [] Uncaught exception in thread pool-2-thread-1
java.lang.NullPointerException: null
    at org.dcache.alarms.logback.LogEntryHandler.setType(LogEntryHandler.java:384) ~[dcache-core-2.13.25.jar:2.13.25]
    at org.dcache.alarms.logback.LogEntryHandler.access$400(LogEntryHandler.java:101) ~[dcache-core-2.13.25.jar:2.13.25]
    at org.dcache.alarms.logback.LogEntryHandler$LogEntryTask.run(LogEntryHandler.java:140) ~[dcache-core-2.13.25.jar:2.13.25]
    at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:241) ~[dcache-core-2.13.25.jar:2.13.25]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_25]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_25]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]

Simon

alrossi commented 8 years ago

Thanks for reporting this, Simon. As I said, I can reproduce this on our systems here. The code has not changed between 2.12 and 2.13, and we have used 2.12 to send alarm emails, so this needs investigation. I'm on it.

alrossi commented 8 years ago

Somehow 2.13 does not handle an alarm sent without an explicit type.

This seems to work:

dcache alarm send -t=FATAL_JVM_ERROR "Test of the email system 1"

alrossi commented 8 years ago

Looks like an issue with the MDC properties not getting set on the event.
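The failure mode described in this thread (a type expected among the event's MDC properties but missing when the alarm is sent without `-t`) is the kind of case a null-safe lookup guards against. The following is a minimal, hypothetical sketch that models the MDC properties as a plain `Map<String, String>`; the class name `AlarmTypeResolver`, the key `alarm.type`, and the fallback value `GENERIC` are illustrative assumptions, not dCache's actual code or the fix in the linked review.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: AlarmTypeResolver, ALARM_TYPE_KEY and DEFAULT_TYPE are
// illustrative names and are not taken from the dCache code base.
class AlarmTypeResolver {
    // Assumed key under which the alarm type would be carried in the
    // event's MDC-style property map.
    static final String ALARM_TYPE_KEY = "alarm.type";

    // Assumed fallback used when no type was supplied (e.g. no -t option).
    static final String DEFAULT_TYPE = "GENERIC";

    // Returns the alarm type from the property map, or DEFAULT_TYPE when the
    // map itself or the entry is missing, instead of dereferencing null.
    static String resolveType(Map<String, String> mdc) {
        if (mdc == null) {
            return DEFAULT_TYPE;
        }
        String type = mdc.get(ALARM_TYPE_KEY);
        return (type == null || type.isEmpty()) ? DEFAULT_TYPE : type;
    }

    public static void main(String[] args) {
        Map<String, String> withType = new HashMap<>();
        withType.put(ALARM_TYPE_KEY, "FATAL_JVM_ERROR");
        System.out.println(resolveType(withType));        // FATAL_JVM_ERROR
        System.out.println(resolveType(new HashMap<>())); // GENERIC
        System.out.println(resolveType(null));            // GENERIC
    }
}
```

Falling back to a default keeps the handler thread alive; whether a default type or an explicit rejection is the right behavior is a design decision the sketch does not settle.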

Did we change logback versions between 2.12 and 2.13?

alrossi commented 8 years ago

https://rb.dcache.org/r/9453/

alrossi commented 5 years ago

Andreas,

I think your problem is independent of the MDC issue. It has to do with the received field being null: Java auto-unboxing throws an NPE when you do something like this:

Integer i = null;

if (i == 0) { ... } // NPE: i is unboxed via i.intValue()

That seems to be what is happening here. However, I am not sure how the field ends up null.
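A small, self-contained illustration of the pattern Al describes: comparing a null `Integer` with an `int` literal forces unboxing and throws, while checking for null first avoids it. The names `UnboxingDemo`, `isZeroUnsafe`, and `isZeroSafe` are hypothetical; the `received` parameter only stands in for the field mentioned above.

```java
// Minimal illustration of the auto-unboxing NPE and a null-safe alternative.
// Class and method names are illustrative only.
class UnboxingDemo {

    // Mirrors `if (i == 0)`: comparing an Integer with an int literal
    // unboxes the Integer via intValue(), which throws if it is null.
    static boolean isZeroUnsafe(Integer received) {
        return received == 0;
    }

    // Null-safe version: the null check short-circuits before unboxing.
    static boolean isZeroSafe(Integer received) {
        return received != null && received == 0;
    }

    public static void main(String[] args) {
        Integer received = null;
        System.out.println(isZeroSafe(received)); // false
        try {
            isZeroUnsafe(received);
        } catch (NullPointerException e) {
            System.out.println("NPE from auto-unboxing a null Integer");
        }
    }
}
```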

Which database are you using: RDBMS or XML?

Thanks, Al