apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0

[Significant BUG] Spring Annotation Plugin Causes Non-Stop Full GC #4630

Closed wangdafang closed 4 years ago

wangdafang commented 4 years ago

Environment: 19 agents, 1 skywalking-oap-service 6.6.0, 1 ES 6.6.2.

OAP JVM parameters: -Xmx12g -Xms12g -Xmn6g -XX:SurvivorRatio=8

Description: 19 agents run with the optional Spring annotation plugin enabled. Once skywalking-oap-service starts running, the following happens: JVM Eden usage keeps increasing, which in turn makes the JVM old generation grow. When the old generation reaches 100% usage, the process runs full GC continuously and provides no service to the agents or the UI. At this point the skywalking-oap-service process still exists, but it provides no service.

I have found that when the operation ID is null, SkyWalking tries to save the objects into the buffer file. But as the traffic keeps growing, the excess objects should be discarded.
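The idea of discarding excess objects can be illustrated with a bounded buffer. This is only a minimal sketch of the general technique; `BoundedBuffer`, `offer`, and `poll` are hypothetical names chosen for illustration and are not SkyWalking's buffer-file implementation.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO buffer that rejects new items once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0  # number of items discarded because the buffer was full

    def offer(self, item):
        """Try to enqueue an item; return False and count a drop when full."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(item)
        return True

    def poll(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        return self.items.popleft() if self.items else None


# Hypothetical usage: five items offered to a buffer that holds three.
buf = BoundedBuffer(capacity=3)
accepted = [buf.offer(f"/endpoint/{i}") for i in range(5)]
print(accepted, buf.dropped)
```

Dropping at the producer side keeps memory use bounded at the cost of losing data under overload, which is usually preferable to a process that stops responding.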

This bothers me a lot, and I need help.

wu-sheng commented 4 years ago

For the agent, the description is not clear. Please do some research into which objects cause the memory usage to increase. One guess: as you are using the annotation plugin, more classes will be generated. Please make sure you have enough perm size.

For the backend, the ID is generated in the register process. If that never succeeds, something else must be wrong.

wangdafang commented 4 years ago

[Supplement] Agent version is 6.6.2. When jvm-OLD is at 100%, executing 'jmap -histo pid' returns:

 num     #instances         #bytes  class name
   1:      48297953     5118173992  [C
   2:      38587050     2160874800  org.apache.skywalking.apm.network.register.v2.Endpoint
   3:      44771449     1074514776  java.lang.String
   4:       3246315      761023720  [B
   5:        972883      334963928  [Ljava.lang.Object;
   6:        622864      236176640  [I
   7:       5788779      231551160  sun.nio.cs.UTF_8$Decoder
   8:        942953      114192712  [Ljava.util.HashMap$Node;
   9:       1904315       91407120  org.apache.skywalking.apm.network.register.v2.EndpointMappingElement
  10:       2546854       81499328  java.util.HashMap$Node
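To see how much of the heap each class accounts for, the histogram can be summarized with a short script. This is a minimal sketch; `parse_histo` is a hypothetical helper, and only the top four rows of the output above are embedded as sample input.

```python
# Top rows copied from the 'jmap -histo' output above.
HISTO = """\
1: 48297953 5118173992 [C
2: 38587050 2160874800 org.apache.skywalking.apm.network.register.v2.Endpoint
3: 44771449 1074514776 java.lang.String
4: 3246315 761023720 [B
"""


def parse_histo(text):
    """Parse 'rank: instances bytes class' lines into (class, instances, bytes) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) != 4 or not parts[0].endswith(":"):
            continue  # skip headers, separators, and blank lines
        _rank, instances, size, cls = parts
        rows.append((cls, int(instances), int(size)))
    return rows


rows = parse_histo(HISTO)
for cls, instances, size in rows:
    # Size in GiB and average shallow size per instance.
    print(f"{cls:<70} {size / 2**30:6.2f} GiB {size / instances:8.1f} B/instance")
```

Note that `jmap -histo` reports shallow sizes, so the character arrays (`[C`) backing the strings are counted separately from the `Endpoint` and `java.lang.String` objects that reference them.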

Executing 'jstat -gcutil pid' returns:

  S0      S1      E       O       M      CCS    YGC   YGCT   FGC    FGCT      GCT
  0.00  100.00  100.00  100.00  97.17  93.43    14   5.029   37   128.189  133.219
  0.00   26.51  100.00  100.00  97.17  93.43    14   5.029   37   143.384  148.413
  0.00   34.17  100.00  100.00  97.17  93.43    14   5.029   37   143.384  148.413
  0.00   35.71  100.00  100.00  97.17  93.43    14   5.029   38   143.384  148.413
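The pattern in these samples (old generation pinned at 100%, young GC count frozen, full GC count and time still climbing) can be checked mechanically. The following is a rough heuristic sketch, not an official diagnostic; `parse_jstat`, `looks_like_full_gc_thrash`, and the 99% threshold are assumptions made for illustration.

```python
# Samples copied from the 'jstat -gcutil' output above.
JSTAT = """\
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 100.00 100.00 100.00 97.17 93.43 14 5.029 37 128.189 133.219
0.00 26.51 100.00 100.00 97.17 93.43 14 5.029 37 143.384 148.413
0.00 34.17 100.00 100.00 97.17 93.43 14 5.029 37 143.384 148.413
0.00 35.71 100.00 100.00 97.17 93.43 14 5.029 38 143.384 148.413
"""


def parse_jstat(text):
    """Turn jstat -gcutil output into a list of {column: float} dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, map(float, line.split()))) for line in lines[1:]]


def looks_like_full_gc_thrash(samples, old_threshold=99.0):
    """Heuristic: old gen stays full while full GC count or time keeps growing."""
    first, last = samples[0], samples[-1]
    old_stays_full = all(s["O"] >= old_threshold for s in samples)
    full_gc_growing = last["FGC"] > first["FGC"] or last["FGCT"] > first["FGCT"]
    return old_stays_full and full_gc_growing


samples = parse_jstat(JSTAT)
print(looks_like_full_gc_thrash(samples))
```

When full collections run back to back without lowering old-generation occupancy, the live set simply does not fit in the heap, so tuning GC flags alone is unlikely to help.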

Executing 'jps -lmvV pid' returns:

org.apache.skywalking.oap.server.starter.OAPServerStartUp -Xms12g -Xmn6g -Xmx12g -XX:SurvivorRatio=8 -XX:MetaspaceSize=1024M -XX:MaxMetaspaceSize=1024M -XX:+UseConcMarkSweepGC -XX:CMSMaxAbortablePrecleanTime=5000 -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ExplicitGCInvokesConcurrent -Xloggc:/logs/oap/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Doap.logDir=/sw/apache-skywalking-apm-bin/logs
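For reference, the heap layout implied by these flags can be worked out by hand. The sketch below assumes the usual HotSpot meaning of `-XX:SurvivorRatio` (the ratio of Eden to one survivor space) and ignores the alignment adjustments the JVM applies, so the real sizes may differ slightly; `heap_layout` is a hypothetical helper.

```python
def heap_layout(xmx_gib, xmn_gib, survivor_ratio, cms_fraction_pct):
    """Approximate generation sizes in GiB from the JVM sizing flags."""
    young = xmn_gib
    # Young gen = Eden + 2 survivor spaces, with Eden = survivor_ratio * survivor.
    survivor = young / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    old = xmx_gib - young
    return {
        "eden": eden,
        "survivor_each": survivor,
        "old": old,
        # Old-gen occupancy at which CMS starts a concurrent cycle.
        "cms_trigger": old * cms_fraction_pct / 100,
    }


# -Xmx12g -Xmn6g -XX:SurvivorRatio=8 -XX:CMSInitiatingOccupancyFraction=80
layout = heap_layout(xmx_gib=12, xmn_gib=6, survivor_ratio=8, cms_fraction_pct=80)
for name, gib in layout.items():
    print(f"{name:<14} {gib:.2f} GiB")
```

Under these assumptions only about half of the 12 GiB heap is available to the old generation, which is where long-lived cache entries end up.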

wu-sheng commented 4 years ago

OK, the endpoint cache issue is caused by your URIs including countless parameters, which could also explain the register performance issue.

What is included in your URIs?

wu-sheng commented 4 years ago

In the next release, 8.x, we fixed this performance issue by removing the operation ID register. But even with that fix, including parameters in the URI will make the endpoint metrics meaningless.
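One way to keep the number of distinct endpoint names bounded is to normalize URIs before they are used as endpoint names. The sketch below is an assumption about how this could be done on the application side, not a SkyWalking API; `normalize_endpoint` and the `{id}` placeholder are hypothetical, and it only handles query strings plus numeric or UUID path segments.

```python
import re
from urllib.parse import urlsplit

# Path segments treated as variable: pure digits or a canonical UUID.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def normalize_endpoint(uri):
    """Drop the query string and fragment, and replace ID-like segments."""
    path = urlsplit(uri).path
    segments = ["{id}" if _ID_SEGMENT.match(seg) else seg for seg in path.split("/")]
    return "/".join(segments) or "/"


uris = [
    "/orders/12345?token=abc",
    "/orders/67890?token=def",
    "http://host/users/42/profile?x=1#top",
]
for uri in uris:
    print(uri, "->", normalize_endpoint(uri))
```

With this kind of grouping, the two `/orders/...` requests map to a single endpoint, so their metrics are aggregated instead of creating one endpoint per request.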

wu-sheng commented 4 years ago

If you want to understand this better, you are welcome to join our bi-weekly meeting on Zoom.

wangdafang commented 4 years ago

How can I have a very short conversation with you to help me solve this problem in my production environment?

wu-sheng commented 4 years ago

We do not provide 1:1 talks. We have too many users and can't afford that. But we will hold a bi-weekly online meeting starting next Tuesday.

To join, subscribe to our mailing list: send a mail to dev-subscribe@skywalking.apache.org and follow the reply to complete the subscription. Then send a mail to dev@skywalking.apache.org to apply to attend the meeting. Once we receive it, we will reply with the meeting link and password. The time is 8:30 PM - 9:30 PM, Tuesday night, UTC+8, which is friendly to most people.