apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0
23.72k stars 6.5k forks source link

skywalking agent 6.2.0 jvm memory footprint is huge and growing #3293

Closed david1228 closed 5 years ago

david1228 commented 5 years ago

skywalking agent 6.2.0 jvm memory footprint is huge and growing

Jmap monitor object occupancy:

num     #instances         #bytes  class name
----------------------------------------------
   1:       9050602     1643057880  [C
   2:       3818194      459214360  [B
   3:       8140462      195371088  java.lang.String
   4:       3000237      168013272  org.apache.skywalking.apm.network.register.v2.Endpoint
   5:       1320025      165971680  [I
   6:       4834363      154699616  java.util.concurrent.ConcurrentHashMap$Node
   7:       4658610      149075520  org.apache.skywalking.apm.agent.core.dictionary.EndpointNameDictionary$OperationNameKey
   8:       3000237       72005688  org.apache.skywalking.apm.dependencies.com.google.protobuf.ByteString$LiteralByteString
   9:       1647448       68221432  [Ljava.lang.Object;
  10:          4643       43759112  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  11:        934785       37391400  java.util.LinkedHashMap$Entry
  12:       1026014       32832448  java.util.LinkedHashMap$LinkedKeyIterator
  13:       1761775       28188400  java.lang.reflect.WeakCache$LookupValue
  14:        200596       18161200  [Ljava.util.HashMap$Node;

The class org.apache.skywalking.apm.agent.core.dictionary.EndpointNameDictionary takes up very large memory, is it a bug? skywalking agent version is 6.2.0 .

Eclipse mat tool analysis result: image

Look at the contents of ConcurrentHashMap and find that it may be caused by a placeholder api image

kezhenxu94 commented 5 years ago

@david1228 Can you provide more detail? Do you really have such many endpoints in the application, if not, what does the endpoint look like, are there variables in the endpoint name?

kezhenxu94 commented 5 years ago

@david1228 please add new comment instead of updating the original comment, otherwise we won't get notified. what's the framework are you using?

kezhenxu94 commented 5 years ago

if it's Spring MVC and the endpoint is /api/checkTicket/tk/{ticketNumber}, they should be merged, see #2676 , #3084

wu-sheng commented 5 years ago

Dictionary#ENDPOINT_NAME_BUFFER_SIZE default way is 1000 * 10000 for keeping things safe. If you have tons of endpoint, which I think it is impossible, you could make it smaller.

But at the same time, this number of endpoint could be an issue to backend too. Make too many unnecessary payloads to backend cache. We set 100000 as the max cache size, more than that, you will make the payload to ElasticSearch directly.

@david1228 Please try to find out why you have so many endpoints, this should not happen.

@kezhenxu94 I am going to open configurable to the OAP cache size, which is not related to this, I just don't want it to be static config.

david1228 commented 5 years ago

@david1228 please add new comment instead of updating the original comment, otherwise we won't get notified. what's the framework are you using?

Ok, Our own application uses springcloud framework, but /api/checkTicket/tk/{userToken} third part api called by spring web restTemplate, the {userToken} parameter is different for each user.

wu-sheng commented 5 years ago

Ok, Our own application uses springcloud framework, but /api/checkTicket/tk/{userToken} third part api called by spring web restTemplate, the {userToken} parameter is different for each user.

This would be an issue, if we add pattern match in the core, it gives extra CPU payloads. Maybe we should consider adding an optional API to the plugin? @kezhenxu94 @zhaoyuguang @IanCao @candyleer @JaredTan95 What do you think?

zhaoyuguang commented 5 years ago

Ok, Our own application uses springcloud framework, but /api/checkTicket/tk/{userToken} third part api called by spring web restTemplate, the {userToken} parameter is different for each user.

This would be an issue, if we add pattern match in the core, it gives extra CPU payloads. Maybe we should consider adding an optional API to the plugin? @kezhenxu94 @zhaoyuguang @IanCao @candyleer @JaredTan95 What do you think?

The matching rule needs to be compatible with this case: such as http server. When the request resource does not exist, like HTTPCOE 404, and the operationName needs to be classified as a kind.

wu-sheng commented 5 years ago

The matching rule needs to be compatible with this case: such as http server. When the request resource does not exist, like HTTPCOE 404, and the operationName needs to be classified as a kind.

This looks like operation override, this should have been there, right? We have span#setOperationName. Let me start a new issue to discuss.

wu-sheng commented 5 years ago

Provide a feature pull request at #3299. Then you could group the URLs with given rules.