apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0

[Feature] Add settings for trace limiting or server side limits #11373

Closed · gyrter closed 1 year ago

gyrter commented 1 year ago


Description

Hello there. I'm trying your product with our Magento PHP project and got an error with the Elasticsearch storage. Our traces hit the Jackson and Armeria limits.

It would be a good idea to add a trace limiter or something like that.

Use case

First, I got this error:

com.linecorp.armeria.common.ContentTooLargeException: maxContentLength: 10485760, contentLength: 38388047, transferred: 10488168
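(For context: this is Armeria's cap on the size of a response body. The system property mentioned below changes the global default; the per-client equivalent is maxResponseLength on the client builder, where 0 disables the cap. A minimal sketch, assuming a plain Armeria WebClient rather than SkyWalking's internal Elasticsearch client.)

```java
import com.linecorp.armeria.client.WebClient;

public class ArmeriaResponseLimit {
    public static void main(String[] args) {
        // Per-client equivalent of -Dcom.linecorp.armeria.defaultMaxResponseLength=0:
        // a maxResponseLength of 0 removes the response-size cap for this client only.
        WebClient client = WebClient.builder("http://localhost:9200")
                .maxResponseLength(0)
                .build();

        System.out.println(client.options().maxResponseLength()); // prints 0
    }
}
```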

I added -Dcom.linecorp.armeria.defaultMaxResponseLength=0 to JAVA_OPTS, but then I got this one

2023-10-03 13:40:33,158 org.apache.skywalking.library.elasticsearch.client.SearchClient 63 [armeria-eventloop-epoll-8-4] ERROR [] - [9.6.0-fb61128] Failed to search, request org.apache.skywalking.library.elasticsearch.requests.search.Search@70d6a005, params org.apache.skywalking.library.elasticsearch.requests.search.SearchParams@22b7fdfe, index [sw_segment-20231003]
java.util.concurrent.CompletionException: com.fasterxml.jackson.databind.JsonMappingException: String length (20059814) exceeds the maximum length (20000000) (through reference chain: org.apache.skywalking.library.elasticsearch.response.search.SearchResponse["hits"]->org.apache.skywalking.library.elasticsearch.response.search.SearchHits["hits"]->java.util.ArrayList[0]->org.apache.skywalking.library.elasticsearch.response.search.SearchHit["_source"])

As I understand it, I need to change the com.fasterxml.jackson.databind.ObjectMapper configuration here.
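(For reference, raising that limit would look roughly like this. A sketch only, assuming Jackson 2.15+ and its StreamReadConstraints API; this is not the actual code path SkyWalking uses to build its ObjectMapper, and the 50,000,000 value is a hypothetical higher cap.)

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonStringLimit {
    public static void main(String[] args) {
        // Jackson 2.15+ caps a single String value at 20,000,000 chars by default,
        // which is the 20000000 limit shown in the stack trace above.
        JsonFactory factory = JsonFactory.builder()
                .streamReadConstraints(StreamReadConstraints.builder()
                        .maxStringLength(50_000_000) // hypothetical higher cap
                        .build())
                .build();
        ObjectMapper mapper = new ObjectMapper(factory);

        System.out.println(mapper.getFactory().streamReadConstraints().getMaxStringLength());
    }
}
```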

But I think this is not the right way. I think the PHP agent needs a trace limit or something like that.

Related issues

No response


wu-sheng commented 1 year ago

Could you explain first why the query result is so huge? What is actually in the trace?

gyrter commented 1 year ago

I will try to find the largest doc in the index, decode it, and send you an example trace.

Do you mean that the problematic trace came from a backend job, and was not an API or a normal web render trace?

wu-sheng commented 1 year ago

> Do you mean that the problematic trace came from a backend job, and was not an API or a normal web render trace?

This seems possible. But let's discuss based on the real data.

wu-sheng commented 1 year ago

@gyrter I deleted the message for you. Please don't post your env data in this public channel. It could leak more information than you expect, as traces usually include IP:port pairs and server relationships.

wu-sheng commented 1 year ago

Which part do you fail to read? I can guide you to the method, but you'd better decode it on your private env only.

gyrter commented 1 year ago

Elasticsearch returns a heavy document. Example query:

curl -X GET "localhost:9200/sw_segment-20230927,sw_segment-20230928,sw_segment-20230929,sw_segment-20230930,sw_segment-20231001,sw_segment-20231002,sw_segment-20231003,sw_segment-20231004/_search?ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true" -H 'Content-Type: application/json' -d'{"from":0,"size":20,"query":{"bool":{"must":[{"range":{"time_bucket":{"gte":20230927000000,"lte":20231004235959}}},{"term":{"service_id":"b2JpX2xvY2Fs.1"}}]}},"sort":[{"start_time":{"order":"desc"}}]}'

And I have problems with the response.

{
  "took": 141,
  "timed_out": false,
  "_shards": {
    "total": 15,
    "successful": 15,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "sw_segment-20231003",
        "_type": "_doc",
        "_id": "33659203614752764205590757610046136365",
        "_score": null,
        "_source": {
          "start_time": 1696338779995,
          "trace_id": "140293663296148403565889061609312210859",
          "data_binary": ....
....

The SkyWalking server shows me an error while accessing these documents. I inspected the data with jq:

zcat out.json.gz  | jq '.hits.hits[] | ._source.data_binary | length' 
21704012
21704012
21704016

As you can see, the data_binary field is too large. The Jackson library has a limit of 20000000 for a String, but I have 21704016.

gyrter commented 1 year ago

> Which part do you fail to read? I can guide you to the method, but you'd better decode it on your private env only.

How can I decode the data_binary field?

wu-sheng commented 1 year ago

You could use SegmentObject segmentObject = SegmentObject.parseFrom(segment.getDataBinary()); to decode it.

Reference:

https://github.com/apache/skywalking/blob/383dd1e9a19d431be18a1eb5ed34b60261b6f278/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/query/TraceQueryService.java#L127
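Roughly, end to end (a sketch only: it assumes the data_binary value copied out of the Elasticsearch _source is a Base64-encoded SegmentObject protobuf, and the small command-line wrapper below is mine, not SkyWalking code):

```java
import java.util.Base64;

import org.apache.skywalking.apm.network.language.agent.v3.SegmentObject;

public class DecodeSegment {
    public static void main(String[] args) throws Exception {
        // args[0]: the raw data_binary string taken from the ES document
        byte[] bytes = Base64.getDecoder().decode(args[0]);

        // Parse the protobuf payload into a SegmentObject and print a summary
        SegmentObject segment = SegmentObject.parseFrom(bytes);
        System.out.println("traceId=" + segment.getTraceId()
                + " spans=" + segment.getSpansCount());
    }
}
```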

gyrter commented 1 year ago

> @gyrter I deleted the message for you. Please don't post your env data in this public channel. It could leak more information than you expect, as traces usually include IP:port pairs and server relationships.

It was from a Docker Compose environment with ephemeral IPs and credentials.

gyrter commented 1 year ago

I think it was a strange fluctuation. Everything works fine on the latest master build.