avioconsulting / mule-opentelemetry-module

Mule Extension to generate OpenTelemetry traces and metrics
https://avioconsulting.github.io/mule-opentelemetry-module/
BSD 2-Clause "Simplified" License

Memory and CPU Issue #197

Open holiday-sunrise opened 3 weeks ago

holiday-sunrise commented 3 weeks ago

We have an API that handles 100 thousand requests per 30 minutes.

With the module, memory consumption is 3 times higher and the time per request slows down from 1-4 ms to 10-70 ms.

10 times slower :-(

What are we doing wrong?

  • Mule 4.6.6
  • Java 8
  • REST API as Anti-Corruption Layer for LDAP
  • openSUSE 15.6
  • 2 CPU max per replica
  • 2.6 GB memory max per replica

image

manikmagar commented 3 weeks ago

@holiday-sunrise Do you have usage of http:request in those flows? If yes, see if this helps - https://avioconsulting.github.io/mule-opentelemetry-module/#_static_vs_dynamic_global_configurations

holiday-sunrise commented 3 weeks ago

We are using the following config:

<opentelemetry:config name="OpenTelemetry_Config_Elk" serviceName="${api.name}" spanAllProcessors="true">
    <opentelemetry:resource-attributes>
        <opentelemetry:attribute key="mule.env" value="${mule.env}" />
        <opentelemetry:attribute key="deployment.environment" value="${mule.env}" />
    </opentelemetry:resource-attributes>
    <opentelemetry:exporter>
        <opentelemetry:otlp-exporter collectorEndpoint="${ELASTIC_APM_SERVER_URL}">
            <opentelemetry:headers>
                <opentelemetry:header key="Authorization" value="Bearer ${ELASTIC_APM_SECRET_TOKEN}" />
            </opentelemetry:headers>
        </opentelemetry:otlp-exporter>
    </opentelemetry:exporter>
</opentelemetry:config>
holiday-sunrise commented 3 weeks ago

@holiday-sunrise Do you have usage of http:request in those flows? If yes, see if this helps - https://avioconsulting.github.io/mule-opentelemetry-module/#_static_vs_dynamic_global_configurations

Is this decreasing the throughput?

manikmagar commented 3 weeks ago

In high load conditions, it can affect memory and CPU consumption. See the graphs in that performance test (just fixed the links on those images).

From the global configuration you shared, it looks like you are already using the static configuration, which is the better option and should not affect performance adversely.

holiday-sunrise commented 3 weeks ago

But the performance is still worse, 10 times slower. I think there is still a config problem. Can we reduce the spans? For example, for set-variable or for calls to a custom module?

Do you think that will improve the performance?

manikmagar commented 2 weeks ago

spanAllProcessors="true" is very verbose: it causes spans to be created for all common processors too, such as set-variable and others. Is there a specific reason for it? Reducing the number of spans created can reduce resource consumption. You can either set spanAllProcessors="false" and validate that you still get spans for the meaningful processors, or, if any spans are missing, keep it set to true and configure the "Disable Spans For" section with a list of processors to exclude from span creation (e.g. mule:set-variable).
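
The two options above might look roughly like the sketch below, based on the config posted earlier in this thread. Option A only flips the attribute. In option B, the element names ignore-mule-components and mule-component, the namespace and name values, and the position of that block inside the config are assumptions taken from the "Disable Spans For" part of the module documentation and are not confirmed in this thread, so please verify them before use. Namespace declarations are omitted, as in the other snippets here.

```xml
<!-- Use one of the two options, not both. -->

<!-- Option A: no spans for common processors. -->
<opentelemetry:config name="OpenTelemetry_Config_Elk" serviceName="${api.name}" spanAllProcessors="false">
    <opentelemetry:resource-attributes>
        <opentelemetry:attribute key="mule.env" value="${mule.env}" />
        <opentelemetry:attribute key="deployment.environment" value="${mule.env}" />
    </opentelemetry:resource-attributes>
    <opentelemetry:exporter>
        <opentelemetry:otlp-exporter collectorEndpoint="${ELASTIC_APM_SERVER_URL}">
            <opentelemetry:headers>
                <opentelemetry:header key="Authorization" value="Bearer ${ELASTIC_APM_SECRET_TOKEN}" />
            </opentelemetry:headers>
        </opentelemetry:otlp-exporter>
    </opentelemetry:exporter>
</opentelemetry:config>

<!-- Option B: keep spans for all processors, but exclude the noisy ones. -->
<opentelemetry:config name="OpenTelemetry_Config_Elk" serviceName="${api.name}" spanAllProcessors="true">
    <opentelemetry:resource-attributes>
        <opentelemetry:attribute key="mule.env" value="${mule.env}" />
        <opentelemetry:attribute key="deployment.environment" value="${mule.env}" />
    </opentelemetry:resource-attributes>
    <opentelemetry:exporter>
        <opentelemetry:otlp-exporter collectorEndpoint="${ELASTIC_APM_SERVER_URL}">
            <opentelemetry:headers>
                <opentelemetry:header key="Authorization" value="Bearer ${ELASTIC_APM_SECRET_TOKEN}" />
            </opentelemetry:headers>
        </opentelemetry:otlp-exporter>
    </opentelemetry:exporter>
    <!-- ASSUMPTION: element and attribute names below are not confirmed in this thread. -->
    <opentelemetry:ignore-mule-components>
        <opentelemetry:mule-component namespace="mule" name="set-variable" />
    </opentelemetry:ignore-mule-components>
</opentelemetry:config>
```

Starting with option A and switching to option B only if spans you need go missing keeps the number of generated spans as low as possible.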

manikmagar commented 2 weeks ago

Also, check the static vs. dynamic usage. You shared the OTEL config earlier, but not the HTTP request config.

@holiday-sunrise Do you have usage of http:request in those flows? If yes, see if this helps - https://avioconsulting.github.io/mule-opentelemetry-module/#_static_vs_dynamic_global_configurations
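
For readers unsure what to look for in the HTTP request config, here is a minimal sketch of the difference in general Mule terms. The config names, property keys, and variable names are hypothetical and not taken from the reporter's application; the linked documentation section describes how this affects the module.

```xml
<!-- Static: ${...} property placeholders are resolved once at deployment,
     so a single config instance is used for all requests. -->
<http:request-config name="Backend_Static">
    <http:request-connection host="${backend.host}" port="${backend.port}" />
</http:request-config>

<!-- Dynamic: #[...] expressions are evaluated per event, so Mule may create
     a separate config instance for each distinct resolved value. -->
<http:request-config name="Backend_Dynamic">
    <http:request-connection host="#[vars.backendHost]" port="#[vars.backendPort]" />
</http:request-config>
```

If the http:request operations in the affected flows reference a config of the second kind, that is the case the linked section is about.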

holiday-sunrise commented 2 weeks ago

We will check it

holiday-sunrise commented 2 weeks ago

Do you have usage of http:request in those flows? If yes, see if this helps -

Yes, we have, but not always; sometimes it is a custom connector (module).

holiday-sunrise commented 2 weeks ago

(eg. mule:set-variable).

Is there a list of all values like mule:set-variable?

Do you have an example?

holiday-sunrise commented 1 week ago

@manikmagar any news? Can you help us?

What is a best-practice config with limited resource consumption overhead?

manikmagar commented 1 week ago

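Hi @holiday-sunrise, we are also doing some tests. If you would like to run a similar test, I suggest trying this:

  • Use the latest 2.3.0-SNAPSHOT
  • Disable the interceptor feature by setting the property mule.otel.interceptor.processor.enable=false on the runtime
  • Without the interceptor, the following operation is needed for context propagation; add it at the beginning of the main flow (e.g. after the HTTP listener in the APIKit flow). [If you were using opentelemetry:get-trace-context for anything, replace it with this one.]
<opentelemetry:get-current-trace-context doc:name="Get Current Trace Context" config-ref="OpenTelemetry_Config" target="OTEL_TRACE_CONTEXT"/>
  • Verify the traces generated and the performance.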

To use the snapshot version, you would need to add the following repository to the pom file -

<repository>
    <id>oss.sonatype.org-snapshot</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
        <enabled>false</enabled>
    </releases>
    <snapshots>
        <enabled>true</enabled>
    </snapshots>
</repository>
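
In context, the pom.xml might look roughly like this sketch. The repository entry and the 2.3.0-SNAPSHOT version come from this thread; the groupId, artifactId, and classifier of the dependency are assumptions, so keep whatever coordinates your existing pom already uses for the module and change only the version.

```xml
<!-- pom.xml fragment (sketch, not a complete pom) -->
<repositories>
    <repository>
        <id>oss.sonatype.org-snapshot</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <!-- ASSUMPTION: coordinates and classifier are not stated in this thread. -->
        <groupId>com.avioconsulting.mule</groupId>
        <artifactId>mule-opentelemetry-module</artifactId>
        <version>2.3.0-SNAPSHOT</version>
        <classifier>mule-plugin</classifier>
    </dependency>
</dependencies>
```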

If you could please share your observations here, that will help us too. Thank you!

manikmagar commented 1 week ago

In addition to the above, also review whether you really want spanAllProcessors="true". It creates spans for common processors as well, such as those from common namespaces like core, ee, spring, etc. I suggest setting it to false and checking whether anything is missing from your traces. The more spans generated, the more resources consumed.

holiday-sunrise commented 1 week ago

Hi @holiday-sunrise, we are also doing some tests. If you would like to run a similar test, I suggest trying this:

  • Use the latest 2.3.0-SNAPSHOT
  • Disable the interceptor feature by setting the property mule.otel.interceptor.processor.enable=false on the runtime
  • Without the interceptor, the following operation is needed for context propagation; add it at the beginning of the main flow (e.g. after the HTTP listener in the APIKit flow). [If you were using opentelemetry:get-trace-context for anything, replace it with this one.]
<opentelemetry:get-current-trace-context doc:name="Get Current Trace Context" config-ref="OpenTelemetry_Config" target="OTEL_TRACE_CONTEXT"/>
  • Verify the traces generated and the performance.
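
As a sketch of where that operation would sit, assuming a typical APIKit main flow: the flow name, listener config, path, and router config below are hypothetical, and config-ref should point at your own OpenTelemetry config (OpenTelemetry_Config_Elk in the config posted earlier). Namespace declarations are omitted, as in the other snippets here.

```xml
<!-- The interceptor itself is disabled separately through the runtime property
     mule.otel.interceptor.processor.enable=false (not part of this XML). -->
<flow name="api-main">
    <!-- Hypothetical listener; use the one already in your main flow. -->
    <http:listener config-ref="HTTP_Listener_config" path="/api/*" />

    <!-- First processor after the listener: stores the current trace context
         in the OTEL_TRACE_CONTEXT variable for context propagation. -->
    <opentelemetry:get-current-trace-context doc:name="Get Current Trace Context" config-ref="OpenTelemetry_Config_Elk" target="OTEL_TRACE_CONTEXT" />

    <!-- Hypothetical router; the rest of the flow stays unchanged. -->
    <apikit:router config-ref="api-config" />
</flow>
```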

To use the snapshot version, you would need to add the following repository to the pom file -

<repository>
    <id>oss.sonatype.org-snapshot</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
        <enabled>false</enabled>
    </releases>
    <snapshots>
        <enabled>true</enabled>
    </snapshots>
</repository>

If you could please share your observations here, that will help us too. Thank you!

In which format do you need the observations?

manikmagar commented 1 week ago

At least a summary similar to what you shared in the original question would do. Basically, with those suggested changes, how does your app behave compared to the original? If you have CPU and memory graphs, that would be a bonus. Thank you!