avioconsulting / mule-opentelemetry-module

Mule Extension to generate OpenTelemetry traces and metrics
https://avioconsulting.github.io/mule-opentelemetry-module/
BSD 2-Clause "Simplified" License
23 stars 8 forks source link

NPE - Error in handling processor end event #155

Closed MitchSpector closed 5 months ago

MitchSpector commented 6 months ago

I'm using 1.4.0, and receiving exception: ERROR 2024-03-06 09:49:02,551 [[MuleRuntime].uber.08: [***].validateMessageFlow.BLOCKING @1fedfa56] [processor: **/processors/4/route/18/processors/5/route/0/processors/0/route/0/processors/0; event: aa65bb41-dbc8-11ee-aaec-3c22fb8729d8] com.avioconsulting.mule.opentelemetry.internal.processor.MuleNotificationProcessor: Error in handling processor end event java.lang.NullPointerException: null at com.avioconsulting.mule.opentelemetry.internal.store.FlowSpan.endProcessorSpan(FlowSpan.java:65) ~[?:?] at com.avioconsulting.mule.opentelemetry.internal.store.InMemoryTransactionStore.endProcessorSpan(InMemoryTransactionStore.java:177) ~[?:?] at com.avioconsulting.mule.opentelemetry.internal.connection.OpenTelemetryConnection.endProcessorSpan(OpenTelemetryConnection.java:310) ~[?:?] at com.avioconsulting.mule.opentelemetry.internal.processor.MuleNotificationProcessor.handleProcessorEndEvent(MuleNotificationProcessor.java:192) ~[?:?] at com.avioconsulting.mule.opentelemetry.internal.notifications.listeners.MuleMessageProcessorNotificationListener.onNotification(MuleMessageProcessorNotificationListener.java:31) ~[?:?] at com.avioconsulting.mule.opentelemetry.internal.notifications.listeners.MuleMessageProcessorNotificationListener.onNotification(MuleMessageProcessorNotificationListener.java:9) ~[?:?] at org.mule.runtime.core.api.context.notification.ServerNotificationManager.lambda$fireNotification$0(ServerNotificationManager.java:198) ~[mule-core-4.4.0-20240122.jar:4.4.0-20240122]

I have also attempted use of 1.5.0 with the same results.

OT config is fairly vanilla:

<opentelemetry:config name="OpenTelemetry_Config" doc:name="OpenTelemetry Config" doc:id="57e9b8cd-d272-4bb4-88de-e7b9ab9f7e8d" serviceName="${mule.component}-${env}">

    <opentelemetry:resource-attributes>
        <opentelemetry:attribute key="cms.system"
            value="CMS" />
        <opentelemetry:attribute key="cms.component"
            value="${mule.component}" />
        <opentelemetry:attribute key="cms.env"
            value="${env}" />
        <opentelemetry:attribute key="cms.mule.port"
            value="${http.rest.port}" />
    </opentelemetry:resource-attributes>
    <opentelemetry:exporter >
        <opentelemetry:otlp-exporter collectorEndpoint="${otlp.collector.url}" >
            <opentelemetry:headers >
                <opentelemetry:header key="api-key" value="${NEW_RELIC_LICENSE_KEY}" />
            </opentelemetry:headers>
        </opentelemetry:otlp-exporter>
    </opentelemetry:exporter>
</opentelemetry:config>
manikmagar commented 5 months ago

Hi @MitchSpector, thank you for reporting this and providing those configuration details.

Is it possible to provide information about which processor is this - ******/processors/4/route/18/processors/5/route/0/processors/0/route/0/processors/0; and if there is an identical processor in that whole processor chain?

MitchSpector commented 5 months ago

Hi Manik

That processor is a dataweave message processor with a very simple payload json to pojo transformation/. Recognize that we were seeing the NPE across at least 4 flows.

As I was saying before I reordered the dependencies forcing this component to the end of the dependency chain, and the problem ceased to occur.

We are now facing another issue. I’ll explain it here, and if you thing this is something to dig into I will open an issue.

I have a cron scheduler flow that runs every 10 seconds. The scheduler is set up is with disallowConcurrentExecution=true. The flow calls a db stored proc. If the stored proc returns a resultSet, then it loops thru it doing work. When openTelemetry is disabled – everything works – and has for many years. With an enabled openTelemetry collector, I see it running maybe about 10-12 times, then no longer executes, and produces a INFO log message:

INFO 2024-03-14 08:10:02,580 [[MuleRuntime].uber.14: @.**@.:223> @30132dd8] [processor: ; event: ] org.mule.runtime.core.internal.source.scheduler.DefaultSchedulerMessageSource: Flow '**' is already running and 'disallowConcurrentExecution' is set to 'true'. Execution skipped.

log message right before this is:

DEBUG 2024-03-14 08:09:52,834 [[MuleRuntime].uber.12: [*].jpmcReceiptRouter.CPU_LITE @7721062a] [processor: **Router/processors/2; event: c35da840-e1fb-11ee-92fe-3c22fb8729d8] com.avioconsulting.mule.opentelemetry.internal.connection.OpenTelemetryConnection: Created trace context '{traceId=2f07ab120d80384bef2d4c95e56108b3, spanId=8d0fa5e5df10ed45, spanIdLong=10164525290707021125, traceparent=00-2f07ab120d80384bef2d4c95e56108b3-8d0fa5e5df10ed45-01, TRACE_TRANSACTION_ID=c35da840-e1fb-11ee-92fe-3c22fb8729d8, traceIdLongLowPart=17234515555698935987}' for TRACE_TRANSACTION_ID=c35da840-e1fb-11ee-92fe-3c22fb8729d8, Component Location 'DefaultComponentLocation{name='**', parts=[DefaultLocationPart{partPath='*', partIdentifier=DefaultTypedComponentIdentifier{identifier=flow, type=FLOW}, fileName='*.xml', lineInFile=195, startColumn=2}, DefaultLocationPart{partPath='processors', partIdentifier=null, fileName='null', lineInFile=-1, startColumn=-1}, DefaultLocationPart{partPath='2', partIdentifier=DefaultTypedComponentIdentifier{identifier=db:stored-procedure, type=OPERATION}, fileName='**.xml', lineInFile=203, startColumn=3}], location='*/processors/2'}'

I have this pattern in three pollers, once one of them begins to fail ALL of them cease operations with the same results.

I have also now tried 2.0.0-M3 with the same results.

Also note: I have masked (*****) elements of the log messages to protect security concerns.

v/R,

Mitch Spector, Technical Lead CGI Federal | CMS Project | USCIS OIT DHS Mobile: *..** *@*.**@*.***> [CGI_Logo2012_color][signature_3683647679]

From: Manik Magar @.> Date: Wednesday, March 13, 2024 at 11:34 PM To: avioconsulting/mule-opentelemetry-module @.> Cc: Spector, Jonathon M (Mitch) (CTR) @.>, Mention @.> Subject: Re: [avioconsulting/mule-opentelemetry-module] NPE - Error in handling processor end event (Issue #155) You don't often get email from @.*** Learn why this is importanthttps://aka.ms/LearnAboutSenderIdentification CAUTION: This email originated from outside of the Federal Government. DO NOT click links or open attachments unless you recognize and/or trust the sender. Contact the USCIS Security Operations Center with questions or click the “Report Phishing” button to report it as a phishing attempt.

Hi @MitchSpectorhttps://github.com/MitchSpector, thank you for reporting this and providing those configuration details.

Is it possible to provide information about which processor is this - **/processors/4/route/18/processors/5/route/0/processors/0/route/0/processors/0; and if there is an identical processor in that whole processor chain?

— Reply to this email directly, view it on GitHubhttps://github.com/avioconsulting/mule-opentelemetry-module/issues/155#issuecomment-1996326613, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BGXRL7PD6ZWFEHUTU3G4PQLYYELE3AVCNFSM6AAAAABEJHDEBKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJWGMZDMNRRGM. You are receiving this because you were mentioned.Message ID: @.***>

manikmagar commented 5 months ago

Hi @MitchSpector, Thank you for additional details.

Regarding the NPE in this issue - /processors/4/route/18/processors/5/route/0/processors/0/route/0/processors/0. All those processors before /0 are either route scopes or flow/sub-flows. Is any of that a for loop?

I think that the second issue is different than this NPE. Would you mind creating a new one for that? Also, if you could how you are looping over resultset there - foreach, foreach-parallel, other ways, and number of items so we know the kind of processors involved?

MitchSpector commented 5 months ago

Manik – your email host rejected the zip. How can I get this to you?

v/R,

Mitch Spector, Program Technical Lead – Solution Architect CGI Federal | RMSS – Records Management | USCIS OIT DHS Mobile: 202.227.0994 @.**@.>

From: Spector, Jonathon M (Mitch) (CTR) @.> Date: Friday, March 15, 2024 at 8:36 AM To: avioconsulting/mule-opentelemetry-module @.>, avioconsulting/mule-opentelemetry-module @.> Cc: Mention @.> Subject: Re: [avioconsulting/mule-opentelemetry-module] NPE - Error in handling processor end event (Issue #155) Hi Manik – Thanks for the quick response.

I want to make this a little easier for us to work together. For obvious reasons, I can’t actually share the application – so I created a basic analog without any exposure.

This basic process shows the NPE. The analog has a for each parallel and db component, however the NPE still occurs even if I take these message processors out of the project.

v/R,

Mitch Spector, Technical Lead CGI Federal | CMS Project | USCIS OIT DHS Mobile: 202.227.0994 @.**@.> [CGI_Logo2012_color][signature_42719351]

From: Manik Magar @.> Date: Thursday, March 14, 2024 at 8:40 AM To: avioconsulting/mule-opentelemetry-module @.> Cc: Spector, Jonathon M (Mitch) (CTR) @.>, Mention @.> Subject: Re: [avioconsulting/mule-opentelemetry-module] NPE - Error in handling processor end event (Issue #155) You don't often get email from @.*** Learn why this is importanthttps://aka.ms/LearnAboutSenderIdentification CAUTION: This email originated from outside of the Federal Government. DO NOT click links or open attachments unless you recognize and/or trust the sender. Contact the USCIS Security Operations Center with questions or click the “Report Phishing” button to report it as a phishing attempt.

Hi @MitchSpectorhttps://github.com/MitchSpector, Thank you for additional details.

Regarding the NPE in this issue - /processors/4/route/18/processors/5/route/0/processors/0/route/0/processors/0. All those processors before /0 are either route scopes or flow/sub-flows. Is any of that a for loop?

I think that the second issue is different than this NPE. Would you mind creating a new one for that? Also, if you could how you are looping over resultset there - foreach, foreach-parallel, other ways, and number of items so we know the kind of processors involved?

— Reply to this email directly, view it on GitHubhttps://github.com/avioconsulting/mule-opentelemetry-module/issues/155#issuecomment-1997366693, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BGXRL7OYFSTZ3E4WYYKRG5LYYGLE3AVCNFSM6AAAAABEJHDEBKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJXGM3DMNRZGM. You are receiving this because you were mentioned.Message ID: @.***>

MitchSpector commented 5 months ago

attempted to load example project here: File size too big: 25 MB are allowed, 26 MB were attempted to upload.

MitchSpector commented 5 months ago

pom.xml.zip

MitchSpector commented 5 months ago

Added the sample XML and the POM above for the example.

MitchSpector commented 5 months ago

going on vacation and will be back on 3/28