edgexfoundry / app-functions-sdk-go

Owner: Applications WG
Apache License 2.0

[Performance] App service out of memory when the function process is slower than receiving events #1516

Closed cloudxxx8 closed 9 months ago

cloudxxx8 commented 10 months ago

🐞 Bug Report

Affected Services [REQUIRED]

The issue is located in: App Service

### Is this a regression?

No

### Description and Minimal Reproduction [**REQUIRED**]

Currently, App Service requires data export to be faster than data receiving. If the user sets up many devices with many autoevents, for example 100 events per second sent to the EdgeX message bus, and uses HTTP export to an external web server in App Service, the events will queue up in the service because of the performance limitation of HTTP export.

Even with MQTT export, if the network is disconnected for a couple of hours, the messages (goroutines) will accumulate very quickly, because the service keeps reconnecting and the reconnect function contains a lock:

https://github.com/edgexfoundry/app-functions-sdk-go/blob/0026a7d191b2c7a8bb217d1e10d99788b36c1b78/pkg/transforms/mqttsecret.go#L222-L223

https://github.com/edgexfoundry/app-functions-sdk-go/blob/0026a7d191b2c7a8bb217d1e10d99788b36c1b78/pkg/transforms/mqttsecret.go#L156-L174

If the `Store and Forward` feature is enabled, the situation is worse. There will be more goroutines caused by resends, and the events are stored in the in-memory database (Redis).

We need better event handling in the function pipeline.

## 🔥 Exception or Error

The memory usage keeps growing until the service runs out of memory or the system crashes.

## 🌍 Your Environment

**Deployment Environment:** docker

**EdgeX Version [**REQUIRED**]:** v3.1

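The accumulation described in this report can be reproduced in isolation. The following is a minimal sketch, not the SDK's code: it assumes one goroutine per incoming event and uses a plain `sync.Mutex` as a stand-in for the lock taken in the reconnect path. The name `pileUp` is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// pileUp simulates n incoming events while a (hypothetical) reconnect
// attempt holds a lock. It returns how many goroutines are alive while
// the lock is still held.
func pileUp(n int) int {
	var mu sync.Mutex // stand-in for the lock in the reconnect path
	var started, done sync.WaitGroup

	mu.Lock() // broker unreachable: the reconnect attempt holds the lock
	for i := 0; i < n; i++ {
		started.Add(1)
		done.Add(1)
		go func() {
			defer done.Done()
			started.Done()
			mu.Lock() // each event's goroutine waits here
			mu.Unlock()
		}()
	}

	started.Wait()
	// No event goroutine can finish before the lock is released,
	// so all n of them are still alive at this point.
	alive := runtime.NumGoroutine()

	mu.Unlock() // connection restored: the backlog drains
	done.Wait()
	return alive
}

func main() {
	fmt.Println("goroutines alive while blocked:", pileUp(1000))
}
```

Each blocked goroutine keeps its stack and its event payload in memory, so under a sustained event rate the count, and with it memory usage, grows for as long as the lock is held.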
lenny-goodell commented 9 months ago

@cloudxxx8, can you join the next App Services WG meeting on Dec 11th at 3:30 AZT (2:30 PST) to discuss this issue? I think that is 6:30am for you.

cloudxxx8 commented 9 months ago

@lenny-intel sorry, I can't join the meeting. I normally work late, so it's very hard to wake up so early with a fresh brain. May we discuss it in the Core/QA meeting?

lenny-goodell commented 9 months ago

> @lenny-intel sorry, I can't join the meeting. I normally work late, so it's very hard to wake up so early with a fresh brain. May we discuss it in the Core/QA meeting?

Yes, that will work

lenny-goodell commented 9 months ago

After discussion in Core WG, we decided to address this in two ways.

  1. Move the initial explicit broker connection to when the function is instantiated, rely on the MQTT client's reconnect capability, and fail the export if `sender.client.IsConnected()` is false. This has the side effect that the service fails to start up if the initial connection still fails after some number of retries.
  2. Enhance Store and Forward to trigger retries after successful exports. See #1526
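
The two decisions above can be sketched roughly as follows. This is an illustration under stated assumptions, not the change that was merged: only `IsConnected()` comes from the discussion, while the `mqttClient` interface, its simplified `Publish` signature, the `retryTrigger` channel, and `fakeClient` are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// mqttClient is a hypothetical, minimal view of an MQTT client.
// Publish is simplified here and does not match a real client's signature.
type mqttClient interface {
	IsConnected() bool
	Publish(topic string, payload []byte) error
}

var errNotConnected = errors.New("mqtt client not connected, export failed")

// sender fails fast instead of taking a lock and reconnecting per event.
type sender struct {
	client       mqttClient
	topic        string
	retryTrigger chan struct{} // hypothetical: tells Store and Forward to retry now
}

func (s *sender) export(payload []byte) error {
	// Decision 1: rely on the client's own reconnect logic and
	// fail the export immediately while it is disconnected.
	if !s.client.IsConnected() {
		return errNotConnected
	}
	if err := s.client.Publish(s.topic, payload); err != nil {
		return err
	}
	// Decision 2: a successful export suggests the link is back, so
	// signal a retry of stored data. The send is non-blocking, so the
	// pipeline never waits on the retry loop.
	select {
	case s.retryTrigger <- struct{}{}:
	default:
	}
	return nil
}

// fakeClient is a test double used only to make the sketch runnable.
type fakeClient struct {
	connected bool
	published int
}

func (f *fakeClient) IsConnected() bool { return f.connected }

func (f *fakeClient) Publish(string, []byte) error {
	f.published++
	return nil
}

func main() {
	fc := &fakeClient{}
	s := &sender{client: fc, topic: "example/topic", retryTrigger: make(chan struct{}, 1)}

	fmt.Println("while disconnected:", s.export([]byte("a")))
	fc.connected = true
	fmt.Println("after reconnect:", s.export([]byte("b")))
	fmt.Println("retry signals pending:", len(s.retryTrigger))
}
```

Because a failed export returns immediately, the goroutine handling the event can exit (or hand the data to Store and Forward) rather than waiting on a lock, which bounds the number of goroutines alive during an outage.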