When ingesting large volume feeds, product queue becomes way too large

Unidata / LDM

The Unidata Local Data Manager (LDM) system includes network client and server programs designed for event-driven data distribution, and is the fundamental component of the Unidata Internet Data Distribution (IDD) system.

http://www.unidata.ucar.edu/software/ldm

When ingesting large volume feeds, product queue becomes way too large #126

Closed. sebenste closed this 6 months ago

sebenste commented 8 months ago

Version: LDM 6.15.0.9 and earlier
OS: CentOS 7, 8 and Ubuntu 22.04.3 LTS

If one is receiving one or more large data feeds, such as CONDUIT and NEXRAD2, and the product queue is too small, the LDM will enlarge the queue to an absurdly large size when the reconciliation mode in registry.xml is set to "increase queue". This led to a 2 TB queue that filled up a server and caused it to crash. There was no need for a queue that large. The program that calculates the anticipated size of the queue is at fault.

semmerson commented 8 months ago

@sebenste The algorithm is ok for incremental changes to the size, but it can do what you say if the jump is large. My advice is always to treat "increase queue" like cruise control on a car. Get to where you want manually and then engage.
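For readers wondering what "get to where you want manually" could look like in practice, here is a rough sketch of a back-of-the-envelope size estimate. It is not the LDM's own algorithm. It assumes the queue only needs to hold the data that arrives within the maximum latency window, plus some headroom. The function name, the 25% headroom, and the 10 MB/s rate are all illustrative assumptions.

```python
def estimate_queue_bytes(rate_bytes_per_s: float,
                         max_latency_s: float = 3600.0,
                         headroom: float = 1.25) -> int:
    """Rough product-queue size: data arriving during max_latency_s, plus headroom.

    Hypothetical helper for illustration only; not part of the LDM.
    """
    if rate_bytes_per_s < 0 or max_latency_s <= 0 or headroom < 1.0:
        raise ValueError("rate must be >= 0, latency > 0, headroom >= 1")
    return int(rate_bytes_per_s * max_latency_s * headroom)


# Assumed example: feeds averaging 10 MB/s, one hour of retention.
size = estimate_queue_bytes(10 * 10**6, max_latency_s=3600)
print(f"{size / 10**9:.1f} GB")  # prints "45.0 GB"
```

Measure the real ingest rate on your own system before trusting a number like this. The point is only that a manual estimate gives the automatic mode a sane starting size, so any later adjustments are incremental.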

sebenste commented 8 months ago

But for those who are unaware, this could cause a dangerous issue in an operational environment. Can you put a "throttle" on it so it doesn't go crazy like that? Or modify the algorithm?

semmerson commented 8 months ago

It's in the documentation -- but I'll see what can be done.

sebenste commented 8 months ago

Maybe cap it at 100 GB? But then 10 years from now, that's too small. If nothing else, I would really highlight it in the docs.
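One way around a fixed number going stale might be a relative cap: limit each resize to a multiple of the current size and to a share of the disk space still free. The sketch below is purely illustrative and is not how the LDM works or how the eventual fix was implemented. The function name, the 2x step limit, and the 50% free-space limit are all assumptions.

```python
def clamp_queue_growth(current: int, proposed: int, disk_free: int,
                       max_step_factor: float = 2.0,
                       max_free_fraction: float = 0.5) -> int:
    """Limit a proposed queue size; never shrink below the current size.

    Hypothetical throttle for illustration only; not part of the LDM.
    """
    if proposed <= current:
        return current
    # Grow by at most max_step_factor in a single adjustment.
    step_limit = int(current * max_step_factor)
    # Let the growth consume at most this share of the space still free.
    disk_limit = current + int(disk_free * max_free_fraction)
    return max(current, min(proposed, step_limit, disk_limit))


# Assumed example: 8 GB queue, a 2 TB request, 500 GB free on the disk.
new_size = clamp_queue_growth(8 * 10**9, 2 * 10**12, 500 * 10**9)
print(new_size // 10**9, "GB")  # prints "16 GB"
```

The free-space figure could come from `shutil.disk_usage(path).free` in the Python standard library. Because both limits scale with the system, nothing needs revisiting as disks get larger.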

semmerson commented 7 months ago

@sebenste was the LDM offline for a while before this happened?

sebenste commented 7 months ago

By the time I got to it, the server had crashed in one instance, and the drive was filling up in another. In that second instance, data was still coming in.

semmerson commented 7 months ago

No. I mean was the LDM offline for a while, then brought back online, and then it crashed?

sebenste commented 7 months ago

No, it was online until it crashed. A reboot brought the queue size to a respectable level, before I changed the parameter to reduce the queue size.

sebenste commented 6 months ago

This has been solved in LDM version 6.15.0. Thanks, Steve!

sebenste commented 6 months ago

Closed.