Unidata / LDM

The Unidata Local Data Manager (LDM) system includes network client and server programs designed for event-driven data distribution, and is the fundamental component of the Unidata Internet Data Distribution (IDD) system.
http://www.unidata.ucar.edu/software/ldm

LDM 6.13.16 pqact will stop ingesting with a small product queue #106

Closed by sebenste 2 years ago

sebenste commented 2 years ago

Hello,

I am ingesting the NOTHER feed on our test machine, which has a product queue size of 3 GB. Eventually, over a period of weeks, products stop being ingested while the LDM continues to run. Increasing the queue size to 30 GB works around the issue, but a small queue still shouldn't stop the ingestion of data.
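For context on why queue size matters here: the LDM product queue is a fixed-size buffer, and when a new product won't fit, the oldest products are deleted to make room. A toy Python model of that eviction behavior (illustrative only, not LDM's actual C implementation):

```python
from collections import deque

class ProductQueue:
    """Toy model of a fixed-capacity product queue that evicts the
    oldest products to make room for new ones (illustrative only)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.products = deque()  # (name, size) pairs, oldest first

    def insert(self, name: str, size: int) -> bool:
        if size > self.capacity:
            return False  # product can never fit
        # Delete oldest products until the new one fits.
        while self.used + size > self.capacity:
            _, old_size = self.products.popleft()
            self.used -= old_size
        self.products.append((name, size))
        self.used += size
        return True

q = ProductQueue(capacity_bytes=100)
q.insert("a", 60)
q.insert("b", 60)  # evicts "a" to make room
[name for name, _ in q.products]  # → ['b']
```

The practical consequence is that on a high-volume feed like NOTHER, a 3 GB queue gives products a very short residence time before they are evicted, while a 30 GB queue keeps them around much longer.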

semmerson commented 2 years ago

Anything relevant in the LDM log file(s)?

--Steve

sebenste commented 2 years ago

Sadly, no. All I can say is that when I make my queue small and ingest large data sets that overflow it, the LDM stops ingesting, but not immediately and not at a predictable time.


semmerson commented 2 years ago

When you say the LDM stops ingesting, do you mean that it stops inserting new products into the queue or that pqact(1) stops processing products?

--Steve

sebenste commented 2 years ago

I get messages that the feed has not had any data within the last 5 minutes, so it looks like it is not inserting products into the queue.


sebenste commented 2 years ago

We've made all our queues larger to prevent this from happening again. When it does happen, there are absolutely no log messages that seem out of line, either on our system or from the LDM. So with nothing in the logs to go on and an easy workaround available, I'm going to close this ticket.

semmerson commented 1 year ago

One possibility is that data products are arriving more than the "maximum latency" parameter after they were created, so the LDM is throwing them away. A few questions:

- What is your maximum-latency parameter (`regutil | grep latency`)?
- From what upstream hosts are you requesting?
- What does a notifyme(1) to those upstream hosts for the products in question reveal about the difference between their creation time and the notifyme(1) timestamps? If that difference is greater than the maximum latency, then the products won't be inserted.
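The acceptance test Steve describes can be sketched in Python. This is illustrative only, not LDM code; the 3600-second value below is the LDM's usual default maximum latency, but you should verify your own setting with `regutil` as suggested above.

```python
from datetime import datetime, timedelta

def would_insert(creation_time: datetime, arrival_time: datetime,
                 max_latency: timedelta) -> bool:
    """Return True if a product arriving at arrival_time would be
    accepted, i.e. its age on arrival is within the maximum-latency
    window measured from its creation time."""
    return arrival_time - creation_time <= max_latency

# Assumed default maximum latency of 3600 s (one hour); check yours.
max_latency = timedelta(seconds=3600)
t0 = datetime(2022, 6, 7, 12, 0, 0)

would_insert(t0, t0 + timedelta(seconds=30), max_latency)    # accepted
would_insert(t0, t0 + timedelta(seconds=4000), max_latency)  # rejected
```

A product that is older than the maximum latency when it arrives is silently discarded rather than inserted into the queue, which would match the "no data in the last 5 minutes" symptom without any obvious errors in the logs.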

--Steve


sebenste commented 1 year ago

I believe that's what happened. They were arriving 20 or 30 seconds behind, and because my queue was small, it couldn't handle it.