OpenC3 / cosmos

OpenC3 COSMOS
https://openc3.com

intermittent timeout when running build_cmd #1369

Closed: 321github123 closed this issue 2 months ago.

321github123 commented 2 months ago


Describe the bug: build_cmd times out intermittently.

To Reproduce

Run the following script repeatedly (manually) in Script Runner:

puts 'hi'
build_cmd('INST ABORT')
puts 'bye'

The error does not occur if build_cmd is called 1000 times in a loop.

Expected behavior: build_cmd should not time out.



jmthomas commented 2 months ago

I'm unable to reproduce on my MacBook. Weird that it only happens when running manually and not in a loop. Sort of feels like a startup race condition?

321github123 commented 2 months ago

We can typically only reproduce this bug on clean starts of OpenC3, e.g.:

./openc3.sh cleanup
./openc3.sh start

If we run build_cmd immediately afterward, it typically times out within the first ten tries. After build_cmd has succeeded a number of times, it continues to succeed. Is there any sort of caching used by Redis that might be causing this problem?

From: Jason Thomas @.> Sent: Wednesday, July 3, 2024 11:41 AM To: OpenC3/cosmos @.> Cc: Mark Lai @.>; Author @.> Subject: [Ext] - Re: [OpenC3/cosmos] intermittent timeout when running build_cmd (Issue #1369)

CAUTION: Originated outside of IS4S.


- Reply to this email directly, view it on GitHubhttps://github.com/OpenC3/cosmos/issues/1369#issuecomment-2206771167, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AWNHCGVXZHTDIHYB6XXZAJTZKQSQRAVCNFSM6AAAAABKAIHBSGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMBWG43TCMJWG4. You are receiving this because you authored the thread.Message ID: @.**@.>>


The information contained in this e-mail and any attachments from Integrated Solutions for Systems may contain confidential and/or proprietary information, and is intended only for the named recipient to whom it was originally addressed. If you are not the intended recipient, any disclosure, distribution, or copying of this e-mail or its attachments is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by return e-mail and permanently delete the e-mail and any attachments.

ryanmelt commented 2 months ago

Are you trying to use build_cmd very quickly, before everything has started up? If so, then I would expect to see exactly what you are seeing. build_cmd communicates with target-specific backend microservices, and those have to be up and running before it will work; otherwise it will get ack timeouts.
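While the target microservices are still coming up, one possible script-side workaround is to retry build_cmd a few times with a pause between attempts. This is a minimal sketch, not an OpenC3 feature: with_retries is a hypothetical helper, and it assumes build_cmd raises a StandardError subclass when its ack times out.

```ruby
# Hypothetical helper (not part of the OpenC3 API): run the block and, if it
# raises, sleep and try again, up to `attempts` total tries.
# Assumes the wrapped call (e.g. build_cmd) raises a StandardError subclass
# when its ack times out.
def with_retries(attempts: 5, delay: 1.0)
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError
    raise if tries >= attempts # out of attempts: re-raise the last error
    sleep(delay)               # give the backend microservices time to come up
    retry                      # re-runs the begin block; tries keeps its value
  end
end

# Script Runner usage (assumption: build_cmd is available in the script context):
#   result = with_retries(attempts: 5, delay: 2) { build_cmd('INST ABORT') }
```

Rescuing StandardError rather than Exception leaves Interrupt and SystemExit alone, so a stopped script still stops immediately.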

321github123 commented 2 months ago

Is there any way to tell whether the target-specific microservices are ready to handle a build_cmd request?

From: Ryan Melton @.> Sent: Thursday, July 4, 2024 11:53 AM To: OpenC3/cosmos @.> Cc: Mark Lai @.>; Author @.> Subject: [Ext] - Re: [OpenC3/cosmos] intermittent timeout when running build_cmd (Issue #1369)


ryanmelt commented 2 months ago

I am able to recreate this when not at startup, too. I think there is a real issue here.

ryanmelt commented 2 months ago

I see the issue. build_cmd is missing a call to update_topic_offsets before the write, so a very fast build_cmd response could be lost. I need to do a general audit to make sure the same error isn't anywhere else, but at least regular cmd() is handling it correctly.
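Why the ordering matters can be shown with a toy model. This sketch assumes (it is not taken from the OpenC3 source) that a reader with no stored offset starts after the newest entry, like Redis XREAD with the $ ID. The Array-as-topic model and the round_trip function are hypothetical; record_offset_first: true plays the role described for update_topic_offsets.

```ruby
# Toy model (not OpenC3 code): the ack topic is an append-only Array, and
# "reading from offset N" returns every entry at index N or later.
#
#   record_offset_first: true  => note the topic position BEFORE the write
#                                 (the role update_topic_offsets plays)
#   record_offset_first: false => only pick a start position after the write,
#                                 like a read that begins at Redis "$"
def round_trip(ack_topic, record_offset_first:)
  offset = ack_topic.size if record_offset_first # position before the write
  ack_topic << 'ack'        # request sent; a very fast responder acks at once
  offset ||= ack_topic.size # late offset: already past the ack
  ack_topic.drop(offset)    # what the waiting reader actually sees
end

round_trip([], record_offset_first: false) # => [] (reader waits until ack timeout)
round_trip([], record_offset_first: true)  # => ["ack"]
```

With the late offset the ack is already behind the reader's start position, so it waits for a message that never comes and times out.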

ryanmelt commented 2 months ago

Note that this bug can manifest once per thread in the cmd-tlm-api server. After a thread has done its first build_cmd, build_cmd should always work perfectly on that thread.
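The once-per-thread behavior fits offsets that are stored per thread and only exist after a thread's first read. The sketch below models that under stated assumptions: offsets live in Thread.current (which is fiber-local in Ruby), a missing offset behaves like Redis $, and fast_request and :toy_offsets are hypothetical names, not the cmd-tlm-api implementation.

```ruby
# Hypothetical model, not cmd-tlm-api code. Each thread keeps its own
# offsets in Thread.current; the topic is an append-only Array.
def fast_request(topic, prime: false)
  offsets = (Thread.current[:toy_offsets] ||= {})
  key = topic.object_id
  offsets[key] ||= topic.size if prime # fix: record position before writing
  topic << 'ack'                       # a very fast response lands immediately
  start = offsets[key] || topic.size   # no stored offset => start after newest
  offsets[key] = topic.size            # this thread now remembers its position
  topic.drop(start)
end

topic = []
fast_request(topic)                      # => [] (first call on this thread misses)
fast_request(topic)                      # => ["ack"] (offset now stored)
Thread.new { fast_request(topic) }.value # => [] (a new thread misses once)
```

Passing prime: true makes even a thread's first call see its ack, which mirrors recording the offset before the write.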