department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

DR | "Stack level too deep" error #73584

Closed data-doge closed 1 month ago

data-doge commented 7 months ago

In Dec. 2023, we began to receive "SystemStackError: stack level too deep" errors on the Supplemental Claims form. See Sentry. This error is thrown synchronously at some point during the creation/submission of a Supplemental Claim, which makes it a submission-blocking error but not a silent one. Interestingly, the error we see here is the result of our error-handling code itself erroring, so we are unable to see the underlying reason for these errors. There is some discussion of this error here: https://dsva.slack.com/archives/C05FCEH2NRG/p1704901615183509.
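
The masking effect described above can be reproduced in isolation. The following is a minimal sketch, not the vets-api code: `submit_claim`, `report_failure`, `submit_with_handling`, and `observed_error` are hypothetical names, and the recursive handler is just one assumed way for error handling to fail. It shows how the caller ends up seeing only `SystemStackError`, with no trace of the original failure.

```ruby
# Hypothetical names throughout; this is not the vets-api implementation.

# Stand-in for the real submission failure that never reaches the logs.
def submit_claim
  raise ArgumentError, 'underlying submission failure'
end

# A handler that calls itself whenever its own reporting step fails.
def report_failure(error)
  raise IOError, "could not report #{error.class}" # reporting step fails
rescue StandardError => e
  report_failure(e) # unbounded recursion: the handler handles its own failure
end

def submit_with_handling
  submit_claim
rescue ArgumentError => e
  report_failure(e)
end

# Returns the exception the caller actually observes, or nil if none.
def observed_error
  submit_with_handling
  nil
rescue SystemStackError => e
  e
end

error = observed_error
puts "#{error.class}: #{error.message}"
```

Because `SystemStackError` is not a `StandardError`, the handler's own `rescue StandardError` never catches it, so it propagates all the way up and replaces the `ArgumentError` the caller would otherwise have seen.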

Update (Feb 13): this error no longer appears to be occurring.

Another update: this error does appear to be occurring again.

Update 4/17/24: Refer to Derek's comment below for more details, but it looks like the error is also happening in HLR.


anniebtran commented 6 months ago

It looks like we're seeing the error again. This specific veteran was able to submit successfully, though (they selected no evidence as part of their form, so no evidence was lost 👍)

anniebtran commented 6 months ago

Another instance of it

dfong-adh commented 4 months ago

Recent occurrence

anniebtran commented 2 months ago

Seems like sometimes we don't get alerted by our monitors about this error occurring 😳 These two errors are in our dashboard but were not in our notifications channel: trace 1 and trace 2

eileen-coforma commented 1 month ago

Investigation in the works

anniebtran commented 1 month ago

Investigation notes:

Suggestions/possible next steps:

irb(main):019> PersonalInformationLog.where('error_class LIKE ?', "%V1::SupplementalClaimsController%").order(created_at: :desc).count
=> 1

eileen-coforma commented 1 month ago

In code review. Poking platform.

anniebtran commented 1 month ago

PR was merged. Working on validating, but it may take a few days since we don't see this error very frequently.

anniebtran commented 1 month ago

Saw an instance of an error that leads to the stack level too deep:

Exception occurred while submitting Supplemental Claim: 1 validation error detected: Value at 'plaintext' failed to satisfy constraint: Member must have length less than or equal to 4096

Not sure what this means, but it's a start
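
For context, the wording of this message matches the validation errors returned by AWS APIs, and the AWS KMS Encrypt operation limits its plaintext input to 4096 bytes. The thread does not confirm which service or call raised it, so that reading is an assumption. Under that assumption, a size guard could look like the sketch below; `MAX_PLAINTEXT_BYTES`, `within_plaintext_limit?`, and `truncate_for_encryption` are hypothetical names, not existing helpers.

```ruby
# Assumption: the limit is the 4096-byte cap on 'plaintext' named in the error.
MAX_PLAINTEXT_BYTES = 4096

# Hypothetical helper: true when the payload fits under the cap.
def within_plaintext_limit?(payload)
  payload.bytesize <= MAX_PLAINTEXT_BYTES
end

# Hypothetical helper: shortens an oversized payload so it fits under the cap.
# byteslice can cut a multibyte character in half, so scrub drops any
# invalid trailing bytes and keeps the result valid in its encoding.
def truncate_for_encryption(payload)
  return payload if within_plaintext_limit?(payload)

  payload.byteslice(0, MAX_PLAINTEXT_BYTES).scrub('')
end

oversized = 'a' * 5000
puts within_plaintext_limit?(oversized)
puts truncate_for_encryption(oversized).bytesize
```

Whether truncating is acceptable depends on what is being encrypted; rejecting or splitting the payload may be more appropriate when the full content matters.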

anniebtran commented 1 month ago

Going to add the extra logging to NOD and HLR too since we saw a couple of stack level too deep errors for those forms as well 😩
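
The merged PR is not shown in this thread, so the following is only a sketch of the general idea behind this kind of logging: record the original exception before any further error handling runs, so it survives even if the handler fails later. `with_underlying_error_logged` is a hypothetical helper, and the log wording simply mirrors the message quoted earlier in the thread.

```ruby
require 'logger'
require 'stringio'

# Hypothetical wrapper: logs the original exception, then re-raises it so
# the existing error handling still runs afterwards.
def with_underlying_error_logged(logger, form_name)
  yield
rescue StandardError => e
  logger.error("Exception occurred while submitting #{form_name}: #{e.message}")
  raise
end

# Usage: capture the log output in memory to show what gets recorded.
log_output = StringIO.new
logger = Logger.new(log_output)
begin
  with_underlying_error_logged(logger, 'Supplemental Claim') do
    raise ArgumentError, 'simulated failure'
  end
rescue ArgumentError
  # The original error still propagates to the caller after being logged.
end
puts log_output.string
```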

Mottie commented 1 month ago

May open a new ticket to investigate errors from the logs

anniebtran commented 1 month ago

May have figured out what is causing the stack level too deep error in Supplemental Claims, which is the form that most frequently encounters it, so I'll make a ticket to resolve it. Not sure what caused it in NOD and HLR, but we can monitor for it now that we've added the extra logging.

anniebtran commented 1 month ago

Haven't yet encountered the stack level too deep error from NOD and HLR since we've added the logging but upon inspecting their traces, they both happened around the same time and both share this error:

PG::UndefinedColumn: ERROR:  column form_submissions.in_progress_form_id does not exist
LINE 1: ...submissions" SET "in_progress_form_id" = $1 WHERE "form_subm...
                                                             ^

/usr/local/bundle/cache/ruby/3.3.0/gems/ddtrace-1.23.2/lib/datadog/tracing/contrib/pg/instrumentation.rb:35:in `exec_params': ERROR:  column form_submissions.in_progress_form_id does not exist
LINE 1: ...submissions" SET "in_progress_form_id" = $1 WHERE "form_subm...
                                                             ^
 (PG::UndefinedColumn)
    from /usr/local/bundle/cache/ruby/3.3.0/gems/ddtrace-1.23.2/lib/datadog/tracing/contrib/pg/instrumentation.rb:35:in `block in exec_params'
    ...

So going to look into this a bit and see if there are any other instances of this error 👀
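
One way to look for other instances is to scan exported log messages for the same PostgreSQL wording and count which columns are reported missing. This is a minimal sketch that assumes the messages are already available as an array of strings; `UNDEFINED_COLUMN` and `tally_undefined_columns` are hypothetical names, and the pattern is based only on the message text quoted above.

```ruby
# Hypothetical triage helper; the pattern matches the PostgreSQL message
# text quoted in the trace above.
UNDEFINED_COLUMN = /column (?<table>\w+)\.(?<column>\w+) does not exist/

# Returns a Hash of "table.column" => number of messages mentioning it.
# Each message is counted at most once, even if it repeats the text.
def tally_undefined_columns(messages)
  messages.each_with_object(Hash.new(0)) do |message, counts|
    match = UNDEFINED_COLUMN.match(message)
    counts["#{match[:table]}.#{match[:column]}"] += 1 if match
  end
end

sample = [
  'PG::UndefinedColumn: ERROR:  column form_submissions.in_progress_form_id does not exist',
  'SystemStackError: stack level too deep'
]
p tally_undefined_columns(sample)
```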

anniebtran commented 1 month ago

Gonna close this since there's a new ticket to cover the fix for SC and we haven't seen the stack level too deep error in HLR and NOD occur again with the new logging (yet). If we do encounter the error for those forms in the future, we can create a ticket then, as we'll be able to find more details about what may be causing it.