Own-and-Ship / oas_agent

The Own & Ship Ruby agent
2 stars 0 forks source link

TypeError in event cache #3

Closed caius closed 2 weeks ago

caius commented 1 year ago

Observed from a cap env deploy run, the reporter thread terminated with exception as the process booted which then caused it to exit with an error after Capistrano finished up successfully.

Output from the start of the process, which has caller 4 missing for some reason

** [Own & Ship][2023-08-16 13:43:53 +0000 ci-vm (4909)] INFO : Reading configuration from /$app/config/own_and_ship.yml
** [Own & Ship][2023-08-16 13:43:53 +0000 ci-vm (4909)] INFO : The Own & Ship agent in the development environment for application $app - (Development) will be disabled
#<Thread:0x0000558083080260 /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:38 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
    6: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:39:in `block in create_reporter_thread' 
    5: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:39:in `loop'

    3: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:66:in `receive_reports_from_queue'
    2: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/events_cache.rb:18:in `add_event'
    1: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/event_cache.rb:39:in `hash_for_event_data'
/$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/event_cache.rb:39:in `+': no implicit conversion of nil into String (TypeError)

Then backtrace reported at the end of the process had line 4 in, oddly.

Traceback (most recent call last):
    6: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:39:in `block in create_reporter_thread'
    5: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:39:in `loop'
    4: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:41:in `block (2 levels) in create_reporter_thread'
    3: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:66:in `receive_reports_from_queue'
    2: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/events_cache.rb:18:in `add_event' 
    1: from /$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/event_cache.rb:39:in `hash_for_event_data'
1336/$app/vendor/bundle/ruby/2.7.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/event_cache.rb:39:in `+': no implicit conversion of nil into String (TypeError)
caius commented 1 year ago

Observed this twice more, when attempting to upgrade from commit 4e5a147 to commit 9287b63. Unsure if the upgrade is unrelated, need to test again.

caius commented 9 months ago

Observed it again yesterday, when running a Capistrano deploy from CI. Blew up in the migrate task this time. I wonder if it's something to do with running under Capistrano, possibly the software name is nil rather than rails 🤔

The cap log shows oas_agent not being enabled in the current environment as the process starts, then raises the TypeError from the background thread after a migration has completed. This isn't the first task run on the server(s), so it's likely being triggered by a warning from our migration task (glorious.) Consistently failed on retries as well, so it's following a reproducible code path I think.

Relevant snippets from the log, CI was running bundle exec capistrano stage deploy:

/usr/lib/rbenv/bin/rbenv exec bundle exec rake db:migrate:with_data
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
Top level ::CompositeIO is deprecated, require 'multipart/post' and use `Multipart::Post::CompositeReadIO` instead!
Top level ::Parts is deprecated, require 'multipart/post' and use `Multipart::Post::Parts` instead!
** [Own & Ship][2023-11-29 22:25:47 +0000 job-1 (356555)] INFO : Reading configuration from /app/releases/20231129222514/config/own_and_ship.yml
** [Own & Ship][2023-11-29 22:25:47 +0000 job-1 (356555)] INFO : Reading configuration from /app/releases/20231129222514/config/own_and_ship.yml
** [Own & Ship][2023-11-29 22:25:47 +0000 job-1 (356555)] INFO : Unknown alias: default_settings
** [Own & Ship][2023-11-29 22:25:47 +0000 job-1 (356555)] INFO : The Own & Ship agent in the stage environment for application  will be disabled
#<Thread:0x00007f76aa97e6e0 /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:38 run> terminated with exception (report_on_exception is true):
/app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/event_cache.rb:39:in `+': no implicit conversion of nil into String (TypeError)
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/event_cache.rb:39:in `hash_for_event_data'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/events_cache.rb:18:in `add_event'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:66:in `receive_reports_from_queue'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:41:in `block (2 levels) in create_reporter_thread'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:39:in `loop'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:39:in `block in create_reporter_thread'
== Data =======================================================================
== 20231117124215 UpdateIds: migrating ===============================
== 20231117124215 UpdateIds: migrated (15.4081s) =====================

/app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/event_cache.rb:39:in `+': no implicit conversion of nil into String (TypeError)
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/event_cache.rb:39:in `hash_for_event_data'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/events_cache.rb:18:in `add_event'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:66:in `receive_reports_from_queue'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:41:in `block (2 levels) in create_reporter_thread'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:39:in `loop'
    from /app/shared/bundle/ruby/3.1.0/bundler/gems/oas_agent-4e5a14754bcd/lib/agent/reporter.rb:39:in `block in create_reporter_thread'
#<Thread:0x00007fec0be43820 /home/semaphore/api/vendor/bundle/ruby/3.1.0/gems/sshkit-1.21.3/lib/sshkit/runners/parallel.rb:10 run> terminated with exception (report_on_exception is true):
/home/semaphore/api/vendor/bundle/ruby/3.1.0/gems/sshkit-1.21.3/lib/sshkit/runners/parallel.rb:15:in `rescue in block (2 levels) in execute': Exception while executing as USER@job-1: rake exit status: 1 (4mSSHKit::Runner::ExecuteError)
caius commented 9 months ago

Re-running the above deploy with debugging code added (dumping the arguments to EventCache. hash_for_event_data) was successful, I suspect because the migration had already run to completion so we didn't have a migration to run.

caius commented 2 weeks ago

We fixed this.