GeekyDeaks / discord-destinybot

Discord Destiny Bot
MIT License

bot hangs when using discord.js v9 and running under pm2 #28

Closed GeekyDeaks closed 7 years ago

GeekyDeaks commented 7 years ago

Odd problem - migrated to discord.js v9 and everything was looking good on my dev box (OSX). Moved it to the staging environment (linux / pm2) and it repeatedly hangs after a few commands. Unfortunately it does not just crash, so pm2 leaves the process in a brain-dead state. I can also replicate this on my OSX machine by running the bot under pm2 and sending a few commands.

Fun facts:

  1. cannot replicate outside pm2
  2. cannot replicate with discord.js v8

I attached lldb to the process once it had hung and it seems to be spinning on a lock. When taking a look under truss I noticed that all the V8 worker threads were waiting for the same semaphore. Here is the state of all the threads from lldb:

Process 31638 stopped
* thread #1: tid = 0x2d55f6, 0x000016ba8f92e26f, name = 'node /Users/deakins/Dropbox/dev/discord-destinybot/destinybot.j', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread #2: tid = 0x2d55f7, 0x00007fff9285f51a libsystem_kernel.dylib`semaphore_wait_trap + 10
  thread #3: tid = 0x2d55f8, 0x00007fff9285f51a libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'V8 WorkerThread'
  thread #4: tid = 0x2d55f9, 0x00007fff9285f51a libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'V8 WorkerThread'
  thread #5: tid = 0x2d55fa, 0x00007fff9285f51a libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'V8 WorkerThread'
  thread #6: tid = 0x2d55fb, 0x00007fff9285f51a libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'V8 WorkerThread'
  thread #7: tid = 0x2d560c, 0x00007fff92864136 libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #8: tid = 0x2d560d, 0x00007fff92864136 libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #9: tid = 0x2d560e, 0x00007fff92864136 libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #10: tid = 0x2d560f, 0x00007fff92864136 libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #11: tid = 0x2d5611, 0x00007fff92865232 libsystem_kernel.dylib`kevent64 + 10, queue = 'com.apple.libdispatch-manager'

Looks suspiciously like a classic deadlock.

I am not familiar with the OSX data structures, so I am going to replicate the environment on a linux machine where I can use gdb to figure out which thread holds the lock and hopefully how it got there.

unisys12 commented 7 years ago

As I have mentioned in our previous conversations, I looked seriously at PM2, but have since moved away from implementing it. I have not had a chance to take a look at this, so I could have no idea what the hades I am talking about, but you could probably move away from PM2 if everything works OK without it.

GeekyDeaks commented 7 years ago

Yeah - I think you are absolutely right, but I am a pedantic ass and it's bugging me :) - mostly because I don't actually think it's a PM2 problem per se.

I tried replicating it on a little RPi, but for some reason it's not failing. I think I'm going to have to create a debug environment on intel to dig further.

GeekyDeaks commented 7 years ago

ok - I can get it to hang without pm2 in node v6.6.0. It occurs immediately after running the !advisor command.

unisys12 commented 7 years ago

You know, I posted something in your chat the other day about having some issues and, if I remember correctly, the advisor cmd was what I wanted to check out.

GeekyDeaks commented 7 years ago

yeah - I forgot about that - too much stuff going on :)

Figured it out. It's a bug in the fancy (i.e. unnecessarily complex) message building code. It does not handle blank messages correctly, so when the trials part returns nothing it causes it to go into an infinite loop.
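For anyone hitting something similar, here is a rough sketch of the kind of guard that avoids this class of hang. It is not the bot's actual message builder: `buildChunks`, its parameters and the chunking strategy are all made up for illustration. The idea is simply to drop blank parts before the packing loop sees them and to make sure every loop iteration consumes input.

```javascript
// Hypothetical helper, NOT the bot's real code: packs message parts into
// chunks of at most maxLen characters, joined by newlines.
function buildChunks(parts, maxLen) {
  // A non-positive limit would make the splitting loop below spin forever.
  if (!(maxLen > 0)) {
    throw new RangeError('maxLen must be a positive number');
  }
  const chunks = [];
  let current = '';
  for (const part of parts) {
    // Guard: a blank part (e.g. an empty trials section) contributes nothing.
    if (typeof part !== 'string' || part.trim() === '') continue;
    let rest = part;
    // Hard-split a part that is too long on its own; each pass removes
    // maxLen characters, so this loop always terminates.
    while (rest.length > maxLen) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      chunks.push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
    const candidate = current ? current + '\n' + rest : rest;
    if (candidate.length > maxLen) {
      // Current chunk is full: flush it and start a new one with this part.
      chunks.push(current);
      current = rest;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

With this shape, an input where one section comes back empty just yields fewer chunks, and an all-blank input yields an empty array, which the caller can treat as "nothing to send".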