Network errors cause client to crash

discordjs / discord.js

A powerful JavaScript library for interacting with the Discord API

https://discord.js.org

Apache License 2.0

25.4k stars 3.97k forks source link

Network errors cause client to crash #2879

Closed ghost closed 6 years ago

ghost commented 6 years ago

This is a design and philosophy issue with discord.js that frankly makes me look for other libs that don't have this problem.

Throwing errors in javascript is equivalent of fatal errors in other languages, and as it stands that means any little error, such as Error: read ECONNRESET happening every 2 days to a week means your discord bot will crash, without a line number reported in your code, and with a very general error that doesn't aide in debugging at all. You only get this error if you do: client.on('error', console.error);

General bot examples don't have this line, and others have reported this line not working for them to resolve this issue: Error [ERR_UNHANDLED_ERROR]: Unhandled error. ([object Object]) at Client.emit (events.js:171:17)

This has been brought up in "many duplicate tickets" on github for discord.js and the response from devs seems to be to shift the blame onto end programmers for their bad design choices. The result is a library that makes bots that crash randomly, with error messages that aren't at all helpful. What if PHP curl crashed your entire web application every time there was a network error? It's just totally bad design and you can try to hide behind "but programmers need to handle errors", but even web browsers that rely on javascript don't crash a page and try to have graceful errors as MUCH as possible. The design here is opposite of that, if any little thing goes wrong, better have wrapped the entire thing in an error handler. client.on('error') doesn't even make sense as an event.

It makes you sound like you want try/catch blocks on every line or wrapping an entire program, which is silly in itself. Considering the fact your OWN examples in your documentation do not use this line, and also do not wrap any try/catch around the client creation, or have then/catch and would be susceptible to the exact same problems that keep getting reported and blamed on the programmer who brought it up, then I think it's kind of check and mate as far as this being just poor design, or a bug goes. Take your pick. Because I'm not for using libs that make my bots crash randomly, if you guys aren't going to fix it (as others have reported even this esoteric line not making it work for them), then I'll find a lib that doesn't have a long standing issue purposefully ignored by the devs.

Try catch blocks are a pitfall if you don't use them right, and they should be rare and only for fatal errors in javascript because unlike more graceful langauges that emit notices and warnings more often, javascript tends to just die, and itself doesn't even have a die function and people use throw error as a substitute for it because it's so great at killing your program! A notice or warning of a connection reset would be far more fitting than crashing my bot if I don't have a catch all error handler for the client, which any lib should actually have built into the client itself and allow overriding if the dev chooses, not default to blowing up your application/bot.

Gawdl3y commented 6 years ago

As already explained on numerous occasions, this is standard behaviour for a Node.js application utilising EventEmitters (see the docs here). But no, since this behaviour clearly infuriates you so much, go ahead and switch to Eris (the only other actively-developed JavaScript DAPI lib at this point). Oh wait...

That's okay though, you can obviously just switch to an entirely different language to get away from the undesirable fatal errors/exceptions... No, damn, that won't work either. Java, C#, C++, Rust, Python, Go, Ruby and Dart all treat exceptions/errors as fatal by default. But it's okay, at least if you stick with PHP it won't crash when an error happens! ... Uh-oh, PHP does, however, issue fatal errors for uncaught exceptions. What a shame. It's almost as if the entire community of programmers have collectively decided that suppressing errors by default is very poor behaviour - they're clearly all out to get you!

But hey, it'll all be okay! Just like you wanted, JavaScript/Node.js actually has a feature to allow you to catch these error events and not treat them as fatal! Would you like to know how it's done? Here it is:

client.on('error', console.error);

Phew, that's a doozy. It's a good thing it's written out for you, though, so your fingers don't get hurt. And just in case you do want the errors to be fatal (as many people do, since it's often safer to bail and start all over - see PM2 to assist with this), but still log good information about them, here's another incredibly complex snippet of code for your shameless enjoyment:

client.on('error', err => {
  console.error(err);
  process.exit(1);
});

This "esoteric line" is not spelled out for you in the discord.js documentation because they're not Node.js tutorials. You are expected to understand how error events and uncaught exceptions behave in this environment. You should also know how to read and comprehend error messages and stack traces, especially considering these are not unique to Node/JavaScript. Additionally, everybody wants different results for handling/logging errors - it'd be dumb as hell for the library itself to do any of it for you. Basically every library/framework in existence is going to expect you to handle errors in some way/shape/form.

I have yet to see a reproducible example where catching the error event doesn't solve someone's crashing from an error emitted within the library. If you have one, by all means, share it.

You seem to be relatively new to the Node.js ecosystem, which is totally understandable. That's completely fine - we all start somewhere. What's less okay, however, is posting multiple comments/issues about the same topic, filled to the brim with barely-veiled attacks and meaningless threats to the contributors. If you want to continue discussing this, for whatever reason, cool down a bit.

ghost commented 6 years ago

You can't have the cake and eat it too, either you have to admit your demo code in your OFFICIAL documentation is bug ridden trash that crashes every 2-7 days, or you have to admit your design is shit/buggy.

The fact that throwing errors is fatal wasn't up for debate, but your monkey brain decided to home in on that strawman fallacy as your core argument, that works about as well as your code and makes your entire 2nd paragraph a waste of time to read.

You didn't address in your third paragraph that I had given that code as example, and stated others have reported it doesn't work for them in all cases.

Your client is emitting the error, and your official documentation doesn't work as a result, it's not teaching node.js to document how you expect your code to be used in your documentation, because it's not Node.js emitting the error it's your shitty client, and it would be teaching ppl to use your badly designed client, not Node.js.

If people want a fatal error/crash to occur because of a networking error, that's fine, but IT SHOULD NOT BE THE DEFAULT and that is BECAUSE YOU BADLY DESIGNED IT TO BE THAT WAY. Just because you used throw error, doesn't mean you should, or Eris either. That's what is shitty, and should be fixed. And if you're going to dig in and keep making unreliable software go right ahead, there's plenty of libraries out there that don't and have never thrown fatal errors for a network error especially as a default behavior. No one was pinning you down and forced you to throw errors for network flukes but you act like it's the only possible solution in a sea of solutions, just cuz you picked it and it's bad. And don't worry I'm sure if you wait a bit another person telling you you're wrong will pop up as a "duplicate" issue for you close snarkily and defensively.

Gawdl3y commented 6 years ago

You can't have the cake and eat it too, either you have to admit your demo code in your OFFICIAL documentation is bug ridden trash that crashes every 2-7 days, or you have to admit your design is shit/buggy.

Or maybe your internet connection is utterly unreliable. I personally run a medium-sized bot with over 60,000 servers and 24 shards, and don't have this kind of instability you're having. That's rough.

A connection error in an application that relies on being connected 100% of the time is a perfect reason to throw an error. If it can't automatically reconnect, then what else is it going to do? Sit there quietly and do nothing until the end of time? Do you honestly want it to just fail silently? That's a complete joke.

The fact that throwing errors is fatal wasn't up for debate, but your monkey brain decided to home in on that strawman fallacy as your core argument, that works about as well as your code and makes your entire 2nd paragraph a waste of time to read.

You clearly don't understand what a straw man fallacy is. Here, let me help: I spent most of my time in my comment addressing the point that you definitely did raise (and spent the most time bitching about). Sounds reasonable to me.

You didn't address in your third paragraph that I had given that code as example, and stated others have reported it doesn't work for them in all cases.

You didn't give any code whatsoever, but rather an error with zero context on how to reproduce. How insightful.

Your client is emitting the error, and your official documentation doesn't work as a result, it's not teaching node.js to document how you expect your code to be used in your documentation, because it's not Node.js emitting the error it's your shitty client, and it would be teaching ppl to use your badly designed client, not Node.js.

As I already pointed out, lots of people want their bot to exit on error - either to automatically restart from scratch (using a tool such as PM2), or to let it die because that's not supposed to happen in a well-designed bot. It is not our job to determine what course of action to take for the user - nor is it any other library's.

If people want a fatal error/crash to occur because of a networking error, that's fine, but IT SHOULD NOT BE THE DEFAULT and that is BECAUSE YOU BADLY DESIGNED IT TO BE THAT WAY. Just because you used throw error, doesn't mean you should, or Eris either. That's what is shitty, and should be fixed. And if you're going to dig in and keep making unreliable software go right ahead, there's plenty of libraries out there that don't and have never thrown fatal errors for a network error especially as a default behavior. No one was pinning you down and forced you to throw errors for network flukes but you act like it's the only possible solution in a sea of solutions, just cuz you picked it and it's bad.

Unreliable software is software that fails silently. What do you want it to do instead of throw an error? Just log it to the console by default? That's stupid as hell for an API library to be doing. I don't think you have much experience relying on third-party libraries, judging by this mentality of yours.

You remind me of Trump supporters - no matter how much evidence is piled up against them that they're blatantly, irrefutably wrong on a subject, they refuse to accept it, and instead just incessantly spit out petty insults. Please, I welcome you to go use another library or language. Better yet, write your own, so we can all learn and benefit from the impeccable, unquestionable code that you'll clearly produce with your masterful design choices.

ghost commented 6 years ago

Or maybe your internet connection is utterly unreliable. I personally run a medium-sized bot with over 60,000 servers and 24 shards, and don't have this kind of instability you're having. That's rough.

It's on a server the hosting company of which already serves some of the biggest names in the industry, nice try. It's not making the entire bot fail either, it's one request failing that caused the entire bot to shut down that's sure good design :)))))))))

You clearly don't understand what a straw man fallacy is. Here, let me help:

Actually it's you who doesn't understand. The point I made was that throwing a fatal exception for a connection error is a bad design choice especially when your own official examples would have this same problem. Your response was hurrdurr, all these langauges use thrown exceptions and they all throw exceptions. That's classic strawman. You didn't refute what I made as an argument, you appeared to refute it by arguing something I didn't say.
You said this:

That's okay though, you can obviously just switch to an entirely different language to get away from the undesirable fatal errors/exceptions

Nowhere did I say that other languages don't have throw error or that they behave differently between languages in their function. You're implying I did, and then arguing that is wrong. Which is a strawman because you're arguing something I did not present at all. The only mention I made of other languages was to point out that they have more graceful error handling than javascript, such as PHP having notice, deprecated, warning, and fatal. Only Fatal errors will shut down a program. Javascript has warning, but you chose to throw error here on a network fluke instead of give some sort of console message or warning, which is dumb as shit. I'm not arguing the way throw error works, I'm arguing the dumb shit way you used it and when.

eponymz commented 6 years ago

Or here's a crazy thought. Catch the error, find the bug that you have in your code, handle the bug, take the try catch out. This entire discussion claiming that the lib is the problem is whats wrong here. Sure its brought up multiple times, but error handling is a rather amateur concept in coding and the fact that you are having issues with that says to me that you should hit up a Udemy course or two. They have ones that cover fundamentals even.

ghost commented 6 years ago

but error handling is a rather amateur concept in coding and the fact that you are having issues with that says to me that you should hit up a Udemy course or two. They have ones that cover fundamentals even.

So is throwing an error and knowing when and how to use it. And if you'd actually read what happened, you'd realize it's a random network error rarely occurring and how discord.js decides to throw an exception instead of a warning or other default behavior while at the same time, not accounting for this in their documentation examples. Deciding the default behavior for if a send message fails due to a network error is to crash an entire bot, when the lib was made for making bots in large part is a huge ? The fact so many other people report it as an issue should be a big red flag but nope, these devs dug in their heels. I could even live with the line being needed if it was shown in the documentation examples, even though it feels like a crappy hack, and that the library should not crash on a network error and be able to be overridden by a dev to do that if needed. But, they don't, and insist "this is just how Node.js works" even though that is blatantly false as the error is emitted from their client, and is their programming.

Honestly, I'm older than I wish to be and have way longer been in this industry than I like to think about, and the "you're just amateur" routine is old. I find it's mostly used by amateurs. I do think it's true 9 in every 10 coders suck ass though and I guess here we have more proof. Tribalism in coding is so ingrained anymore half the programmers don't even know up from down any more and are scared of their own shadow if it comes to writing actual algorithms that use math. Tribalism, the popular opinion and lowest common denominator are often all pointing to the same wrong thing. I expected this kind of response and the brain dead tribalism that came with it when I made this thread, but I at least wanted to inform the devs how bad their design is even if they won't listen.

I'm sure this issue will keep being brought up too, by more ppl, and they'll keep stating it's right to cause ppl issues and blame it on the users. I'd like to see them run a real business sometime with that mentality 👍

Triforcey commented 6 years ago

There is obviously a lot of controversy about if the library should handle this or not. However I think we're all missing that the real problem here is how many people are running into this problem and how difficult they report it as being to debug.

Whether the library is changed to handle this or not something needs to be done about that. We can fight over this as much as we want, the problem is still causing people trouble.

eponymz commented 6 years ago

So is throwing an error and knowing when and how to use it. And if you'd actually read what happened, you'd realize it's a random network error rarely occurring and how discord.js decides to throw an exception instead of a warning or other default behavior while at the same time, not accounting for this their documentation examples. Deciding the default behavior is if a send message fails due to a network error is to crash an entire bot, when the lib was made for making bots in large part is a huge ? The fact so many other people report it as an issue should be a big red flag but nope, these devs dug in their heels.

Honestly, I'm older than I wish to be and have way longer been in this industry than I like to think about, and the "you're just amateur" routine is old. I find it's mostly used by amateurs. I do think it's true 9 in every 10 coders suck ass though and I guess here we have more proof. Tribalism in coding is so ingrained anymore half the programmers don't even know up from down any more and are scared of their own shadow if it comes to writing actual algorithms that use math.

Been having the same issue with a slack bot at work to be honest with you. And instead of bitching about it on the repo, i decided to just take matters into my own hands and go through and debug the damn code. Maybe, i dunno, find the issue and add a mathematical algorithm to retry consistently until a limit is hit then give up and log the error while crashing the bot. That way i have a stack trace, and my bot will try to live as long as it can before dying.

BTW, this coming from an amateur coder, A.K.A me..

eponymz commented 6 years ago

I dont care who's right or wrong. If there's a problem, find it and fix it, guess what i bet if you find where the error is occurring ill bet you could even do this way crazy thing and PR it to the repo saying "hey guys! remember that error i told you about? i reproduced it in a controlled form and fixed it, here's the code"

ghost commented 6 years ago

Been having the same issue with a slack bot at work to be honest with you. And instead of bitching about it on the repo, i decided to just take matters into my own hands and go through and debug the damn code. Maybe, i dunno, find the issue and add a mathematical algorithm to retry consistently until a limit is hit then give up and log the error while crashing the bot. That way i have a stack trace, and my bot will try to live as long as it can before dying.

BTW, this coming from an amateur coder, A.K.A me..

I dont care who's right or wrong. If there's a problem, find it and fix it, guess what i bet if you find where the error is occurring ill bet you could even do this way crazy thing and PR it to the repo saying "hey guys! remember that error i told you about? i reproduced it in a controlled form and fixed it, here's the code"

The "fix" was given in my opening post though some say it doesn't work properly and the documentation doesn't reflect its use. You can also wrap it in forever or pm2, but the problem with that is the state of any variables gets reset when the bot restarts. Sometimes it's just the lib or documentation isn't right.

Just because I can spend extra man hours to fix it doesn't mean it's right, especially considering the fact it's still going to be affecting all the other ppl reporting the problem. That the devs write them off as just ignorant of how node.js works is laughable.

adamschachne commented 6 years ago

client.on('error', console.error);

I'm sure a lot of developers would appreciate this somewhere in the examples.

Triforcey commented 6 years ago

@wolfe7 What's you're point? As you mentioned, the current "fix" for this issue is very pitiful and maybe inconsistent. Therefore we should make a better/real fix. That's what @eponmyz is talking about doing. I don't see why you're arguing with them.

eponymz commented 6 years ago

The "fix" was given in my opening reply though some say it doesn't work properly and the documentation doesn't reflect it's use. You can also wrap it in forever or pm2, but the problem with that is the state of any variables gets reset when the bot restarts. Sometimes it's just the lib or documentation isn't right.

Just because I can spend extra man hours to fix it doesn't mean it's right, especially considering the fact it's still going to be affecting all the other ppl reporting the problem. That the devs write them off as just ignorant of how node.js works is laughable.

But.. it is. There are plenty of bot libs that have this issue and it isnt just discord libs. Sure it would be nice to have a method in the lib to handle the reconnection but if you look at how you reference this "random error" and then tell the devs here to fix it. That, is laughable. How can they fix something they cant find because no one with the error CATCHES THE DAMN THING AND HANDLES IT.

ghost commented 6 years ago

@wolfe7 What's you're point? As you mentioned, the current "fix" for this issue is very pitiful and maybe inconsistent. Therefore we should make a better/real fix. That's what @eponmyz is talking about doing. I don't see why you're arguing with them.

It either shouldn't throw an exception or it should have their examples working in the case of a network error. The default behavior should also give the trace to a line number in your code unlike it does now. Using the line works, for me at least though as I've said others have reported it doesn't solve the issue for them. But it's not in the documentation and it's also bad design to throw an error to people who don't know they'll even need to be catching it b/c you decided to throw an error in your client that isn't handled. If you DO decide to do that for whatever reason, you should use it in documentation examples.

Sure it would be nice to have a method in the lib to handle the reconnection

I mean that doesn't really have anything to do with the thread at all.

this "random error" and then tell the devs here to fix it.

and neither does that.

Triforcey commented 6 years ago

Thanks for the explanation @wolfe7, I think either of the solutions you suggest should satisfy everyone here, as long as it still remains an option (but not required) to handle the error yourself. Perhaps I can turn this into some real code and then into a PR soon, unless someone else beats me to it.

devsnek commented 6 years ago

@wolfe7 Hi there! Hopefully I can clear up some of this misunderstanding.

Operations that modify state on Discord (For example, sending a message or deleting a channel.) use HTTP REST requests, and are always explicitly initiated by the person writing a bot. These methods always return a promise, which you can handle as you see fit.

The second component of how bots interact with Discord is the WebSocket. The WebSocket feeds live events from Discord, telling us when someone sends a message or deletes a channel or whatever. If this connection is lost, and the library determines that it cannot be reconnected, we emit the error event because at this point the bot is in a fatal condition. There is nothing it can do to itself to properly recover.

Regardless of whether you personally agree with the design of Node.js's EventEmitter, we as a library of the Node.js ecosystem choose to opt into the pattern to keep consistency and maintainability under control.

Adding an error handler to the documentation has been debated before, but ultimately ruled out because we determined that using the error event is such a common pattern in the ecosystem, and we wanted to keep focus on the specific API of our library. If you disagree with that, this project is OSS, and if you would like to improve the documentation (or any part of the library) please feel free to do so. We gladly welcome all constructive collaboration to the project.

appellation commented 6 years ago

Teaching people how node event emitters works is beyond the scope of the documentation. I can see this being a great page in the guide, perhaps even part of the "creating your first bot" section.

The ECONNRESET error that's in question here is likely generated when the library's underlying connection to the Discord gateway gets reset. We can definitely separate this error from any other errors that the websocket may generate, but then we run into consistency issues and start crossing into a grey area of "what is going to be critical for the end user?" Is ECONNREFUSED important? Maybe there are other unknown errors the websocket could encounter? Considering that this is an error with the websocket connection, it seems like this should be reported to the user as an error.

You would not want websocket errors emitted only if you consider the websocket to be a second-class citizen in your application. I find this to be objectively incorrect, as a typical bot is pretty useless without its websocket connection. If your bot functions without being connected to the gateway, you probably should be using a different API wrapper.

It is worth noting that the only thing you have to do in order to ensure your bot does not restart is provide an error listener. It does not even have to do anything. However, as an API wrapper, it is our job to report errors to the end user.

Edit: terminology

Triforcey commented 6 years ago

@appellation that's already been said plenty. We're a bit past that. We can still report this and handle it by default. The issue isn't how the error is thrown but when/why it is thrown and that it's unclear that it could be thrown.

@devsnek What you bring up is rather interesting, if what you say is true than the error is being thrown at completely the wrong time as the web socket connection often can recover if you catch the error yourself. You're totally right that if the error were fatal the library doesn't need to handle it. Perhaps this issue goes deeper than it appears.

iCrawl commented 6 years ago

I'm going to close lock this because any further conversation is just pointless from now on. @devsnek laid out pretty well why we do things as we do them, if you disagree with fundamental decisions the language ecosystem makes, you are free to fork this project and make the modifications needed or switch to another library. Surprise, surprise though, they will handle it the absolute same way,