amark / gun

An open source cybersecurity protocol for syncing decentralized graph data.
https://gun.eco/docs

GUN crashed in middle of IA demo ! #620

Open mitra42 opened 5 years ago

mitra42 commented 5 years ago

@amark - looks like it was GUN that went down mid-demo. I think you were in the audience. It was restarted by supervisorctl.

I'm not 100% sure on this, i.e. GUN's crash may have been caused by something else, like a high load average on that machine with everyone using it, but GUN was certainly restarted and everything else stayed up.

amark commented 5 years ago

Do you have stack trace?

GUN has no server-facing components that affect the front-end UI, though. The worst case would be that the browser websocket throws an error after the websocket server receives too much load, but that wouldn't stop the page from loading or block any further UI action.

mitra42 commented 5 years ago

No, and that's why it surprises me, unless it sent back bad data that confused things. I've seen queries for metadata/undefined, which suggests some kind of problem.
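
A request for metadata/undefined is what results when an undefined key is interpolated straight into a URL. A minimal sketch of a guard, assuming a hypothetical helper `metadataUrl` (not part of GUN or dweb-transport) and the gateway base URL that appears in the logs later in this thread:

```javascript
// Hypothetical helper: build a metadata URL only when the key is usable.
const METADATA_BASE = 'http://dweb.me/arc/archive.org/metadata/';

function metadataUrl(key) {
  // Reject undefined, null, non-strings and empty strings so that
  // "metadata/undefined" is never requested.
  if (typeof key !== 'string' || key.length === 0) {
    return null;
  }
  return METADATA_BASE + encodeURIComponent(key);
}
```

A caller would skip the fetch (and log the missing key) whenever the helper returns null.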

amark commented 5 years ago

Do you have stack trace?

GUN has no server-facing components that affect the front-end UI, though. The worst case would be that the browser websocket throws an error after the websocket server receives too much load, but that wouldn't stop the page from loading or block any further UI action.

amark commented 5 years ago

Woops

amark commented 5 years ago

Still talking to people here and bumped my phone (no lock)

amark commented 5 years ago

@mitra42 I also got an nginx gateway error while the demo was happening. I just wasn't sure if you were blaming me for the bootloader getting blocked.

As for the gun websocket server crashing (which shouldn't block or stop anything other than that session): yeah, a version or two ago I added checks to prevent it from sending messages that are larger than the NodeJS heap space. However, I found out that there isn't a way to predict that correctly, so in the meantime I added auto restart of the process to the kitchen sink demo. Maybe I'll move that into the main server too.
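
The auto-restart idea can be sketched with Node's built-in `child_process` module. This is not the code from the kitchen sink demo; the `supervise` and `restartDelay` helpers, the `./server.js` entry point and the backoff numbers are all assumptions made for illustration.

```javascript
const { fork } = require('child_process');

// Delay before the next restart: doubles per consecutive crash, capped at maxMs.
function restartDelay(crashes, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** crashes);
}

// Start `script` as a child process and restart it whenever it exits abnormally.
function supervise(script, crashes = 0) {
  const child = fork(script);
  child.on('exit', (code, signal) => {
    if (code === 0) return; // clean shutdown, do not restart
    const delay = restartDelay(crashes);
    console.error(
      `${script} exited (code=${code}, signal=${signal}); restarting in ${delay}ms`
    );
    setTimeout(() => supervise(script, crashes + 1), delay);
  });
  return child;
}

// Usage (hypothetical entry point): supervise('./server.js');
```

The crash counter in this sketch never resets, so a long-lived deployment would probably want to clear it after the child has stayed up for a while. An external supervisor such as the supervisorctl setup mentioned above serves the same purpose.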

Just wanted to be clear that the nginx gateway error isn't something the nodejs server would trigger. But yes, gracefully handling socket scaling is important, and the logs will help, thank you!

mitra42 commented 5 years ago

Thanks @amark - I'm not saying GUN caused the issue. I am saying that GUN went down and was restarted by supervisorctl. In searching for a possible problem I found an error: { Error: EACCES: permission denied, open 'radata-!-wbv.tmp' errno: -13, code: 'EACCES', syscall: 'open', path: 'radata-!-wbv.tmp' } I also saw that when Brewster asked everyone to jump on dweb.archive.org, the load average hit about 30 on that machine, so things might have started failing due to timeouts. I'm trying to explore all the possible causes of the issue, and will probably try to figure out how to simulate it when I get back to SF in 2 weeks.

mitra42 commented 5 years ago

Note - I'm seeing Gun crashing about once or twice a day even without load.

amark commented 5 years ago

Same EACCES error? (That one looks like it is "caught" and not crashing, versus other errors like "FATAL HEAP ALLOCATION" that still crash GUN because they can't be caught.) Or something else?
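
To illustrate the distinction: a file system failure such as EACCES reaches JavaScript as an ordinary Error object with a `code` and `path`, so it can be inspected and logged, whereas a fatal heap allocation failure aborts the process before any handler runs. A small sketch, where `describeFsError` is a hypothetical helper and not GUN code:

```javascript
// Hypothetical helper: turn a Node file-system error into a log-friendly string.
function describeFsError(err) {
  if (!err) return 'no error';
  if (err.code === 'EACCES') {
    // Permission problems carry the offending path on the error object.
    return `permission denied on ${err.path}; check file and directory permissions`;
  }
  return `unexpected error: ${err.message}`;
}

// Usage with a callback-style fs call (file name is only an example):
// require('fs').open('example.tmp', 'w', (err) => console.log(describeFsError(err)));
```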

mitra42 commented 5 years ago

Here's the log fragment. I'm not sure if the preceding line refers to a previous request or to the one that caused the crash. Note there are at least two issues in this log: a) the actual crash, and b) that several of the requests ask for metadata/undefined, or at least log that they did.

  '.': 'vanityfair_version2_1704_librivox' }
GUN.hijack: soul= jk4bfo94TUGLgM4DgwRn key= vanityfair_version2_1704_librivox
CB matching jk4bfo94TUGLgM4DgwRn against jk4bfo94TUGLgM4DgwRn
CB matched jk4bfo94TUGLgM4DgwRn against jk4bfo94TUGLgM4DgwRn trying http://dweb.me/arc/archive.org/metadata/vanityfair_version2_1704_librivox
GUN.hijack updated msg with data = jk4bfo94TUGLgM4DgwRn the_rainbow_1503_librivox 172304
Note error from fetch might be misleading especially TypeError can be Cors issue: http://dweb.me/arc/archive.org/metadata/undefined
GUN.hijack updated msg with data = jk4bfo94TUGLgM4DgwRn sonsandlovers_1501_librivox 246550
Note error from fetch might be misleading especially TypeError can be Cors issue: http://dweb.me/arc/archive.org/metadata/undefined
GUN.hijack updated msg with data = jk4bfo94TUGLgM4DgwRn marcia_schuyler_1603_librivox 178838
GUN.hijack updated msg with data = jk4bfo94TUGLgM4DgwRn maria_chapdelaine2_1702_librivox 95072
GUN.hijack updated msg with data = jk4bfo94TUGLgM4DgwRn vanityfair_version2_1704_librivox 346612
GUN.hijack updated msg with data = jk4bfo94TUGLgM4DgwRn sons_and_lovers_mfs_librivox 234932
GUN.hijack updated msg with data = jk4bfo94TUGLgM4DgwRn heart_and_science_1611_librivox 328634
/usr/local/dweb-transport/node_modules/gun/gun.js:641
                                                        if(opt.age > (now - it.was)){ return }
                                                                               ^

TypeError: Cannot read property 'was' of undefined
    at Object.<anonymous> (/usr/local/dweb-transport/node_modules/gun/gun.js:641:31)
    at Object.Type.obj.map (/usr/local/dweb-transport/node_modules/gun/gun.js:136:19)
    at Timeout._onTimeout (/usr/local/dweb-transport/node_modules/gun/gun.js:640:16)
    at ontimeout (timers.js:466:11)
    at tryOnTimeout (timers.js:304:5)
    at Timer.listOnTimeout (timers.js:267:5)
Hello wonderful person! :) Thanks for using GUN, feel free to ask for help on https://gitter.im/amark/gun and ask StackOverflow questions tagged with 'gun'!
HTTPS Server started on port 4246 with /gun
GUN: Hikacking loading trap

amark commented 5 years ago

@mitra42 sorry was out of the country, been gone for the last 2 weeks. Back now.

Are you still in town?

The it.was crash: it looks like you are not on the latest version. You reported this before and I fixed it. Do you have any new logs with a crash?
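
For readers following the stack trace, the failing line dereferences `it.was` inside a timer-driven loop while `it` is undefined. The sketch below shows the general shape of a defensive check; `sweepStale` is a hypothetical stand-in written for this thread and is not the actual fix that went into GUN.

```javascript
// Hypothetical stand-in for the sweep at gun.js:640-641; not the actual GUN fix.
// Returns the ids of entries whose `was` timestamp is at least maxAgeMs old.
function sweepStale(entries, maxAgeMs, now = Date.now()) {
  const stale = [];
  for (const [id, it] of Object.entries(entries)) {
    // Skip entries that were removed or never initialised,
    // instead of reading `it.was` on undefined.
    if (!it || typeof it.was !== 'number') continue;
    if (maxAgeMs > now - it.was) continue; // still fresh, same test as the log line
    stale.push(id);
  }
  return stale;
}
```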

@go1dfish in the chatroom while I was gone you mentioned GUN occasionally crashing/restarting, can you pull those logs too?

mitra42 commented 5 years ago

Not sure which version it's running ... it would be helpful if the startup message gave the version number instead of, or as well as, the "Hello wonderful person" :-)

GUN: Hikacking outgoing message original= { '#': 'jk4bfo94TUGLgM4DgwRn' }
GUN.hijack: soul= jk4bfo94TUGLgM4DgwRn key= undefined
CB matching jk4bfo94TUGLgM4DgwRn against jk4bfo94TUGLgM4DgwRn
CB matched jk4bfo94TUGLgM4DgwRn against jk4bfo94TUGLgM4DgwRn trying http://dweb.me/arc/archive.org/metadata/undefined
err { Error: EACCES: permission denied, open 'radata-!-gkk.tmp'
  errno: -13,
  code: 'EACCES',
  syscall: 'open',
  path: 'radata-!-gkk.tmp' }
GUN.hijack updated msg with data = jk4bfo94TUGLgM4DgwRn archiveitpartners 3379
Note error from fetch might be misleading especially TypeError can be Cors issue: http://dweb.me/arc/archive.org/metadata/undefined
Note error from fetch might be misleading especially TypeError can be Cors issue: http://dweb.me/arc/archive.org/metadata/undefined
Note error from fetch might be misleading especially TypeError can be Cors issue: http://dweb.me/arc/archive.org/metadata/undefined
Note error from fetch might be misleading especially TypeError can be Cors issue: http://dweb.me/arc/archive.org/metadata/undefined
/usr/local/dweb-transport/node_modules/gun/gun.js:641
                                                        if(opt.age > (now - it.was)){ return }
                                                                               ^

TypeError: Cannot read property 'was' of undefined
    at Object.<anonymous> (/usr/local/dweb-transport/node_modules/gun/gun.js:641:31)
    at Object.Type.obj.map (/usr/local/dweb-transport/node_modules/gun/gun.js:136:19)
    at Timeout._onTimeout (/usr/local/dweb-transport/node_modules/gun/gun.js:640:16)
    at ontimeout (timers.js:466:11)
    at tryOnTimeout (timers.js:304:5)
    at Timer.listOnTimeout (timers.js:267:5)
Hello wonderful person! :) Thanks for using GUN, feel free to ask for help on https://gitter.im/amark/gu

mitra42 commented 5 years ago

Looks like it's running version 0.9.99993. I don't see in here which version it was fixed in.

amark commented 5 years ago

Oh that is a good idea!

https://github.com/amark/gun/commit/04c731cfe763991f49a81afa927c83cbb0b2b075
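
Reading the version from a package's package.json is one common way to put a version number in a startup message in Node; the commit above may do it differently. A sketch, where `startupBanner` is a hypothetical helper and the `require('gun/package.json')` call in the usage comment assumes the package exposes that file:

```javascript
// Hypothetical helper: format a startup line that includes the package version.
function startupBanner(pkg) {
  const name = pkg && pkg.name ? pkg.name : 'app';
  const version = pkg && pkg.version ? pkg.version : 'unknown';
  return `${name} v${version} starting`;
}

// Usage: console.log(startupBanner(require('gun/package.json')));
```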

Jonathan mentioned to me you guys were chatting! :) Glad to know you are getting roped into the convo/project as well. ;)

mitra42 commented 5 years ago

I'm not seeing a version number in that push ?

amark commented 5 years ago

@mitra42 the latest is 0.9.99997, which you should upgrade to (it also fixes .once not being called on undefined).

mitra42 commented 5 years ago

OK updated and restarted, we'll see how it goes

go1dfish commented 5 years ago

My crashes are still memory allocation errors in flush.

It only seems to happen for my user-facing peer that handles a lot of connections.