aluisiora / node-routeros

Mikrotik Routerboard RouterOS API for NodeJS
MIT License
99 stars 43 forks source link

processRawData bug... #17

Closed ZackMattor closed 4 years ago

ZackMattor commented 4 years ago

Howdy! We are using the newest release of this library. It seems like we are hitting some stream parsing issue. We were using the Mikronode library previously and we needed to fix a bug regarding the sentance length parsing and I wonder if this is somehow related?

https://github.com/violetatrium/mikronode/commit/10467e3d806cebbc04b795ea38fca02d98ae5679

 TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be one of type string, Buffer, ArrayBuffer, Array, or Array-like Object. Received type object
     at Function.from (buffer.js:305:9)
     at Receiver.processRawData (/usr/src/app/node_modules/node-routeros/dist/connector/Receiver.js:136:65)
     at Connector.onData (/usr/src/app/node_modules/node-routeros/dist/connector/Connector.js:192:23)
     at Socket.emit (events.js:223:5)
     at Socket.EventEmitter.emit (domain.js:475:20)
     at Shim.applySegment (/usr/src/app/node_modules/newrelic/lib/shim/shim.js:1424:20)
     at Socket.wrapper [as emit] (/usr/src/app/node_modules/newrelic/lib/shim/shim.js:2065:17)
     at addChunk (_stream_readable.js:309:12)
     at readableAddChunk (_stream_readable.js:290:11)
     at Socket.Readable.push (_stream_readable.js:224:10)
ZackMattor commented 4 years ago

Also this is an unhandled exception... this should probably be bubbled up through the routeros.on('error')?

ZackMattor commented 4 years ago

@tyler-sommer found this...

https://github.com/aluisiora/node-routeros/blob/development/src/connector/Receiver.ts#L152

> Buffer.from(null, 'ascii')
TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be one of type string, Buffer, ArrayBuffer, Array, or Array-like Object. Received type null
aluisiora commented 4 years ago

Are you running it with node v12? I'll might have to update it.

If you got it fixed already, you could make a PR.

aluisiora commented 4 years ago

Also this is an unhandled exception... this should probably be bubbled up through the routeros.on('error')?

Regarding this, since it might be a syntax error, it shouldn't. A timeout, a permission error or anything like that can be treated and circumvented. A syntax error, however, can't unless you fix the code. So it should stop execution and should be fixed before runtime.

ZackMattor commented 4 years ago

We are running node v12. I went back to node v5 and saw the same error trying to run Buffer.from(null, 'ascii').... I think Buffer.from('\0', 'ascii') will give us what we need. I'll try patching and see if it fixes our problem.

Also agreed on the syntax error comment!

I'd love to hear your thoughts on handling connection errors when you have command routeros.write() calls inflight. Currently the routeros.on('error') gets triggered but all of the write() promises never resolve. Would be cool if we could reject all inflight calls vs leaving them orphaned. Currently we're keeping a hash indexed off of a request call UUID we're generating. The value is an object containing the resolve and reject of a wrapper we have around `routeros.write(). When it resolves or rejects we delete the element out of the hash.

{
   'REQUEST_ID1': { resolve, reject } ,
   'REQUEST_ID2': {resolve, reject}
}

And when on('error') gets called we reject all inflight promises and delete them from the hash. I can open up a new issue for this conversation if you want.

aluisiora commented 4 years ago

That is weird, are you storing the promise that the routeros.write() returns? It should return it.

The way I do it is something like this (js example):

try {
    const data = await ros.write('some/menu', '[some-command]');
    console.log(data);
} catch(error) {
   console.log('got an error: ' + error.message);
}

If that is not the case, you could open another issue regarding that.

Now to the main topic, I'll only have time over the weekend to test it, but if even will old node versions it was throwing errors, might be the typescript version then, \0 might be enough, I'll look into it.

I was using node v10 previously.

aluisiora commented 4 years ago

@tyler-sommer found this...

https://github.com/aluisiora/node-routeros/blob/development/src/connector/Receiver.ts#L152

> Buffer.from(null, 'ascii')
TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be one of type string, Buffer, ArrayBuffer, Array, or Array-like Object. Received type null

I could not reproduce this, neither with newer typescript nor with node v12, I really don't know why it was restricting like this. Either way, in version 1.6.4 I replaced Buffer.from(null, 'ascii') to Buffer.from('\0', 'ascii') since it doesn't change much.

Let me know if that fixes it for you.

ZackMattor commented 4 years ago

You really couldn't reproduce that error.. that's interesting. I'm on the latest MacOS release - and we're seeing the issue on the node12-alpine image. I'll try out the new release today and let you know if it fixes our issues.

$ node -v
v12.10.0
$ node
Welcome to Node.js v12.10.0.
Type ".help" for more information.
> Buffer.from(null, 'ascii')
Thrown:
TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be one of type string, Buffer, ArrayBuffer, Array, or Array-like Object. Received type object
    at Function.from (buffer.js:293:9)
    at repl:1:8
    at Script.runInThisContext (vm.js:126:20)
    at REPLServer.defaultEval (repl.js:384:29)
    at bound (domain.js:420:14)
    at REPLServer.runBound [as eval] (domain.js:433:12)
    at REPLServer.onLine (repl.js:700:10)
    at REPLServer.emit (events.js:214:15)
    at REPLServer.EventEmitter.emit (domain.js:476:20)
    at REPLServer.Interface._onLine (readline.js:316:10)
>
>
> Buffer.from('\0', 'ascii')
<Buffer 00>
>
aluisiora commented 4 years ago

Interestingly enough, through the cli it does throw that error, but not within the code... huh...

I wish I had more time to look into this, I'm not currently working with this lib.

But in any case it should be fixed now.

ZackMattor commented 4 years ago

Looks like we're not out of the woods yet :)

image

Looks like we're getting this event triggered...

https://github.com/aluisiora/node-routeros/blob/7280d80843674fe4dc581ebc38fbeb09d4ec0e31/src/Channel.ts#L183-L193

This is looking a lot like the bug we were hitting with mikronode... Which ended up being an issue with the length algorithm detecting the null character at the wrong time (if i recall correctly) Mikronode was blindly splitting on \00 which was a valid length after a certain character... I'll try to make a test to reproduce.

aluisiora commented 4 years ago

Could you sample the write() call that you are trying to do?

There shouldn't be any !=multicast... message received from routeros, it really should be something with the detection. A big part of the socket connection messaging code came from mikronode, so it might be the same issue.

ZackMattor commented 4 years ago

I may have to add some additional logging in the node-routeros package in order to track it down. I just tried the sample packet we captured a few months ago when we fixed the issue in mikronode and your library seems to parse it fine. I'll include it here. :)

I'll need to add some verbose logging to my app in order to figure out the exact command thats running. but seems to be something regarding bridging - but not all.

Here's some more log output from today with this exception.

image

const { Receiver } = require("../dist/connector/Receiver.js");

// this validates that the decodePackets function properly handles
// incomplete records, returning the correct "leftover" bytes.

// contains 6 complete records and only the header of the 7th record.
// in other words, the last 4 characters of this payload are: \x03!re
// and it is these 4 characters that should be in the leftover.
let payload = 'AyFyZRQudGFnPW1pa3JvdGlrY24xMTctMQ49LmlkPSozMDAwMDAwNBY9ZHN0LWFkZHJlc3M9MC4wLjAu' +
  'MC8wFj1nYXRld2F5PTE5Mi4xNjguMjIyLjEzPWdhdGV3YXktc3RhdHVzPTE5Mi4xNjguMjIyLjEgcmVh' +
  'Y2hhYmxlIHZpYSAgZXRoZXIxCz1kaXN0YW5jZT0xCT1zY29wZT0zMBA9dGFyZ2V0LXNjb3BlPTEwFT12' +
  'cmYtaW50ZXJmYWNlPWV0aGVyMQw9YWN0aXZlPXRydWUNPWR5bmFtaWM9dHJ1ZQw9c3RhdGljPXRydWUP' +
  'PWRpc2FibGVkPWZhbHNlAAMhcmUULnRhZz1taWtyb3Rpa2NuMTE3LTEHPS5pZD0qMRo9ZHN0LWFkZHJl' +
  'c3M9MTAuMC4wLjExMy8zMhE9Z2F0ZXdheT0xMC4wLjAuMTE9Z2F0ZXdheS1zdGF0dXM9MTAuMC4wLjEg' +
  'cmVhY2hhYmxlIHZpYSAgR3VzdGEtVlBOEz1jaGVjay1nYXRld2F5PXBpbmcLPWRpc3RhbmNlPTEJPXNj' +
  'b3BlPTMwED10YXJnZXQtc2NvcGU9MTAMPWFjdGl2ZT10cnVlDD1zdGF0aWM9dHJ1ZQ89ZGlzYWJsZWQ9' +
  'ZmFsc2UaPWNvbW1lbnQ9R3VzdGEgQVBJIEdhdGV3YXkAAyFyZRQudGFnPW1pa3JvdGlrY24xMTctMQc9' +
  'LmlkPSoyGD1kc3QtYWRkcmVzcz0xMC4wLjQuMC8yMhE9Z2F0ZXdheT0xMC4wLjAuMTE9Z2F0ZXdheS1z' +
  'dGF0dXM9MTAuMC4wLjEgcmVhY2hhYmxlIHZpYSAgR3VzdGEtVlBOEz1jaGVjay1nYXRld2F5PXBpbmcL' +
  'PWRpc3RhbmNlPTEJPXNjb3BlPTMwED10YXJnZXQtc2NvcGU9MTAMPWFjdGl2ZT10cnVlDD1zdGF0aWM9' +
  'dHJ1ZQ89ZGlzYWJsZWQ9ZmFsc2UVPWNvbW1lbnQ9Y3VzdG9tIHJvdXRlAAMhcmUULnRhZz1taWtyb3Rp' +
  'a2NuMTE3LTEOPS5pZD0qNDAwNTkwMzUYPWRzdC1hZGRyZXNzPTEwLjAuMC4xLzMyFT1wcmVmLXNyYz0x' +
  'MC4wLjk4LjE4NxI9Z2F0ZXdheT1HdXN0YS1WUE4jPWdhdGV3YXktc3RhdHVzPUd1c3RhLVZQTiByZWFj' +
  'aGFibGULPWRpc3RhbmNlPTAJPXNjb3BlPTEwDD1hY3RpdmU9dHJ1ZQ09ZHluYW1pYz10cnVlDT1jb25u' +
  'ZWN0PXRydWUPPWRpc2FibGVkPWZhbHNlAAMhcmUULnRhZz1taWtyb3Rpa2NuMTE3LTEOPS5pZD0qNDAw' +
  'NUZDRTEcPWRzdC1hZGRyZXNzPTEwLjAuMjU1LjI1NC8zMhU9cHJlZi1zcmM9MTAuMC45OC4xODYYPWdh' +
  'dGV3YXk9R3VzdGEtU3BlZWR0ZXN0KT1nYXRld2F5LXN0YXR1cz1HdXN0YS1TcGVlZHRlc3QgcmVhY2hh' +
  'YmxlCz1kaXN0YW5jZT0wCT1zY29wZT0xMAw9YWN0aXZlPXRydWUNPWR5bmFtaWM9dHJ1ZQ09Y29ubmVj' +
  'dD10cnVlDz1kaXNhYmxlZD1mYWxzZQADIXJlFC50YWc9bWlrcm90aWtjbjExNy0xDj0uaWQ9KjQwMDVF' +
  'OTY5HD1kc3QtYWRkcmVzcz0xOTIuMTY4Ljg3LjAvMjQWPXByZWYtc3JjPTE5Mi4xNjguODcuMQ89Z2F0' +
  'ZXdheT1icmlkZ2UgPWdhdGV3YXktc3RhdHVzPWJyaWRnZSByZWFjaGFibGULPWRpc3RhbmNlPTAJPXNj' +
  'b3BlPTEwDD1hY3RpdmU9dHJ1ZQ09ZHluYW1pYz10cnVlDT1jb25uZWN0PXRydWUPPWRpc2FibGVkPWZh' +
  'bHNlAAMhcmU=';

let buff = Buffer.from(payload, 'base64');

console.log(buff);
console.log(buff.toString());

let rec = new Receiver({emit: ()=>{}});

rec.read('mikrotikcn117-1', (data) => {
  console.log('got tag', data);
});

rec.processRawData(buff);
ZackMattor commented 4 years ago

The plot thickens..... I got the last 3000 bytes from a receiver that caused this to throw.... turned out that all 4 packets in those 3000 bytes were able to be parsed without issue in my testing. I think it's getting caught up much earlier in the chain. We do have a ton of orphaned promises that indicate this may be a more widespread issue than I thought.... I'll keep this thread updated :)

ZackMattor commented 4 years ago

Is it possible that we don't get the enough of a buffer to correctly decode the length here? Say the previous message got sliced off and you're left with 2 bytes left - but we need 4 to actually specify the length - and they dont come till the next data over the socket arrives?

https://github.com/ZackMattor/node-routeros/pull/1/files#diff-0021f79841048d22228288cce6ff2767R219

ZackMattor commented 4 years ago

I found it! 12 hours of staring at hex dumps.... 😅

The issue appears was before this actual exception. The exception happens in a very rare case where the first word starts with the last byte from a length descriptor that ends up being a !....

The actual issue happens way before this where an offset error happens. We basically mis calculate the length because it gets split between data buffers.

here are my changes and stuff i added to debug... I think there's a better way to handle it but i'll monitor our prod system for a bit and see make sure its stable. I realized after the fact i can just use idx as the length of the length descriptor... Basically i just make sure the buffer has enough data in it to accurately decode the length, and if not i store if for the next data packet.

https://github.com/ZackMattor/node-routeros/pull/1/files

aluisiora commented 4 years ago

Really liked the way you added comments there, it's been some time since I wrote that.

If the problem is like what you describe, which really seems to be, the logic behind the buffer length calculations might need an overhaul, which will surely take some time. If you happen to solve it, it'd be nice if you could make a PR.

ZackMattor commented 4 years ago

I have solved it - I planned on opening a PR here over the weekend but didn't get around to it. I'll try to find time to do it sometime this week :)