Closed honusz closed 2 years ago
Hi @honusz and thanks for reporting this. Are you able to debug the code? There is a fixme in Controller.ts
:
// FIXME: How to determine when all services have been announced?
if (Object.keys(this.servicePorts).length > 3) {
resolve();
}
Can you see if this.servicePorts
ever gets filled up? I assumed that there are at least 3 services; but perhaps that assumption is incorrect?
Also, feel free to set a few breakpoints in the messageHandler()
switch statement to see if any of the messages are received.
Thanks!
Hi @honusz and thanks for reporting this. Are you able to debug the code? There is a fixme in Controller.ts: Happy to help =) I can, though my debug skills are only recently evolved from console.log().
I tried setting debug points around that fixme, but this
shows as undefined (because the promise hasn't been returned?). Perhaps I'm not doing this correctly?
Regardless, I'm fairly certain servicePorts isn't getting filled. By the time that breakpoint is hit, the device has already closed the connection according to wireshark. This happens immediately (from what I can tell) after the ServicesRequest message is written on the wire.
haha console.log()
is your friend :) Can you replace the messageHandler with this code?
messageHandler(p_message: Buffer): void {
const ctx = new ReadContext(p_message.buffer, false);
while (ctx.isEOF() === false) {
const id = ctx.readUInt32();
// FIXME: Verify token
ctx.seek(16); // Skip token; present in all messages
console.log(`message ${id}, sizeLeft ${ctx.sizeLeft()}`);
switch (id) {
case MessageId.TimeStamp:
ctx.seek(16); // Skip token; present in all messages
// Time Alive is in nanoseconds; convert back to seconds
this.timeAlive = Number(ctx.readUInt64() / (1000n * 1000n * 1000n));
console.log(`TimeStamp ${this.timeAlive}`);
break;
case MessageId.ServicesAnnouncement:
const service = ctx.readNetworkStringUTF16();
const port = ctx.readUInt16();
this.servicePorts[service] = port;
console.log(`ServicesAnnouncement ${service}:${port}`);
break;
case MessageId.ServicesRequest:
console.log(`ServicesRequest`);
break;
default:
assert.fail(`Unhandled message id '${id}'`);
break;
}
}
}
And send me the output?It should be something like:
[17:05:27] [INFO] Found 'primego' Controller at '192.168.178.34:41973' with following software: { name: 'OfflineAnalyzer', version: '1.0.0' }
[17:05:27] [INFO] Connected to primego:41973 local port: 1083
[17:05:27] [LOG] message 2, sizeLeft 50
[17:05:27] [LOG] ServicesRequest
[17:05:27] [LOG] message 0, sizeLeft 30
[17:05:27] [LOG] ServicesAnnouncement FileTransfer:43961
[17:05:28] [LOG] message 1, sizeLeft 24
[17:05:28] [LOG] TimeStamp 164
Also, what firmware version are you on?
Digging into it as I suspect something is going on with the handshake token. For some reason I have the same problems now as well at fw 2.1.1. So my guess is that they fixed something in this latest firmware. The weird thing is that sometimes it does work. And sometimes it doesn't. Those are the best bugs š
Digging into it as I suspect something is going on with the handshake token. For some reason I have the same problems now as well at fw 2.1.1. So my guess is that they fixed something in this latest firmware. The weird thing is that sometimes it does work. And sometimes it doesn't. Those are the best bugs š
Well, the good news is that I think at least we're experiencing the same error now, which is something š
Just a quick update: I tried replacing messageHandler with the above code, and set a breakpoint there, but it didn't even reach before the crash.
I tried it on my Macbook Pro (M1, OS 11.6.2), and it was able to requestAllServicePorts.
[12:31:08] [INFO] Found 'sc6000' Controller at '192.168.2.144:40439' with following software: { name: 'JP13', version: '2.1.1' }
[12:31:09] [LOG] TCP connection to '192.168.2.144:40439' local port: 57143
[12:31:09] [INFO] Discovered the following services:
[12:31:09] [INFO] port: 41137 => StateMap
[12:31:09] [INFO] port: 35915 => Broadcast
[12:31:09] [INFO] port: 44833 => Syncing
[12:31:09] [INFO] port: 44389 => TimeSynchronization
[12:31:09] [INFO] port: 39835 => BeatInfo
[12:31:09] [INFO] port: 37833 => FileTransfer
[12:31:09] [LOG] TCP connection to '192.168.2.144:37833' local port: 57144
Bad news: I hadn't yet added your modified messageHandler code and didn't have any breakpoints set (also, it crashed during FileTransfer, but one thing at a time š). Good news: I did have WireShark capturing, both the good requestAllServicePorts, and the one I tried to run right after that failed.
The successful tcp convo went like this:
00000000 00 00 00 02 4b e1 41 12 5e ad 48 48 a0 7d b3 7c ....K.A. ^.HH.}.|
00000010 a8 a7 22 0e ..".
00000000 00 00 00 02 52 fd fc 07 21 82 65 4f 16 3f 5f 0f ....R... !.eO.?_.
00000010 9a 62 1d 72 .b.r
00000014 00 00 00 00 4b e1 41 12 5e ad 48 48 a0 7d b3 7c ....K.A. ^.HH.}.|
00000024 a8 a7 22 0e 00 00 00 10 00 53 00 74 00 61 00 74 .."..... .S.t.a.t
00000034 00 65 00 4d 00 61 00 70 a0 b1 .e.M.a.p ..
0000003E 00 00 00 00 4b e1 41 12 5e ad 48 48 a0 7d b3 7c ....K.A. ^.HH.}.|
0000004E a8 a7 22 0e 00 00 00 12 00 42 00 72 00 6f 00 61 .."..... .B.r.o.a
0000005E 00 64 00 63 00 61 00 73 00 74 8c 4b .d.c.a.s .t.K
0000006A 00 00 00 00 4b e1 41 12 5e ad 48 48 a0 7d b3 7c ....K.A. ^.HH.}.|
0000007A a8 a7 22 0e 00 00 00 0e 00 53 00 79 00 6e 00 63 .."..... .S.y.n.c
0000008A 00 69 00 6e 00 67 af 21 .i.n.g.!
00000092 00 00 00 00 4b e1 41 12 5e ad 48 48 a0 7d b3 7c ....K.A. ^.HH.}.|
000000A2 a8 a7 22 0e 00 00 00 26 00 54 00 69 00 6d 00 65 .."....& .T.i.m.e
000000B2 00 53 00 79 00 6e 00 63 00 68 00 72 00 6f 00 6e .S.y.n.c .h.r.o.n
000000C2 00 69 00 7a 00 61 00 74 00 69 00 6f 00 6e ad 65 .i.z.a.t .i.o.n.e
000000D2 00 00 00 00 4b e1 41 12 5e ad 48 48 a0 7d b3 7c ....K.A. ^.HH.}.|
000000E2 a8 a7 22 0e 00 00 00 10 00 42 00 65 00 61 00 74 .."..... .B.e.a.t
000000F2 00 49 00 6e 00 66 00 6f 9b 9b .I.n.f.o ..
000000FC 00 00 00 00 4b e1 41 12 5e ad 48 48 a0 7d b3 7c ....K.A. ^.HH.}.|
0000010C a8 a7 22 0e 00 00 00 18 00 46 00 69 00 6c 00 65 .."..... .F.i.l.e
0000011C 00 54 00 72 00 61 00 6e 00 73 00 66 00 65 00 72 .T.r.a.n .s.f.e.r
0000012C 93 c9 ..
0000012E 00 00 00 01 4b e1 41 12 5e ad 48 48 a0 7d b3 7c ....K.A. ^.HH.}.|
0000013E a8 a7 22 0e 00 00 00 00 00 00 00 00 00 00 00 00 .."..... ........
0000014E 00 00 00 00 00 00 08 60 72 71 3c d2 .......` rq<.
The subsequent one that crashed only has the app's side of the convo (the requestAllServicePorts message).
Perhaps we're jumping the gun on sending the request? Maybe the device expects the request to come after it's sent the 4b e1 41 12 5e ad 48 48 a0 7d b3 7c a8 a7 22 0e
message, and it's only working when it has time to do so?
Had a successful run (until FileTransfer)
[13:46:49] [INFO] Found 'sc6000' Controller at '192.168.2.144:40439' with following software: { name: 'JP13', version: '2.1.1' }
[13:46:49] [LOG] TCP connection to '192.168.2.144:40439' local port: 58391
[13:46:49] [LOG] message 2, sizeLeft 0
[13:46:49] [LOG] ServicesRequest
[13:46:50] [LOG] message 1, sizeLeft 24
[13:46:50] [LOG] TimeStamp 13346
[13:46:52] [LOG] message 0, sizeLeft 262
[13:46:52] [LOG] ServicesAnnouncement StateMap:41137
[13:46:52] [LOG] message 0, sizeLeft 220
[13:46:52] [LOG] ServicesAnnouncement Broadcast:35915
[13:46:52] [LOG] message 0, sizeLeft 176
[13:46:52] [LOG] ServicesAnnouncement Syncing:44833
[13:46:52] [LOG] message 0, sizeLeft 136
[13:46:52] [LOG] ServicesAnnouncement TimeSynchronization:44389
[13:46:52] [LOG] message 0, sizeLeft 72
[13:46:52] [LOG] ServicesAnnouncement BeatInfo:39835
[13:46:52] [LOG] message 0, sizeLeft 30
[13:46:52] [LOG] ServicesAnnouncement FileTransfer:37833
[13:46:52] [INFO] Discovered the following services:
[13:46:52] [INFO] port: 41137 => StateMap
[13:46:52] [INFO] port: 35915 => Broadcast
[13:46:52] [INFO] port: 44833 => Syncing
[13:46:52] [INFO] port: 44389 => TimeSynchronization
[13:46:52] [INFO] port: 39835 => BeatInfo
[13:46:52] [INFO] port: 37833 => FileTransfer
[13:46:52] [LOG] TCP connection to '192.168.2.144:37833' local port: 58392
Thanks for this! Do you also have more information about when there is NO successful run and Discovered the following services:
isn't displayed? I'd love to see the network packages of that.
Also, just as a hunch, can you alter some values in common.ts
:
export const CLIENT_TOKEN = new Uint8Array([82, 253, 252, 7, 33, 130, 101, 79, 22, 63, 95, 15, 154, 98, 29, 114]);
Don't add or delete entries, just change them (number between 0-255). I did notice that in the Go code, the client token is randomized. I didn't implement that, because it seems to work just fine with a static token, but perhaps it can cause issues. Would be great if you can test that and see if it is more stable.
Thanks!
Do you also have more information about when there is NO successful run and Discovered the following services: isn't displayed? I'd love to see the network packages of that.
Sure! the attached pcap hopefully has what you need. sc6000-mac-tcponly.pcapng.zip
LMK if you need other captures, or if I filtered out too much.
Don't add or delete entries, just change them (number between 0-255). I did notice that in the Go code, the client token is randomized. I didn't implement that, because it seems to work just fine with a static token, but perhaps it can cause issues. Would be great if you can test that and see if it is more stable.
So I tried this for randomizing CLIENT_TOKEN:
export const CLIENT_TOKEN = Uint8Array.from({ length: 16 }, () => Math.random() * 255);
It does work with a random token, but the success or failure still seems as intermittent as with the fixed token.
Thanks! I'll check out that zip file! Just to make sure we're on the same page: I need to know if it also fails right after you changed that token to a new value?
I need to know if it also fails right after you changed that token to a new value?
There doesn't seem to be any correlation. Just to clarify, I have the app generate a new random code every launch. I believe you're correct that it doesn't matter what the token is, it can be random as in go-stagelinq.
Some other troubleshooting notes:
Thanks! I actually tested the Go code locally here as well on 2.1.1., and I have a lot of connection problems with that one as well. Sometimes it connects and fetches the StateMap service and other times it doesn't. And for some reason the NodeJS version is always connecting now, making this bug very hard to debug. I'll report back when I know more!
@honusz can you test this branch?
https://github.com/MarByteBeep/StageLinq/tree/request_issue
I noticed in your WireShark logs that it failed when I sent the RequestServices message (value 0x00000002), before I received that from the device. So now I wait for that message. Locally all seems to run fine still. But I'm curious to see if this fixes the problem at your end.
@honusz can you test this branch?
Yup!! That's resolved it on my Mac! I haven't had a failure in the dozen+ times I've tried it.
Still having zero success on Win10, though go-stagelinq is 100% successful. I've attached a pcap from PC. win10-go_js.zip
Also, if I forgot to say it before, thank you so much for all your efforts on this project! You're doing real yeoman's work here! =)
@honusz Whoops! I made a mistake in my branch... silly me. Can you pull the branch and retry?
@honusz Whoops! I made a mistake in my branch... silly me. Can you pull the branch and retry?
Didn't work unfortunately.
It seems like serviceRequestAllowed won't be true until after the app sends serviceRequest.
The SC6000 is immediately resetting the connection after the service request is sent.
When I breakpoint on const written = await this.connection.write(ctx.getBuffer());
the connection is still open in Wireshark
As soon as I continue to the next breakpoint on assert(written === ctx.tell());
the SC6000 has reset the connection.
Hmmm.. over here the device (PrimeGo) sends a serviceRequest first. And then I reply with the same request and all goes well. So if you set a breakpoint at line 150 in Controller.ts
this.serviceRequestAllowed = true;
it will never hit?
Also try setting a small delay in connect()
here:
this.port = info.port;
// Add small delay here
await sleep(500);
await this.requestAllServicePorts();
You can change that value to some other values, to see if it makes any difference?
Meanwhile I'll try to see why the Go code might be working, but as said: the Go code isn't working over here anymore. It's very instable and often fails to connect, making debugging pretty difficult. Unless you have a spare SC6000 lying around somewhere? :)
Hmmm.. over here the device (PrimeGo) sends a serviceRequest first.
Yeah, the really weird thing is that the go-stagelinq code, the SC6000 does the same thing as your PrimeGo; it sends the a message with the servicesRequest as the 4th packet in the sequence. But with the this app (and, even more weirdly, only in win10), the 4th packet has no payload, and the FIN flag set.
So if you set a breakpoint at line 150 in Controller.ts... ...it will never hit?
Correct, I have this.serviceRequestAllowed watched, it never changes. Which isn't surprising, as the sc6000 is never sending the message.
Meanwhile I'll try to see why the Go code might be working, but as said: the Go code isn't working over here anymore. It's very instable and often fails to connect, making debugging pretty difficult.
We got a real stone cold whodunnit here, haha. Go code is rocksolid for me. I'll see if I can debug the Go code a bit, at least I can compare working connection sequences from it, and SoundSwitch, and see if I can figure out what they're doing differently.
On the plus side, at least commit 8f9fe06e733be4a2a8907d8d42ddc4f5144b0f91 has fixed the issue in MacOS!
Thanks for all your feedback and I hope you see something different between the two implementations. Judging from what you've sent me, your WireShark skills are way better than mine. So perhaps you are finding something different in there as well? I really suspect a timing issue here.
Did you try to add that await sleep(500)
as well?
You mentioned that commit 8f9fe06 fixes the issue on the Mac, but that code wasn't correct. Did the later commit 78037c1bac4395faa6cf893066660dd665787900 also still work on the Mac?
Also would it be possible for you to sent me a WireShark log of both the Go run and a failed NodeJS run? And to ensure both use the same token, please add this to listener.go:277
listenerConfig.Token = [...]byte { 82, 253, 252, 7, 33, 130, 101, 79, 22, 63, 95, 15, 154, 98, 29, 114 }
So that method looks like:
// ListenWithConfiguration sets up a StagelinQ listener with the given configuration.
func ListenWithConfiguration(listenerConfig *ListenerConfiguration) (listener *Listener, err error) {
// Use empty configuration if no configuration object was passed
if listenerConfig == nil {
listenerConfig = new(ListenerConfiguration)
}
listenerConfig.Token = [...]byte { 82, 253, 252, 7, 33, 130, 101, 79, 22, 63, 95, 15, 154, 98, 29, 114 }
// Initialize token if none was configured
token := listenerConfig.Token
if bytes.Equal(listenerConfig.Token[:], zeroToken[:]) {
if _, err = rand.Read(token[:]); err != nil {
return
}
}
Then the randomization will not occur and both Go and NodeJS send out the exact same token. Then I'd like to know, what's different...?
ADD: Please include all communication between both devices up until the device starts sending back StateMap results. For instance I also like to see the announcements both devices are making. A WS filter like this would be great:
(ip.src == 192.168.178.24 && ip.dst == 192.168.178.32) || (ip.src == 192.168.178.32 && ip.dst == 192.168.178.24)
Where 192.168.178.24
and 192.168.178.32
are the addresses of the PC and the SC6000.
Ok, I think I figured it out!! findBroadcastIP() in Announce.ts is returning the first interface entry that matches IPv4 && internal == false. On my system, I had a bridged connection or something that was showing up as the first interface on the list, so the app's announcement message was being broadcast on the wrong network, and without a receiving an announcement message, the SC6000 wanted nothing to do with it. I realized this when I was trying to build a filter to give you all the info you needed, and I was wondering why I couldn't get the app's Announcement message to show in the filter. Disabling the rogue bridge solved it, as did just hardcoding the broadcast ip, but this is probably a big cause of similar failures.
Go-stagelinq handles this by making an array of all broadcast ips, and announcing on each one. listener.go
func getAllBroadcastIPs() (retval []net.IP, err error) {
addrs, err := net.InterfaceAddrs()
if err != nil {
return
}
ips := []net.IP{}
addrsLoop:
for _, addr := range addrs {
var ip net.IP
var mask net.IPMask
switch v := addr.(type) {
case *net.IPAddr:
ip = v.IP
mask = v.IP.DefaultMask()
case *net.IPNet:
ip = v.IP
mask = v.Mask
}
if ip == nil {
continue
}
// prevent addresses from being added multiple times (for example zeroconf)
bip := makeBroadcastIP(ip, mask)
for _, alreadyAddedIP := range ips {
if alreadyAddedIP.Equal(bip) {
continue addrsLoop
}
}
ips = append(ips, bip)
}
retval = ips
return
}
And later in the anounce() function:
ips, err := getAllBroadcastIPs()
if err != nil {
return
}
for _, ip := range ips {
addr := makeStagelinqDiscoveryBroadcastAddress(ip)
addrString := addr.String()
println(addrString)
packetConn, err := net.DialUDP("udp", nil, addr)
if err == nil {
_, _ = packetConn.Write(finalBytes)
packetConn.Close()
}
}
A more elegant solution may be to peek at the discovery messages being broadcast on the wire, and selecting the network interface that the other devices are broadcasting on, but either fix is a bit beyond my code-fu (at least, not without making a kludgy mess of your code, haha =D).
LMK if you still need those pcaps, but I'm fairly certain we've found the culprit here.
Ahh yes! Good find! Thanks a lot for this. I remember this code when porting it and I couldn't easily translate the whole thing to its NodeJS equivalent. So I took a few shortcuts, which seem to work fine in my particular situation. I actually started implementing your proposed solution, when I thought: what if there are multiple controllers on different network interfaces? I guess broadcasting to all network interfaces might be a bit overkill, but it is probably easiest :) I just pushed a fix, commit 3fd66d3e9fd14fe82c94ce339872e0e28fd8750c, can you test it?
Having said all this: I have no idea how the Go code will work with multiple controllers with different IPs. Have you checked that as well?
I suspect that the Go code doesn't handle multiple devices well either. I do have an idea on how to implement that, yet have no way of testing it, since I have only a single device.
So once we fixed this requeststate issue here, we could move on to that particular problem.
I just pushed a fix, commit 3fd66d3, can you test it?
Success!! I still had a (now much rarer) issue where it failed if the timing was off a bit. Occasionally Controller.connect() in main() would front-run the announce() udp message being broadcast on the wire, or just with too short of an interval for the player. Also, the timing of Controller.connectToService() may have been to quick, which caused a failure.
Adding await sleep(250);
in the following places in main.ts greatly increased the stability:
await announce();
await sleep(250);
await main();
and
await sleep(250);
await controller.connectToService(StateMap);
These little timing quirks could well be the cause of some of these weird instabilities in both this and go-stagelinq, and explain why the respective programs work on some systems, but not others.
Also, adding this to discover() in Controller.ts prevents trying to connect to an x1800 mixer (and crashing). If we can find the result.source of the x1850 (or a common identifier unique to both), we should change it to that.
if (result === null || result.source === 'testing' || result.software.name === 'OfflineAnalyzer' || result.source === "DN-X1800Prime") {
return;
}
Having said all this: I have no idea how the Go code will work with multiple controllers with different IPs. Have you checked that as well?
So, the Go code is able to discover multiple devices, connect and request their services, but it doesn't connect to multiple StateMap services. The Go code works a bit differently; when the listener discovers and new device, it creates and returns it during the discovery loop in main.go. Main now has an array of devices (foundDevices). djswolf's fork is modified so it can subscribe to multiple StateMap services, which works for me. We could do this as well, but it may mean a bit of restructuring the code. Perhaps a listener module handles discovery of devices, tracks their presence (announce and unannounce messages), and returns an array of available devices? A few ways to skin this cat, I suppose.
@honusz thanks a lot for all this testing and your feedback! I'll go ahead and add those waits and then merge the branch with main. Then I'll create a new branch to work on support for multiple devices. Would you be able to test that for me as well? I just added a new issue (#7) for that.
@honusz also, you mention that it crashes when connecting to the x1800 mixer. Ideally I'd like to fix that too. I'll open a new issue for that (#6)
@honusz can you verify https://github.com/MarByteBeep/StageLinq/pull/5 ? If it works, I'll close this issue and make a few new ones.
@honusz can you verify #5 ? If it works, I'll close this issue and make a few new ones.
Yup! Success on mac and win10!! Thankfully we can close this issue! haha
@honusz thanks a lot for all this testing and your feedback!
It's my pleasure, I'm just happy to help. Thank you for all your work on this. I'm certainly happy to test any new builds with SC600s, and contribute to #7 (such as my limited JS skills allow, ha!) Please don't hesitate to @ me to test anything =)
Awesome and will do! I'll let you know when I start working on those other issues. Thanks again!
Setup is 2x SC6000 (2.1.1 fw) X1800 Mixer Windows 10
I'm still getting the issue as mentioned by @kkirjala as the secondary issue in #1 . The device immediately closes the connection after the app sends the service request and client token in requestAllServicePorts():
Tried with go-stagelinq, and it's working fine, the device sends back the list of services. Working message from go-stagelinq:
Error causing message from StageLinq:
FWIW: It's also still having errors if the app connects to the x1800 mixer first, as in #1